Assisted Decoding

Large language models are rapidly gaining popularity, but their slow response times often frustrate users and drive them toward less capable alternatives. In this post, we’ll look at the reasons behind these delays and explore a decoding technique, Assisted Generation, that can cut latency by up to 10 times on standard hardware. Understanding …

Unveiling LoRA 🚀: Fine-tuning Neural Networks with Low-Rank Adaptation

Introduction 🌐 Hey, folks! Welcome to the exploration of a groundbreaking technique: LoRA, or Low-Rank Adaptation of large language models. LoRA emerged from the labs at Microsoft around two years ago, and today we'll uncover its mysteries. We'll look at its significance and its operational mechanics. So, let's dive into the realm of language models and explore the wonders …

Demystifying Neural Networks: A Deep Dive into Manual Backpropagation

Backpropagation, a fundamental concept in artificial neural networks and machine learning, operates as a supervised learning algorithm for training neural networks. The term itself stems from the process of propagating error information backward through the network. In essence, backpropagation seeks to minimize the gap between a neural network's predicted output and the actual target values. …
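
To make the idea of propagating error backward a little more concrete, here is a minimal sketch in plain Python. It is not taken from the post itself: the tiny network (one tanh hidden unit feeding one linear output), the squared-error loss, and the names `forward`, `loss_and_grads`, `train`, and `total_loss` are all assumptions chosen for illustration.

```python
import math

def forward(params, x):
    """Forward pass of a hypothetical 1-hidden-unit network."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)   # hidden activation
    y_hat = w2 * h + b2          # linear output
    return h, y_hat

def loss_and_grads(params, x, y):
    """Squared error and its gradients, derived by hand with the chain rule."""
    w1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    loss = (y_hat - y) ** 2

    # Backward pass: start at the loss and move toward the input.
    d_y_hat = 2.0 * (y_hat - y)      # dL/dy_hat
    d_w2 = d_y_hat * h               # dL/dw2
    d_b2 = d_y_hat                   # dL/db2
    d_h = d_y_hat * w2               # error passed back to the hidden unit
    d_z = d_h * (1.0 - h * h)        # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    d_w1 = d_z * x                   # dL/dw1
    d_b1 = d_z                       # dL/db1
    return loss, [d_w1, d_b1, d_w2, d_b2]

def train(params, data, lr=0.1, epochs=200):
    """Plain stochastic gradient descent using the manual gradients."""
    params = list(params)
    for _ in range(epochs):
        for x, y in data:
            _, grads = loss_and_grads(params, x, y)
            params = [p - lr * g for p, g in zip(params, grads)]
    return params

def total_loss(params, data):
    """Sum of squared errors over a dataset."""
    return sum(loss_and_grads(params, x, y)[0] for x, y in data)
```

Calling `train([0.5, 0.0, 0.5, 0.0], [(-1.0, -0.5), (0.0, 0.0), (1.0, 0.5)])` and comparing `total_loss` before and after is one way to watch the gap between predictions and targets shrink. A finite-difference check of each gradient is a common sanity test for hand-derived backward passes.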