Assisted Decoding

Large language models are rapidly gaining popularity, but their slow response times often frustrate users, driving them toward less capable alternatives. In this post, we’ll explore the reasons behind these delays and will explore an innovative decoding technique—Assisted Generation— that can dramatically improve performance, cutting latency by up to 10 times on standard hardware! Understanding … Continue reading Assisted Decoding →

Unveiling LORA 🚀: Fine-tuning Neural Networks with Low-Rank Adaptation

Introduction 🌐 Hey, folks! Welcome to the exploration of a groundbreaking technique: LORA—Low Rank Adaptation of large language models. LORA emerged from the labs at Microsoft around two years ago, and today, we'll uncover its mysteries. We'll understand its significance, operational mechanics. So, let's dive into the realm of language models and explore the wonders … Continue reading Unveiling LORA 🚀: Fine-tuning Neural Networks with Low-Rank Adaptation →