
LoRA (Low-Rank Adaptation)

Last reviewed: April 2026

A fine-tuning technique that trains only a small number of additional parameters instead of updating the entire model, making customisation faster and cheaper.

LoRA (Low-Rank Adaptation) is a technique for fine-tuning AI models that dramatically reduces the computational cost. Instead of updating all of a model's billions of parameters, LoRA freezes the original weights and adds small, trainable matrices alongside them.

The problem LoRA solves

Fine-tuning a large language model traditionally means adjusting every one of its billions of parameters on your custom data. This requires enormous amounts of GPU memory, takes hours or days, and produces a complete copy of the model for each customisation. If you want ten different fine-tuned versions, you need storage for ten full model copies.

LoRA solves this by training only a tiny fraction of new parameters β€” typically less than 1% of the original model size.
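The "less than 1%" figure falls out of simple arithmetic. The sketch below uses hypothetical sizes (a 4096-dimensional hidden layer and 32 adapted weight matrices, roughly the shape of a 7B-class model; the rank of 8 is a common but illustrative choice) to compare trainable parameter counts:

```python
# Hypothetical sizes for illustration only.
d = 4096          # hidden dimension (assumed)
n_layers = 32     # number of adapted weight matrices (assumed)
r = 8             # LoRA rank (assumed)

full_params_per_matrix = d * d       # parameters in one full weight matrix
lora_params_per_matrix = 2 * d * r   # two low-rank factors: (d x r) + (r x d)

full_total = n_layers * full_params_per_matrix
lora_total = n_layers * lora_params_per_matrix

print(f"Full matrices:   {full_total:,}")                  # 536,870,912
print(f"LoRA adapters:   {lora_total:,}")                  # 2,097,152
print(f"Trainable share: {lora_total / full_total:.2%}")   # 0.39%
```

The trainable share is 2r/d, so it shrinks further as models grow: the larger the hidden dimension, the better the deal LoRA offers.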

How LoRA works

The core insight is mathematical. The weight updates during fine-tuning tend to have low rank β€” meaning they can be approximated by much smaller matrices without significant loss of quality.

LoRA decomposes the weight update into two small matrices that, when multiplied together, approximate the full update. Instead of updating a matrix with millions of values, you train two small matrices with thousands of values each. The original model weights stay frozen, and LoRA's small matrices are applied on top during inference.
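The decomposition above can be sketched in a few lines of NumPy. This is a toy forward pass, not a training loop: dimensions, the scaling factor alpha, and the initialisation scheme (A small and random, B all zeros, a common convention so the adapter starts as a no-op) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 64, 64, 4    # toy dimensions; real models use thousands
alpha = 8                     # LoRA scaling hyperparameter (assumed value)

W = rng.normal(size=(d_in, d_out))      # frozen pretrained weight, never updated
A = rng.normal(size=(d_in, r)) * 0.01   # trainable down-projection (d_in x r)
B = np.zeros((r, d_out))                # trainable up-projection (r x d_out),
                                        # zero-initialised so A @ B starts at zero

def forward(x):
    # Base output plus the low-rank update, scaled by alpha / r.
    return x @ W + (x @ A @ B) * (alpha / r)

x = rng.normal(size=(1, d_in))
# Because B starts at zero, the adapted model initially matches the base model.
assert np.allclose(forward(x), x @ W)
```

During fine-tuning, only A and B receive gradient updates; W is read but never written, which is why memory use and checkpoint size drop so sharply.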

Practical benefits

  • Faster training: Fine-tuning with LoRA takes minutes to hours instead of hours to days.
  • Lower cost: You need far less GPU memory, often making fine-tuning possible on consumer hardware.
  • Easy switching: LoRA adapters are small files (often under 100 MB) that can be swapped in and out of the base model. One base model can serve many use cases by loading different adapters.
  • Reduced catastrophic forgetting: Because the base model is frozen, its original weights are never overwritten, and removing the adapter restores the original model exactly. The adapter adds new behaviour on top rather than replacing old knowledge.

Common use cases

  • Adapting an open-source model to follow your company's writing style and terminology.
  • Teaching a model to handle domain-specific tasks like medical coding or legal analysis.
  • Creating personalised assistants that understand specific organisational context.

Variations

QLoRA further reduces memory costs by combining LoRA with quantisation: the frozen base model is compressed to low precision (typically 4-bit) while the LoRA adapter is trained in higher precision. Since the base weights are never updated, the quantisation error is fixed, and the adapter can learn around it.
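A rough sketch of the idea, using simple symmetric int8 quantisation for clarity (QLoRA itself uses a 4-bit NormalFloat format and other refinements; this only illustrates the compress-the-frozen-part principle):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(64, 64)).astype(np.float32)   # frozen base weight

# Crude symmetric int8 quantisation of the frozen weight.
scale = np.abs(W).max() / 127.0
W_q = np.round(W / scale).astype(np.int8)    # stored compressed (1 byte/value)
W_deq = W_q.astype(np.float32) * scale       # dequantised for the forward pass

# The LoRA factors stay in higher precision and are the only trained part.
r = 4
A = (rng.normal(size=(64, r)) * 0.01).astype(np.float32)
B = np.zeros((r, 64), dtype=np.float32)

def forward(x):
    return x @ W_deq + x @ A @ B

# int8 storage is a quarter the size of float32 for the frozen weights.
assert W_q.nbytes * 4 == W.nbytes
```

The memory saving applies to the bulk of the model (the frozen weights), which is why QLoRA can bring fine-tuning of multi-billion-parameter models within reach of a single GPU.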


Why this matters

LoRA makes model customisation accessible to organisations that cannot afford full fine-tuning. If you are considering adapting an AI model to your specific domain or brand voice, LoRA is likely the technique that makes it economically viable. Understanding it helps you have informed conversations with your engineering team about what customisation actually costs.
