Parameter-Efficient Fine-Tuning (PEFT)
A family of techniques that adapt large AI models to specific tasks by training only a small fraction of the model's parameters, dramatically reducing the cost and hardware requirements.
Parameter-efficient fine-tuning (PEFT) refers to a family of techniques that customise pre-trained AI models by updating only a small subset of their parameters rather than the entire model. This dramatically reduces the computational cost, memory requirements, and data needed for fine-tuning.
The problem PEFT solves
Fine-tuning a large language model (retraining it on specialised data to improve performance on specific tasks) traditionally requires updating all of the model's parameters. For a 70-billion-parameter model, this means:
- Hundreds of gigabytes of GPU memory
- Specialised hardware costing thousands of pounds
- Hours or days of training time
- Risk of catastrophic forgetting of general capabilities
PEFT methods often achieve results comparable to full fine-tuning while training only 0.1% to 2% of the model's parameters.
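As a rough illustration of where a figure like 0.1% comes from, here is a back-of-the-envelope count for a hypothetical 7B-class transformer (the layer count, hidden size, and rank below are illustrative assumptions, not measurements of any particular model):

```python
# Assumed architecture: 32 layers, hidden size 4096, LoRA applied to
# the four attention projections (q, k, v, o), each a 4096 x 4096 matrix.
layers = 32
d = 4096
r = 8                    # LoRA rank (assumed)
matrices_per_layer = 4   # q, k, v, o projections

# Each adapted d x d matrix gains two low-rank factors: A (r x d) and B (d x r).
lora_params = layers * matrices_per_layer * 2 * r * d
base_params = 7_000_000_000  # nominal 7B base model

fraction = lora_params / base_params
print(f"LoRA parameters: {lora_params:,}")        # 8,388,608
print(f"Fraction of base model: {fraction:.4%}")  # ~0.12%
```

Changing the rank or targeting more weight matrices moves the fraction around, but it stays a tiny slice of the full parameter count.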
Major PEFT techniques
- LoRA (Low-Rank Adaptation): Adds small trainable matrices alongside the model's existing weight matrices. Only the new matrices are trained; the original weights are frozen. LoRA adapters are typically just 10-50 megabytes, tiny compared to the full model.
- QLoRA: Combines LoRA with quantisation, allowing fine-tuning on consumer GPUs by keeping the base model in 4-bit precision while training the LoRA adapters in higher precision.
- Prefix tuning: Adds a small number of trainable tokens (the "prefix") to the beginning of each layer's input. The model learns optimal prefix values for the target task.
- Adapters: Inserts small trainable modules between the layers of the frozen model. Each adapter has far fewer parameters than the layers it sits between.
- Prompt tuning: Similar to prefix tuning but operates only on the input embedding layer, learning a soft prompt that steers the model's behaviour.
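The LoRA idea from the list above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation (real libraries such as Hugging Face peft add dropout, per-module targeting, and adapter merging); all names and dimensions here are made up:

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update B @ A."""

    def __init__(self, W, r=8, alpha=16):
        d_out, d_in = W.shape
        self.W = W  # frozen pre-trained weight; never updated
        rng = np.random.default_rng(0)
        # Standard LoRA initialisation: A random, B zero, so the adapter
        # starts as a no-op and the original behaviour is preserved.
        self.A = rng.normal(0, 0.01, size=(r, d_in))  # trainable
        self.B = np.zeros((d_out, r))                 # trainable
        self.scale = alpha / r

    def forward(self, x):
        # y = x W^T + scale * (x A^T) B^T; only A and B would receive
        # gradients during fine-tuning.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

W = np.random.default_rng(1).normal(size=(64, 32))
layer = LoRALinear(W, r=4)
x = np.ones((1, 32))
# At initialisation the adapter contributes nothing:
assert np.allclose(layer.forward(x), x @ W.T)
```

Because B starts at zero, loading an untrained adapter cannot degrade the base model; training then moves only A and B.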
Why PEFT is transformative
- Cost: Fine-tuning a 7B model with QLoRA costs a few pounds in cloud compute versus hundreds or thousands for full fine-tuning.
- Speed: PEFT training completes in hours rather than days.
- Storage: Multiple PEFT adapters can share a single base model, each adding only megabytes of storage. An organisation could have dozens of specialised models without duplicating the base model each time.
- Preservation: Because the base model's weights are frozen, its general knowledge and capabilities are preserved. Catastrophic forgetting is largely avoided.
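To put the storage point in numbers, assuming the illustrative rank-8 adapter over a 7B-class model from earlier (~8.4M parameters) stored in 16-bit precision:

```python
adapter_params = 8_388_608       # assumed rank-8 LoRA over a 7B-class model
base_params = 7_000_000_000
bytes_per_param = 2              # fp16/bf16 storage

adapter_mb = adapter_params * bytes_per_param / 1e6
base_gb = base_params * bytes_per_param / 1e9
print(f"One adapter: ~{adapter_mb:.0f} MB")  # ~17 MB
print(f"Base model:  ~{base_gb:.0f} GB")     # ~14 GB
# Twenty specialised adapters add ~0.3 GB on top of one shared 14 GB base.
```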
Practical workflow
- Select a pre-trained base model appropriate for your domain.
- Prepare your instruction dataset (typically 1,000-10,000 high-quality examples).
- Apply PEFT fine-tuning (LoRA is the most popular starting point).
- Evaluate the fine-tuned model against your benchmarks.
- Deploy the base model with the PEFT adapter loaded on top.
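The workflow above, reduced to its essence, can be shown with a toy NumPy sketch: a frozen weight, a task it does not quite fit, and LoRA factors as the only trainable parameters. This stands in for the real stack (in practice you would use a library such as Hugging Face peft with a transformer base model); the data and dimensions are entirely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# A "pre-trained" frozen weight W, and a target task whose ideal weight
# differs from W by a small low-rank shift (hypothetical data).
W = rng.normal(size=(8, 8))
delta = 0.1 * np.outer(rng.normal(size=8), rng.normal(size=8))
X = rng.normal(size=(128, 8))
Y = X @ (W + delta).T

# Attach LoRA factors; they are the only trainable parameters.
r, lr, steps = 2, 0.05, 1000
A = rng.normal(0.0, 0.3, size=(r, 8))  # trainable
B = np.zeros((8, r))                   # trainable, zero init

def predict(X, A, B):
    return X @ W.T + (X @ A.T) @ B.T   # W is never updated

init_loss = np.mean((predict(X, A, B) - Y) ** 2)

for _ in range(steps):
    err = predict(X, A, B) - Y
    # Gradients of the squared error w.r.t. B and A only; W stays frozen.
    grad_B = err.T @ (X @ A.T) / len(X)
    grad_A = B.T @ err.T @ X / len(X)
    B -= lr * grad_B
    A -= lr * grad_A

# Evaluate: the adapter should have reduced the task loss.
final_loss = np.mean((predict(X, A, B) - Y) ** 2)
print(f"loss: {init_loss:.4f} -> {final_loss:.4f}")
```

Deployment then amounts to shipping A and B alongside the untouched W, exactly as the final step describes.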
When to use PEFT versus full fine-tuning
For the vast majority of enterprise use cases, PEFT is sufficient and preferable. Full fine-tuning is reserved for cases where you need to fundamentally alter the model's behaviour or have the compute budget and expertise to manage it safely.
Why This Matters
PEFT makes model customisation accessible to organisations that cannot afford the infrastructure required for full fine-tuning. Understanding these techniques helps you evaluate the realistic cost and feasibility of creating AI models tailored to your specific business needs.