Parameter-Efficient Fine-Tuning (PEFT)
A family of techniques that adapt large AI models to specific tasks by training only a small fraction of the model's parameters, dramatically reducing the cost and hardware requirements.
Parameter-efficient fine-tuning (PEFT) refers to a family of techniques that customise pre-trained AI models by updating only a small subset of their parameters rather than the entire model. This dramatically reduces the computational cost, memory requirements, and data needed for fine-tuning.
The problem PEFT solves
Fine-tuning a large language model (retraining it on specialised data to improve performance on specific tasks) traditionally requires updating all of the model's parameters. For a 70-billion-parameter model, this means:
- Hundreds of gigabytes of GPU memory
- Specialised hardware costing thousands of pounds
- Hours or days of training time
- Risk of catastrophic forgetting of general capabilities
PEFT methods often achieve results comparable to full fine-tuning while training only 0.1% to 2% of the model's parameters.
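As a rough illustration of where a figure like 0.1% comes from, here is a back-of-the-envelope count for a hypothetical 7B-class transformer (the layer count, hidden size, and rank below are illustrative assumptions, not measurements of any particular model):

```python
# Assumed architecture: 32 layers, hidden size 4096, LoRA applied to
# the four attention projections (q, k, v, o), each a 4096 x 4096 matrix.
layers = 32
d = 4096
r = 8                    # LoRA rank (assumed)
matrices_per_layer = 4   # q, k, v, o projections

# Each adapted d x d matrix gains two low-rank factors: A (r x d) and B (d x r).
lora_params = layers * matrices_per_layer * 2 * r * d
base_params = 7_000_000_000  # nominal 7B base model

fraction = lora_params / base_params
print(f"LoRA parameters: {lora_params:,}")        # 8,388,608
print(f"Fraction of base model: {fraction:.4%}")  # ~0.12%
```

Changing the rank or targeting more weight matrices moves the fraction around, but it stays a tiny slice of the full parameter count.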
Major PEFT techniques
- LoRA (Low-Rank Adaptation): Adds small trainable matrices alongside the model's existing weight matrices. Only the new matrices are trained; the original weights are frozen. LoRA adapters are typically just 10-50 megabytes, tiny compared to the full model.
- QLoRA: Combines LoRA with quantisation, allowing fine-tuning on consumer GPUs by keeping the base model in 4-bit precision while training the LoRA adapters in higher precision.
- Prefix tuning: Adds a small number of trainable tokens (the "prefix") to the beginning of each layer's input. The model learns optimal prefix values for the target task.
- Adapters: Inserts small trainable modules between the layers of the frozen model. Each adapter has far fewer parameters than the layers it sits between.
- Prompt tuning: Similar to prefix tuning but operates only on the input embedding layer, learning a soft prompt that steers the model's behaviour.
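The LoRA idea from the list above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation (real libraries such as Hugging Face peft add dropout, per-module targeting, and adapter merging); all names and dimensions here are made up:

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update B @ A."""

    def __init__(self, W, r=8, alpha=16):
        d_out, d_in = W.shape
        self.W = W  # frozen pre-trained weight; never updated
        rng = np.random.default_rng(0)
        # Standard LoRA initialisation: A random, B zero, so the adapter
        # starts as a no-op and the original behaviour is preserved.
        self.A = rng.normal(0, 0.01, size=(r, d_in))  # trainable
        self.B = np.zeros((d_out, r))                 # trainable
        self.scale = alpha / r

    def forward(self, x):
        # y = x W^T + scale * (x A^T) B^T; only A and B would receive
        # gradients during fine-tuning.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

W = np.random.default_rng(1).normal(size=(64, 32))
layer = LoRALinear(W, r=4)
x = np.ones((1, 32))
# At initialisation the adapter contributes nothing:
assert np.allclose(layer.forward(x), x @ W.T)
```

Because B starts at zero, loading an untrained adapter cannot degrade the base model; training then moves only A and B.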
Why PEFT is transformative
- Cost: Fine-tuning a 7B model with QLoRA costs a few pounds in cloud compute versus hundreds or thousands for full fine-tuning.
- Speed: PEFT training completes in hours rather than days.
- Storage: Multiple PEFT adapters can share a single base model, each adding only megabytes of storage. An organisation could have dozens of specialised models without duplicating the base model each time.
- Preservation: Because the base model's weights are frozen, its general knowledge and capabilities are preserved. Catastrophic forgetting is largely avoided.
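To put the storage point in numbers, assuming the illustrative rank-8 adapter over a 7B-class model from earlier (~8.4M parameters) stored in 16-bit precision:

```python
adapter_params = 8_388_608       # assumed rank-8 LoRA over a 7B-class model
base_params = 7_000_000_000
bytes_per_param = 2              # fp16/bf16 storage

adapter_mb = adapter_params * bytes_per_param / 1e6
base_gb = base_params * bytes_per_param / 1e9
print(f"One adapter: ~{adapter_mb:.0f} MB")  # ~17 MB
print(f"Base model:  ~{base_gb:.0f} GB")     # ~14 GB
# Twenty specialised adapters add ~0.3 GB on top of one shared 14 GB base.
```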
Practical workflow
- Select a pre-trained base model appropriate for your domain.
- Prepare your instruction dataset (typically 1,000-10,000 high-quality examples).
- Apply PEFT fine-tuning (LoRA is the most popular starting point).
- Evaluate the fine-tuned model against your benchmarks.
- Deploy the base model with the PEFT adapter loaded on top.
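The workflow above, reduced to its essence, can be shown with a toy NumPy sketch: a frozen weight, a task it does not quite fit, and LoRA factors as the only trainable parameters. This stands in for the real stack (in practice you would use a library such as Hugging Face peft with a transformer base model); the data and dimensions are entirely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# A "pre-trained" frozen weight W, and a target task whose ideal weight
# differs from W by a small low-rank shift (hypothetical data).
W = rng.normal(size=(8, 8))
delta = 0.1 * np.outer(rng.normal(size=8), rng.normal(size=8))
X = rng.normal(size=(128, 8))
Y = X @ (W + delta).T

# Attach LoRA factors; they are the only trainable parameters.
r, lr, steps = 2, 0.05, 1000
A = rng.normal(0.0, 0.3, size=(r, 8))  # trainable
B = np.zeros((8, r))                   # trainable, zero init

def predict(X, A, B):
    return X @ W.T + (X @ A.T) @ B.T   # W is never updated

init_loss = np.mean((predict(X, A, B) - Y) ** 2)

for _ in range(steps):
    err = predict(X, A, B) - Y
    # Gradients of the squared error w.r.t. B and A only; W stays frozen.
    grad_B = err.T @ (X @ A.T) / len(X)
    grad_A = B.T @ err.T @ X / len(X)
    B -= lr * grad_B
    A -= lr * grad_A

# Evaluate: the adapter should have reduced the task loss.
final_loss = np.mean((predict(X, A, B) - Y) ** 2)
print(f"loss: {init_loss:.4f} -> {final_loss:.4f}")
```

Deployment then amounts to shipping A and B alongside the untouched W, exactly as the final step describes.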
When to use PEFT versus full fine-tuning
For the vast majority of enterprise use cases, PEFT is sufficient and preferable. Full fine-tuning is reserved for cases where you need to fundamentally alter the model's behaviour or have the compute budget and expertise to manage it safely.
Why This Matters
PEFT makes model customisation accessible to organisations that cannot afford the infrastructure required for full fine-tuning. Understanding these techniques helps you evaluate the realistic cost and feasibility of creating AI models tailored to your specific business needs.