Fine-Tuning
Training an existing AI model on your own data to improve its performance on particular tasks. Like giving the AI specialised on-the-job training.
Fine-tuning is the process of taking a pre-trained AI model and training it further on your own data to improve its performance on specific tasks. Think of it as specialised on-the-job training: the model already has broad capabilities from its original training, and fine-tuning sharpens those capabilities for your particular needs.
How fine-tuning works
A pre-trained model like GPT-4 or Claude has already learned general language understanding from trillions of words. Fine-tuning adds a second phase of training using a much smaller, task-specific dataset — typically hundreds to thousands of examples.
For instance, if you want an AI that writes in your company's specific communication style, you might fine-tune a model on 500 examples of approved company communications. The resulting model retains all its general capabilities but now naturally writes in your voice.
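As a concrete illustration of what those training examples look like, here is a minimal sketch of a dataset in the chat-style JSONL format used by several fine-tuning APIs (one JSON object per line). The company name and example messages are hypothetical placeholders.

```python
import json

# Hypothetical company-voice training examples in chat-style format.
# Each record pairs an input (system + user messages) with the desired
# assistant output.
examples = [
    {"messages": [
        {"role": "system", "content": "Write in Acme Corp's house style."},
        {"role": "user", "content": "Draft a reply declining a meeting."},
        {"role": "assistant", "content": "Thanks for thinking of us. We'll have to pass this time, but do keep us in mind."},
    ]},
    {"messages": [
        {"role": "system", "content": "Write in Acme Corp's house style."},
        {"role": "user", "content": "Announce our new office opening."},
        {"role": "assistant", "content": "Big news: Acme has a new home. Come say hello at our open day."},
    ]},
]

def to_jsonl(records):
    """Serialise training records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

jsonl = to_jsonl(examples)
```

In practice you would write `jsonl` to a file and upload it to your provider's fine-tuning endpoint; the exact field names vary by provider, so check their data-format documentation.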
When to fine-tune (and when not to)
Fine-tuning makes sense when:
- Consistent style or format: You need output that reliably matches a specific voice, structure, or template that is difficult to achieve with prompting alone.
- Specialised terminology: Your domain uses technical language that general models handle inconsistently.
- Classification tasks: You need the model to categorise inputs into your custom categories with high accuracy.
- Efficiency: A fine-tuned smaller model can outperform a larger general model on your specific task, reducing cost and latency.
Fine-tuning does NOT make sense when:
- Your data changes frequently: Fine-tuned knowledge is frozen at training time. Use RAG instead for dynamic data.
- You need source citations: Fine-tuned models absorb information into their weights — they cannot point to specific source documents. Use RAG for this.
- Prompt engineering has not been tried: Many tasks that seem to need fine-tuning can be solved with better prompts, system instructions, or few-shot examples.
- You have limited data: Fine-tuning with too few examples can degrade model performance rather than improve it.
The fine-tuning process
- Prepare data: Create a training dataset of input-output pairs. Quality matters more than quantity. 500 excellent examples typically outperform 5,000 mediocre ones.
- Choose a base model: Start with the smallest model that handles your task well. Fine-tuning a smaller model is cheaper, faster, and often produces better results than fine-tuning a larger one.
- Train: Upload your data to the provider's fine-tuning API (OpenAI and some other providers offer this as a service) or run training on your own infrastructure.
- Evaluate: Test the fine-tuned model against a held-out set of examples. Compare performance to the base model with good prompting.
- Iterate: Adjust your training data based on evaluation results. Fine-tuning is rarely one-and-done.
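The evaluation step above can be sketched as a simple held-out comparison. This is an illustrative skeleton, not any provider's API: the classification task, the held-out examples, and the two model stand-ins are all hypothetical, and in a real workflow `base_model` and `fine_tuned` would be functions that call the respective models.

```python
def evaluate(predict, held_out):
    """Fraction of held-out examples where the model output matches the target."""
    correct = sum(1 for text, target in held_out if predict(text) == target)
    return correct / len(held_out)

# Hypothetical held-out set for a support-ticket classification task.
held_out = [
    ("refund request", "billing"),
    ("app crashes on login", "technical"),
    ("change my address", "account"),
]

# Stand-ins for real model calls: the base model (even with a decent
# prompt) over-predicts one category; the fine-tuned model has learned
# the custom categories.
base_model = lambda text: "billing"
fine_tuned = lambda text: {
    "refund request": "billing",
    "app crashes on login": "technical",
    "change my address": "account",
}[text]

base_acc = evaluate(base_model, held_out)
tuned_acc = evaluate(fine_tuned, held_out)
```

Comparing `tuned_acc` against `base_acc` on the same held-out set tells you whether fine-tuning actually beat the base model with good prompting, which is the decision the evaluation step exists to inform.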
Cost and complexity
Fine-tuning costs vary significantly:
- API-based fine-tuning (OpenAI, etc.): Relatively accessible. You pay for training compute and then per-token for inference. No ML expertise required.
- Self-hosted fine-tuning (open-source models): More complex but gives you full control. Requires ML engineering capability and GPU infrastructure.
- Full custom training: Building a model from scratch is prohibitively expensive for most organisations. Fine-tuning is the practical alternative.
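For API-based fine-tuning, a back-of-envelope estimate helps compare the one-off training cost against ongoing inference cost. The rates below are illustrative placeholders, not any provider's real prices; substitute your provider's published pricing.

```python
# ILLUSTRATIVE rates only - placeholders, not real provider prices.
TRAIN_PER_M_TOKENS = 8.00   # $ per million training tokens (assumed)
INFER_PER_M_TOKENS = 0.60   # $ per million inference tokens (assumed)

def training_cost(num_examples, avg_tokens_per_example, epochs):
    """One-off cost: every example is seen once per training epoch."""
    total_tokens = num_examples * avg_tokens_per_example * epochs
    return total_tokens / 1_000_000 * TRAIN_PER_M_TOKENS

def monthly_inference_cost(requests_per_month, avg_tokens_per_request):
    """Ongoing cost of serving the fine-tuned model."""
    total_tokens = requests_per_month * avg_tokens_per_request
    return total_tokens / 1_000_000 * INFER_PER_M_TOKENS

# 500 examples of ~600 tokens, trained for 3 epochs,
# then 50,000 requests per month of ~800 tokens each.
one_off = training_cost(500, 600, 3)
monthly = monthly_inference_cost(50_000, 800)
```

At these assumed rates, training is a small one-off expense and inference dominates over time, which is why a fine-tuned smaller model with cheaper per-token pricing can pay for itself quickly.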
Why This Matters
Fine-tuning is often discussed as a must-have for enterprise AI, but in practice, most organisations should try RAG and prompt engineering first. Understanding when fine-tuning is genuinely necessary — versus when it is overkill — can save your organisation significant time and money. The most common mistake is fine-tuning too early, before simpler approaches have been exhausted.
Continue learning in Advanced
This topic is covered in our lesson: Brand Voice: Making AI Sound Like You