Fine-Tuning
Training an existing AI model on your own data to improve its performance on particular tasks. Like giving the AI specialised on-the-job training.
Fine-tuning is the process of taking a pre-trained AI model and training it further on your own data to improve its performance on specific tasks. Think of it as specialised on-the-job training: the model already has broad capabilities from its original training, and fine-tuning sharpens those capabilities for your particular needs.
How fine-tuning works
A pre-trained model like GPT-4 or Claude has already learned general language understanding from trillions of words. Fine-tuning adds a second phase of training using a much smaller, task-specific dataset — typically hundreds to thousands of examples.
For instance, if you want an AI that writes in your company's specific communication style, you might fine-tune a model on 500 examples of approved company communications. The resulting model retains all its general capabilities but now naturally writes in your voice.
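As a concrete illustration of what those training examples look like, here is a minimal sketch of a dataset in the chat-style JSONL format used by several fine-tuning APIs (one JSON object per line). The company name and example messages are hypothetical placeholders.

```python
import json

# Hypothetical company-voice training examples in chat-style format.
# Each record pairs an input (system + user messages) with the desired
# assistant output.
examples = [
    {"messages": [
        {"role": "system", "content": "Write in Acme Corp's house style."},
        {"role": "user", "content": "Draft a reply declining a meeting."},
        {"role": "assistant", "content": "Thanks for thinking of us. We'll have to pass this time, but do keep us in mind."},
    ]},
    {"messages": [
        {"role": "system", "content": "Write in Acme Corp's house style."},
        {"role": "user", "content": "Announce our new office opening."},
        {"role": "assistant", "content": "Big news: Acme has a new home. Come say hello at our open day."},
    ]},
]

def to_jsonl(records):
    """Serialise training records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

jsonl = to_jsonl(examples)
```

In practice you would write `jsonl` to a file and upload it to your provider's fine-tuning endpoint; the exact field names vary by provider, so check their data-format documentation.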
When to fine-tune (and when not to)
Fine-tuning makes sense when:
- Consistent style or format: You need output that reliably matches a specific voice, structure, or template that is difficult to achieve with prompting alone.
- Specialised terminology: Your domain uses technical language that general models handle inconsistently.
- Classification tasks: You need the model to categorise inputs into your custom categories with high accuracy.
- Efficiency: A fine-tuned smaller model can outperform a larger general model on your specific task, reducing cost and latency.
Fine-tuning does NOT make sense when:
- Your data changes frequently: Fine-tuned knowledge is frozen at training time. Use RAG instead for dynamic data.
- You need source citations: Fine-tuned models absorb information into their weights — they cannot point to specific source documents. Use RAG for this.
- Prompt engineering has not been tried: Many tasks that seem to need fine-tuning can be solved with better prompts, system instructions, or few-shot examples.
- You have limited data: Fine-tuning with too few examples can degrade model performance rather than improve it.
The fine-tuning process
- Prepare data: Create a training dataset of input-output pairs. Quality matters more than quantity. 500 excellent examples typically outperform 5,000 mediocre ones.
- Choose a base model: Start with the smallest model that handles your task well. Fine-tuning a smaller model is cheaper, faster, and often produces better results than fine-tuning a larger one.
- Train: Upload your data to the provider's fine-tuning API (OpenAI and some other providers offer this as a service) or run training on your own infrastructure.
- Evaluate: Test the fine-tuned model against a held-out set of examples. Compare performance to the base model with good prompting.
- Iterate: Adjust your training data based on evaluation results. Fine-tuning is rarely one-and-done.
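The evaluation step above can be sketched as a simple held-out comparison. This is an illustrative skeleton, not any provider's API: the classification task, the held-out examples, and the two model stand-ins are all hypothetical, and in a real workflow `base_model` and `fine_tuned` would be functions that call the respective models.

```python
def evaluate(predict, held_out):
    """Fraction of held-out examples where the model output matches the target."""
    correct = sum(1 for text, target in held_out if predict(text) == target)
    return correct / len(held_out)

# Hypothetical held-out set for a support-ticket classification task.
held_out = [
    ("refund request", "billing"),
    ("app crashes on login", "technical"),
    ("change my address", "account"),
]

# Stand-ins for real model calls: the base model (even with a decent
# prompt) over-predicts one category; the fine-tuned model has learned
# the custom categories.
base_model = lambda text: "billing"
fine_tuned = lambda text: {
    "refund request": "billing",
    "app crashes on login": "technical",
    "change my address": "account",
}[text]

base_acc = evaluate(base_model, held_out)
tuned_acc = evaluate(fine_tuned, held_out)
```

Comparing `tuned_acc` against `base_acc` on the same held-out set tells you whether fine-tuning actually beat the base model with good prompting, which is the decision the evaluation step exists to inform.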
Cost and complexity
Fine-tuning costs vary significantly:
- API-based fine-tuning (OpenAI, etc.): Relatively accessible. You pay for training compute and then per-token for inference. No ML expertise required.
- Self-hosted fine-tuning (open-source models): More complex but gives you full control. Requires ML engineering capability and GPU infrastructure.
- Full custom training: Building a model from scratch is prohibitively expensive for most organisations. Fine-tuning is the practical alternative.
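For API-based fine-tuning, a back-of-envelope estimate helps compare the one-off training cost against ongoing inference cost. The rates below are illustrative placeholders, not any provider's real prices; substitute your provider's published pricing.

```python
# ILLUSTRATIVE rates only - placeholders, not real provider prices.
TRAIN_PER_M_TOKENS = 8.00   # $ per million training tokens (assumed)
INFER_PER_M_TOKENS = 0.60   # $ per million inference tokens (assumed)

def training_cost(num_examples, avg_tokens_per_example, epochs):
    """One-off cost: every example is seen once per training epoch."""
    total_tokens = num_examples * avg_tokens_per_example * epochs
    return total_tokens / 1_000_000 * TRAIN_PER_M_TOKENS

def monthly_inference_cost(requests_per_month, avg_tokens_per_request):
    """Ongoing cost of serving the fine-tuned model."""
    total_tokens = requests_per_month * avg_tokens_per_request
    return total_tokens / 1_000_000 * INFER_PER_M_TOKENS

# 500 examples of ~600 tokens, trained for 3 epochs,
# then 50,000 requests per month of ~800 tokens each.
one_off = training_cost(500, 600, 3)
monthly = monthly_inference_cost(50_000, 800)
```

At these assumed rates, training is a small one-off expense and inference dominates over time, which is why a fine-tuned smaller model with cheaper per-token pricing can pay for itself quickly.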
Why This Matters
Fine-tuning is often discussed as a must-have for enterprise AI, but in practice, most organisations should try RAG and prompt engineering first. Understanding when fine-tuning is genuinely necessary — versus when it is overkill — can save your organisation significant time and money. The most common mistake is fine-tuning too early, before simpler approaches have been exhausted.
Continue learning in Advanced
This topic is covered in our lesson: Brand Voice: Making AI Sound Like You