Chinchilla Scaling Laws
Research findings from DeepMind showing that AI models perform best when training data and model size are scaled proportionally, rather than simply making models as large as possible.
Chinchilla scaling laws are findings from a 2022 DeepMind research paper that fundamentally changed how AI labs think about training large language models. The key discovery: for a given compute budget, you get better performance by training a smaller model on more data than by training a larger model on less data.
The insight that changed everything
Before Chinchilla, the prevailing wisdom in AI was "bigger is better." OpenAI's GPT-3 had 175 billion parameters, and the race was on to build even larger models. DeepMind's research showed this approach was wasteful: many large models were "undertrained", with enormous parameter counts but too little training data to fully utilise their capacity.
The Chinchilla paper demonstrated that a 70-billion parameter model trained on 1.4 trillion tokens could match or beat a 280-billion parameter model trained on 300 billion tokens, while being four times cheaper to run at inference.
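That comparison is roughly compute-matched: using the common rule of thumb that training cost is about 6 × parameters × tokens in FLOPs (an approximation not stated in this article, but widely used alongside the Chinchilla result), both runs land in the same range, which is what makes the quality gap attributable to the allocation rather than to extra compute. A quick sketch:

```python
# Rough training-compute comparison using the common estimate
# C ≈ 6 · N · D (FLOPs ≈ 6 × parameters × training tokens).
def train_flops(params, tokens):
    return 6 * params * tokens

chinchilla = train_flops(70e9, 1.4e12)   # 70B params, 1.4T tokens
gopher_like = train_flops(280e9, 300e9)  # 280B params, 300B tokens

print(f"70B on 1.4T tokens:  {chinchilla:.2e} FLOPs")   # ~5.9e23
print(f"280B on 300B tokens: {gopher_like:.2e} FLOPs")  # ~5.0e23
```

Both budgets are around 5 × 10²³ FLOPs; the smaller model simply spends that compute more effectively.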
The optimal ratio
The research suggested an approximately 1:20 ratio between model parameters and training tokens. A 10-billion parameter model should be trained on roughly 200 billion tokens. A 70-billion parameter model should see about 1.4 trillion tokens. This relationship, while approximate, provided a concrete formula for allocating compute budgets efficiently.
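The 1:20 rule combines with the training-cost approximation C ≈ 6 · N · D to give a closed-form allocation: fix D = 20 · N, substitute, and solve for N. A minimal sketch (the ratio is the article's approximation; the paper's fitted coefficients differ slightly):

```python
import math

TOKENS_PER_PARAM = 20  # approximate Chinchilla-optimal ratio

def chinchilla_allocation(compute_flops):
    """Given C = 6·N·D and D = 20·N, solve C = 120·N² for N, then D."""
    n_params = math.sqrt(compute_flops / (6 * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

# A budget of 1.2e22 FLOPs recovers the article's 10B / 200B example:
n, d = chinchilla_allocation(1.2e22)
print(f"params: {n:.2e}, tokens: {d:.2e}")  # ~1e10 params, ~2e11 tokens
```

Doubling the compute budget scales both parameters and tokens by √2, so the 1:20 ratio is preserved at every budget.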
Impact on the industry
Chinchilla's findings had immediate practical consequences:
- Smaller, smarter models: Labs shifted towards training more modestly sized models on more data. Meta's Llama 2 (70B parameters, 2 trillion tokens) and Mistral's models trained at token-to-parameter ratios at or beyond the Chinchilla-optimal point.
- Inference cost reduction: Smaller models are cheaper and faster to deploy. A Chinchilla-optimal model delivers the same quality as a larger one at a fraction of the serving cost.
- Data becomes the bottleneck: If models need vastly more training data to reach their potential, high-quality text data becomes the scarce resource, not compute power.
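The inference-cost point above has a simple back-of-envelope form: a forward pass costs roughly 2 × parameters FLOPs per token (a standard approximation, not from this article), so serving cost scales linearly with model size. A sketch, using illustrative figures:

```python
# Per-token inference cost is commonly approximated as ~2·N FLOPs
# (one forward pass). Real serving cost also depends on hardware,
# batching, and memory bandwidth; this is only the FLOPs view.
def inference_flops_per_token(params):
    return 2 * params

small, large = 70e9, 280e9
ratio = inference_flops_per_token(large) / inference_flops_per_token(small)
print(f"A 280B model costs ~{ratio:.0f}x more per token than a 70B one")
```

This is where the "four times cheaper" figure for Chinchilla versus a 280B model comes from: 280 / 70 = 4.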
Beyond Chinchilla
More recent research has refined these findings. Some practitioners have found that training models well beyond the Chinchilla-optimal point, producing "over-trained" models, can be advantageous when inference cost is the primary concern. A model that is smaller but trained on even more data may be slightly less capable but dramatically cheaper to serve at scale.
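The over-training trade-off can be made concrete by counting lifetime FLOPs: training (≈ 6 · N · D) plus serving (≈ 2 · N per served token). All figures below are illustrative assumptions, not numbers from the paper:

```python
# Lifetime cost sketch: training FLOPs plus inference FLOPs over the
# model's deployed life. Illustrative figures only.
def lifetime_flops(params, train_tokens, served_tokens):
    return 6 * params * train_tokens + 2 * params * served_tokens

served = 1e13  # assume 10 trillion tokens served over the model's life

chinchilla_optimal = lifetime_flops(70e9, 1.4e12, served)
over_trained = lifetime_flops(35e9, 2.8e12, served)  # half size, double data

# Same training budget, but the smaller model halves the serving bill,
# so it wins on lifetime cost once serving volume is large enough.
print(f"70B Chinchilla-optimal: {chinchilla_optimal:.2e} FLOPs")
print(f"35B over-trained:       {over_trained:.2e} FLOPs")
```

At low serving volumes the Chinchilla-optimal model is the better deal; the crossover point depends entirely on how many tokens you expect to serve.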
Why scaling laws matter
Scaling laws are not merely academic curiosities. They determine how AI labs allocate billions of pounds in compute spending. They predict how much improvement to expect from the next generation of models. And they help businesses understand why smaller open-source models can sometimes match the performance of larger proprietary ones.
Why This Matters
Chinchilla scaling explains why the AI industry shifted from building the biggest possible models to building more efficiently trained ones. Understanding this helps you evaluate model choices: a well-trained smaller model may outperform a poorly trained larger one, and will always be cheaper to run.