Scaling Law
The empirical observation that AI model performance improves predictably as you increase model size, training data, and compute, following mathematical power laws.
Scaling laws are empirical observations that AI model performance improves in a predictable, mathematical relationship as you increase three factors: model size (number of parameters), amount of training data, and computational resources used for training.
The key discovery
In 2020, researchers at OpenAI published "Scaling Laws for Neural Language Models" (Kaplan et al.), showing that the performance of language models follows power-law relationships with scale. This means that if you plot model performance against model size (or data or compute) on a log-log graph, you get a straight line. Performance improves smoothly and predictably as you scale up.
This was transformative because it meant AI labs could predict how good a model would be before spending the resources to train it. It gave them a roadmap: if you want a model that is X% better, you need Y% more parameters, Z% more data, and W% more compute.
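The straight-line-on-a-log-log-plot claim is easy to see numerically. The sketch below generates losses from a hypothetical power law and recovers the exponent with a linear fit in log space; the constants A and alpha are illustrative, not values from any real model family.

```python
import numpy as np

# Hypothetical power law: loss(N) = A * N**(-alpha).
# A and alpha are made-up constants for illustration only.
A, alpha = 10.0, 0.076
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])  # model sizes (parameters)
loss = A * N ** (-alpha)

# Taking logs turns the power law into a straight line,
# so a linear fit in log space recovers the exponent and prefactor.
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
print(round(-slope, 3))             # recovered exponent alpha
print(round(np.exp(intercept), 1))  # recovered prefactor A
```

This is why scaling laws are predictive: fit the line on small, cheap training runs, then extrapolate it to estimate the loss of a much larger run before paying for it.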
The three scaling axes
- Parameters: More parameters mean more capacity to learn patterns. But parameters alone are not enough: a huge model trained on insufficient data will underperform.
- Training data: More data provides more patterns to learn from. But data alone is not enough: a small model cannot absorb the knowledge in a massive dataset.
- Compute: More training compute (GPU hours) lets the model see more data and adjust its parameters more times. This is often the binding constraint.
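The interaction between the first two axes can be captured in a toy loss model where loss falls as a power of both model size N and dataset size D, so neither can be scaled alone. All constants below are made up for illustration; the additive two-term form is only loosely inspired by published scaling-law papers.

```python
# Toy loss: one power-law term penalises too few parameters (N),
# another penalises too little data (D). Constants are illustrative.
def toy_loss(N, D, Nc=8.8e13, Dc=5.4e13, aN=0.076, aD=0.095):
    return (Nc / N) ** aN + (Dc / D) ** aD

balanced = toy_loss(N=1e9, D=1e10)       # medium model, medium data
big_starved = toy_loss(N=1e11, D=1e8)    # huge model, little data
small_flooded = toy_loss(N=1e7, D=1e12)  # tiny model, huge data

# Under this toy model, the balanced run beats both imbalanced ones.
print(balanced < big_starved and balanced < small_flooded)
```

The qualitative point survives any reasonable choice of constants: starving a huge model of data, or flooding a tiny model with data, both leave loss on the table compared with scaling N and D together.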
Chinchilla scaling
A 2022 paper from DeepMind (the Chinchilla paper) refined scaling laws by showing that many models were over-parameterised and under-trained. For a given compute budget, it is better to train a smaller model on more data than a larger model on less data. This shifted the field toward training smaller, better-fed models.
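The Chinchilla recipe can be sketched with two widely quoted approximations: training compute C ≈ 6·N·D FLOPs (N parameters, D training tokens), and a compute-optimal ratio of roughly 20 tokens per parameter. Treat both as rules of thumb rather than exact results.

```python
import math

def compute_optimal(C, tokens_per_param=20):
    """Split a compute budget C (FLOPs) into model size and data size.

    Assumes C = 6 * N * D and the Chinchilla-style ratio D = 20 * N,
    so C = 6 * tokens_per_param * N**2, which we solve for N.
    """
    N = math.sqrt(C / (6 * tokens_per_param))
    D = tokens_per_param * N
    return N, D

# Chinchilla itself used roughly 5.76e23 FLOPs; this sketch should
# land near its reported 70B parameters and 1.4T tokens.
N, D = compute_optimal(5.76e23)
print(f"{N:.2e} params, {D:.2e} tokens")
```

Note that both N and D grow as the square root of the compute budget: doubling compute means scaling the model and the dataset by about 1.4x each, not doubling either one alone.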
Implications for the AI industry
- Massive investment: Scaling laws justify the billions being spent on GPU clusters: there is mathematical evidence that more compute produces better models.
- Diminishing returns: Performance keeps improving, but each constant gain in performance requires a multiplicative increase in compute; progress is roughly linear in the logarithm of resources spent.
- Cost of frontier models: Training costs for leading models have grown from millions to hundreds of millions to potentially billions of dollars.
Beyond loss scaling
Recent research shows that scaling laws apply not just to training loss but to downstream task performance, though the relationship is less clean. Some capabilities (like reasoning) appear to emerge suddenly at certain scales rather than improving smoothly.
Why This Matters
Scaling laws explain the AI arms race and the massive investments AI companies are making. Understanding them helps you appreciate why models keep getting better, why AI compute costs are so high, and why smaller, more efficient models (which bend the scaling curve) represent genuinely important breakthroughs.
Continue learning in Advanced
This topic is covered in our lesson: Understanding Model Architectures