
Scaling Law

Last reviewed: April 2026

The empirical observation that AI model performance improves predictably as you increase model size, training data, and compute, following mathematical power laws.

Scaling laws are empirical observations that AI model performance improves in a predictable, mathematical relationship as you increase three factors: model size (number of parameters), amount of training data, and computational resources used for training.

The key discovery

In 2020, researchers at OpenAI published "Scaling Laws for Neural Language Models" (Kaplan et al.), showing that the performance of language models follows power-law relationships with scale. This means that if you plot model performance against model size (or data, or compute) on a log-log graph, you get a straight line. Performance improves smoothly and predictably as you scale up.
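The straight-line-on-a-log-log-plot behaviour can be sketched in a few lines of Python. The model sizes, losses, and exponent below are invented for illustration, not measured values:

```python
import numpy as np

# Hypothetical (model size, loss) pairs following a power law
# L(N) = a * N^(-b); on a log-log plot these fall on a straight line.
sizes = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
losses = 10.0 * sizes ** -0.076  # illustrative exponent, not a measured value

# Fit a line to (log N, log L): the slope recovers the exponent -b,
# which is what lets labs extrapolate performance before training.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
print(f"fitted exponent: {slope:.3f}")            # ~ -0.076
print(f"fitted prefactor: {np.exp(intercept):.2f}")  # ~ 10.00
```

Fitting the line on small runs and reading off where it lands at larger scale is, in essence, how labs predict a model's performance before committing the compute.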

This was transformative because it meant AI labs could predict how good a model would be before spending the resources to train it. It gave them a roadmap: if you want a model that is X% better, you need Y% more parameters, Z% more data, and W% more compute.

The three scaling axes

  • Parameters: More parameters mean more capacity to learn patterns. But parameters alone are not enough: a huge model trained on too little data will underperform.
  • Training data: More data provides more patterns to learn from. But data alone is not enough: a small model cannot absorb the knowledge in a massive dataset.
  • Compute: More training compute (GPU-hours) lets the model see more data and take more optimization steps. This is often the binding constraint.
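A common back-of-the-envelope rule from the scaling-law literature ties the three axes together: training compute is roughly 6 × parameters × tokens FLOPs. A minimal sketch, with illustrative numbers:

```python
# Rough rule of thumb from the scaling-law literature:
# training compute C ~ 6 * N * D FLOPs,
# where N = parameter count and D = number of training tokens.
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

# Illustrative numbers: a 70B-parameter model trained on 1.4T tokens.
c = training_flops(70e9, 1.4e12)
print(f"{c:.2e} FLOPs")  # 5.88e+23 FLOPs
```

The factor of 6 is an approximation (forward plus backward pass per token), but it is accurate enough for budgeting at the orders of magnitude involved.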

Chinchilla scaling

A 2022 paper from DeepMind (the Chinchilla paper) refined scaling laws by showing that many models were over-parameterised and under-trained. For a given compute budget, it is better to train a smaller model on more data than a larger model on less data. This shifted the field toward training smaller, better-fed models.
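The Chinchilla recommendation can be turned into a rough calculator. Assuming the commonly quoted ratio of about 20 training tokens per parameter, plus the C ≈ 6·N·D compute approximation, a given budget pins down both model size and dataset size:

```python
import math

# Chinchilla's headline result, approximately: for a fixed compute budget,
# scale parameters and tokens together, at roughly 20 tokens per parameter.
TOKENS_PER_PARAM = 20  # approximate ratio; treat as an assumption

def compute_optimal(budget_flops: float) -> tuple[float, float]:
    # From C = 6*N*D and D = 20*N: C = 120*N^2, so N = sqrt(C/120).
    n = math.sqrt(budget_flops / (6 * TOKENS_PER_PARAM))
    d = TOKENS_PER_PARAM * n
    return n, d

# A budget in the rough vicinity of the Chinchilla training run itself.
n, d = compute_optimal(5.88e23)
print(f"params ~{n:.2e}, tokens ~{d:.2e}")  # ~7e10 params, ~1.4e12 tokens
```

That output (about 70B parameters on 1.4T tokens) matches the shape of the Chinchilla model itself, which was deliberately smaller and longer-trained than earlier models of comparable compute.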

Implications for the AI industry

  • Massive investment: Scaling laws justify the billions being spent on GPU clusters: there is quantitative evidence that more compute produces better models.
  • Diminishing returns: Improvements are real but follow a power law, so each fixed gain costs multiplicatively more. Historically, large jumps in capability have required roughly an order of magnitude more resources.
  • Cost of frontier models: Training costs for leading models have grown from millions to hundreds of millions, and potentially billions, of dollars.

Beyond loss scaling

Recent research shows that scaling laws apply not just to training loss but to downstream task performance, though the relationship is less clean. Some capabilities (like reasoning) appear to emerge suddenly at certain scales rather than improving smoothly.


Why This Matters

Scaling laws explain the AI arms race and the massive investments AI companies are making. Understanding them helps you appreciate why models keep getting better, why AI compute costs are so high, and why smaller, more efficient models (which bend the scaling curve) represent genuinely important breakthroughs.


Learn More

Continue learning in Advanced

This topic is covered in our lesson: Understanding Model Architectures