
Neural Scaling

Last reviewed: April 2026

The empirical observation that AI model performance improves predictably as model size, training data, and compute resources increase.

Neural scaling refers to the empirical laws governing how AI model performance improves as you increase model size (parameters), training data, and computational resources. These scaling laws, first rigorously characterised by OpenAI researchers in 2020, reveal predictable power-law relationships that guide how the AI industry invests billions in model development.

What the scaling laws say

The core finding is that model performance, measured by how well the model predicts the next token, improves as a smooth, predictable function of three variables: the number of parameters, the amount of training data, and the compute budget. Each variable follows a power law: doubling one variable produces a consistent improvement, with diminishing but never-zero returns.
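The power-law form can be sketched in a few lines. This is an illustrative example, not a fitted result: the constants below approximate the parameter-count fit reported by the 2020 OpenAI paper, but the point is the shape of the curve, which falls smoothly and never flattens to zero.

```python
def loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Loss as a power law in parameter count: L(N) = (N_c / N)**alpha.

    Constants roughly follow the 2020 fit; treat them as illustrative.
    """
    return (n_c / n_params) ** alpha

# Each 10x increase in parameters shrinks loss by the same constant factor.
for n in [1e8, 1e9, 1e10]:
    print(f"N = {n:.0e}: loss = {loss(n):.3f}")
```

Because the improvement per doubling is a constant multiplicative factor, a curve like this looks like a straight line on a log-log plot, which is what makes extrapolation to unbuilt models possible.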

Crucially, all three must scale together. A huge model trained on insufficient data underperforms, as does a small model trained on enormous data. The optimal allocation grows model size and training data in tandem as the compute budget increases.
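A widely used rule of thumb ties the three variables together: training a transformer on D tokens with N parameters costs roughly C = 6ND floating-point operations (about 2 FLOPs per parameter per token for the forward pass and 4 for the backward pass). This is an approximation, not an exact accounting:

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute via the common rule of thumb
    C = 6 * N * D FLOPs (forward pass ~2N, backward pass ~4N per token)."""
    return 6.0 * n_params * n_tokens

# e.g. a 70B-parameter model trained on 1.4T tokens:
print(f"{training_flops(70e9, 1.4e12):.2e} FLOPs")  # 5.88e+23 FLOPs
```

Fixing C and asking how to split it between N and D is exactly the allocation question the scaling-law papers answer.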

Chinchilla scaling

In 2022, DeepMind's "Chinchilla" paper refined the scaling laws, showing that many models were "over-parameterised": they had too many parameters relative to their training data. The Chinchilla-optimal approach trains a somewhat smaller model on much more data, achieving better performance per unit of compute. This shifted the industry toward smaller, better-trained models.
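The Chinchilla result is often summarised as a heuristic of roughly 20 training tokens per parameter. Combined with the C = 6ND compute approximation, that ratio fixes both model size and dataset size for a given budget. A minimal sketch, assuming the 20:1 rule of thumb:

```python
import math

def chinchilla_allocation(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a compute budget between model size and data under the
    Chinchilla heuristic of ~20 training tokens per parameter,
    using the C = 6 * N * D approximation. Returns (params, tokens)."""
    # C = 6 * N * (tokens_per_param * N)  =>  N = sqrt(C / (6 * tokens_per_param))
    n = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    return n, tokens_per_param * n

# For a budget of ~5.9e23 FLOPs, the heuristic suggests roughly
# a 70B-parameter model trained on ~1.4T tokens.
n, d = chinchilla_allocation(5.88e23)
print(f"params = {n:.2e}, tokens = {d:.2e}")
```

Note that both N and D grow as the square root of compute under this heuristic: a 100x bigger budget buys a model only 10x larger, trained on 10x more data.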

Beyond loss to capabilities

While scaling laws describe smooth improvements in token prediction, real-world capabilities can appear more suddenly. A model might show no ability at arithmetic until it reaches a certain scale, then suddenly perform well. This creates tension between the smooth scaling of loss metrics and the potentially abrupt emergence of practically useful capabilities.
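One proposed explanation for this tension is the choice of metric: a smoothly improving per-token accuracy can look like a sudden jump when scored with an all-or-nothing measure. A toy illustration (invented numbers, not data from any paper) of how exact-match over a k-token answer behaves like p**k:

```python
def exact_match(p: float, k: int = 10) -> float:
    """Probability of getting all k tokens of an answer right,
    given independent per-token accuracy p. An all-or-nothing metric
    like this stays near zero until p is high, then rises sharply."""
    return p ** k

for p in [0.5, 0.7, 0.9, 0.99]:
    print(f"per-token {p:.2f} -> exact-match {exact_match(p):.4f}")
```

Under this view, the underlying model is improving smoothly all along; the "emergence" appears when the compounded metric finally crosses a visible threshold.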

Implications for the industry

Scaling laws have driven the AI arms race. Because performance improvements are predictable, companies can forecast the capabilities of models they have not yet built. This predictability justifies massive infrastructure investments: if you know that 10x more compute will produce a meaningfully better model, the investment case becomes clear.

Limits of scaling

There is active debate about whether scaling laws will continue to hold. Potential limits include the finite supply of high-quality training data, the physical and economic limits of compute infrastructure, and the possibility of diminishing returns on real-world usefulness even as loss continues to improve.


Why This Matters

Neural scaling laws explain why AI companies invest billions in larger models and more compute. Understanding them helps you anticipate how AI capabilities will evolve and why the race for computing resources is one of the defining dynamics of the AI industry.

Learn More

Continue learning in Practitioner

This topic is covered in our lesson: Why Scale Matters in AI