Regularization
A set of techniques that prevent AI models from memorising training data too closely, helping them perform better on new, unseen data.
Regularization is any technique that prevents a machine learning model from fitting too closely to its training data, a problem called overfitting. An overfitted model performs brilliantly on data it has seen but poorly on new data, which defeats the purpose of building a predictive model.
Why overfitting happens
Machine learning models are pattern finders. Given enough capacity, a model will find patterns in everything, including the noise and random quirks that are specific to the training dataset. A model predicting house prices might memorise that the three most expensive houses in the training data were all painted blue, and then incorrectly learn that blue paint increases value.
Regularization forces the model to learn general patterns rather than memorising specifics.
Common regularization techniques
- L1 regularization (Lasso): Adds a penalty based on the absolute size of the model's weights. This encourages the model to set unimportant weights to exactly zero, effectively performing feature selection.
- L2 regularization (Ridge): Adds a penalty based on the squared size of weights. This encourages smaller weights overall, preventing any single feature from dominating predictions.
- Dropout: Used in neural networks. During training, random neurons are temporarily switched off, forcing the network to learn redundant representations rather than relying on specific pathways.
- Early stopping: Halt training when the model's performance on a held-out validation set begins to degrade, even if training loss is still improving.
- Data augmentation: Artificially expand the training dataset by creating modified versions of existing examples (rotating images, adding noise, paraphrasing text).
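To make the L1/L2 idea concrete, here is a minimal sketch of L2 (ridge) regularization applied to a one-variable linear fit by gradient descent. The function name, toy data, and hyperparameters are all illustrative, not from any library; the key line is the extra gradient term contributed by the penalty.

```python
def fit_ridge(xs, ys, lam, lr=0.01, epochs=2000):
    """Fit y = w*x + b by gradient descent with an L2 (ridge) penalty on w.

    Loss = mean squared error + lam * w**2. Larger lam shrinks w toward zero.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        grad_w += 2 * lam * w  # extra term: derivative of the penalty lam * w**2
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data, roughly y = x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.1, 1.9, 3.2, 3.9]

w_plain, _ = fit_ridge(xs, ys, lam=0.0)  # no regularization
w_ridge, _ = fit_ridge(xs, ys, lam=5.0)  # strong L2 penalty
# The penalised weight is pulled closer to zero than the unregularised one.
```

Swapping the penalty term for `lam * abs(w)` (with the subgradient `lam * sign(w)`) would give the L1 version, which tends to push unimportant weights to exactly zero rather than merely shrinking them.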
How to tell if you need regularization
The classic sign of overfitting is a large gap between training performance and validation performance. If your model achieves 98 percent accuracy on training data but only 75 percent on new data, it has memorised rather than learned.
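Watching that training/validation gap is exactly what early stopping automates. Below is a hedged sketch of the patience-based loop most frameworks implement; `train_step` and `val_loss_fn` are hypothetical callables standing in for whatever your framework provides.

```python
def train_with_early_stopping(train_step, val_loss_fn, max_epochs=100, patience=3):
    """Stop once validation loss fails to improve for `patience` consecutive epochs.

    train_step() runs one epoch of training; val_loss_fn() returns the current
    validation loss. Both are hypothetical stand-ins for framework calls.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()
        loss = val_loss_fn()
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch + 1  # stopped early: overfitting has set in
    return max_epochs

# Simulated validation losses: improve, then degrade (the overfitting regime).
losses = iter([0.9, 0.7, 0.6, 0.62, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9])
stopped_at = train_with_early_stopping(lambda: None, lambda: next(losses),
                                       max_epochs=10, patience=3)
```

In practice you would also restore the weights from the best-scoring epoch, which most libraries offer as an option.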
Regularization in large language models
LLMs use regularization extensively during training. Dropout is common in transformer architectures. The massive scale of training data itself acts as a form of regularization: with billions of diverse examples, the model is less likely to memorise quirks from any single source.
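Dropout itself is simple enough to sketch in a few lines. This is the standard "inverted dropout" formulation; the function and values here are illustrative, not taken from any particular framework.

```python
import random

def dropout(activations, p_drop, training=True):
    """Inverted dropout: zero each unit with probability p_drop during training,
    scaling survivors by 1/(1 - p_drop) so the expected activation is unchanged.
    At inference time the layer is a no-op.
    """
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if random.random() < keep else 0.0
            for a in activations]

random.seed(0)  # fixed seed so the random mask is reproducible
out = dropout([1.0, 1.0, 1.0, 1.0], p_drop=0.5)
# Each unit is either zeroed or scaled up to 2.0; which units survive is random.
```

Because the network cannot rely on any single unit being present, it is pushed toward redundant, distributed representations, which is the regularizing effect described above.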
The underfitting counterpart
Too much regularization causes the opposite problem, underfitting, where the model is too constrained to learn the real patterns in the data. Good model development involves finding the right balance.
Why This Matters
Regularization is what separates a model that works in the lab from one that works in production. When evaluating AI solutions or reviewing model performance reports, understanding regularization helps you ask the right questions about whether a model will generalise to your real-world data or merely perform well on test benchmarks.
Continue learning in Advanced
This topic is covered in our lesson: How AI Models Learn and Generalise