Regularization
A set of techniques that prevent AI models from memorising training data too closely, helping them perform better on new, unseen data.
Regularization is any technique that prevents a machine learning model from fitting too closely to its training data, a problem called overfitting. An overfitted model performs brilliantly on data it has seen but poorly on new data, which defeats the purpose of building a predictive model.
Why overfitting happens
Machine learning models are pattern finders. Given enough capacity, a model will find patterns in everything, including the noise and random quirks that are specific to the training dataset. A model predicting house prices might memorise that the three most expensive houses in the training data were all painted blue, and then incorrectly learn that blue paint increases value.
Regularization forces the model to learn general patterns rather than memorising specifics.
Common regularization techniques
- L1 regularization (Lasso): Adds a penalty based on the absolute size of the model's weights. This encourages the model to set unimportant weights to exactly zero, effectively performing feature selection.
- L2 regularization (Ridge): Adds a penalty based on the squared size of weights. This encourages smaller weights overall, preventing any single feature from dominating predictions.
- Dropout: Used in neural networks. During training, random neurons are temporarily switched off, forcing the network to learn redundant representations rather than relying on specific pathways.
- Early stopping: Halt training when the model's performance on a held-out validation set begins to degrade, even if training loss is still improving.
- Data augmentation: Artificially expand the training dataset by creating modified versions of existing examples (rotating images, adding noise, paraphrasing text).
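To make the L1/L2 idea concrete, here is a minimal sketch of L2 (ridge) regularization applied to a one-variable linear fit by gradient descent. The function name, toy data, and hyperparameters are all illustrative, not from any library; the key line is the extra gradient term contributed by the penalty.

```python
def fit_ridge(xs, ys, lam, lr=0.01, epochs=2000):
    """Fit y = w*x + b by gradient descent with an L2 (ridge) penalty on w.

    Loss = mean squared error + lam * w**2. Larger lam shrinks w toward zero.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        grad_w += 2 * lam * w  # extra term: derivative of the penalty lam * w**2
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data, roughly y = x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.1, 1.9, 3.2, 3.9]

w_plain, _ = fit_ridge(xs, ys, lam=0.0)  # no regularization
w_ridge, _ = fit_ridge(xs, ys, lam=5.0)  # strong L2 penalty
# The penalised weight is pulled closer to zero than the unregularised one.
```

Swapping the penalty term for `lam * abs(w)` (with the subgradient `lam * sign(w)`) would give the L1 version, which tends to push unimportant weights to exactly zero rather than merely shrinking them.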
How to tell if you need regularization
The classic sign of overfitting is a large gap between training performance and validation performance. If your model achieves 98 percent accuracy on training data but only 75 percent on new data, it has memorised rather than learned.
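Watching that training/validation gap is exactly what early stopping automates. Below is a hedged sketch of the patience-based loop most frameworks implement; `train_step` and `val_loss_fn` are hypothetical callables standing in for whatever your framework provides.

```python
def train_with_early_stopping(train_step, val_loss_fn, max_epochs=100, patience=3):
    """Stop once validation loss fails to improve for `patience` consecutive epochs.

    train_step() runs one epoch of training; val_loss_fn() returns the current
    validation loss. Both are hypothetical stand-ins for framework calls.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()
        loss = val_loss_fn()
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch + 1  # stopped early: overfitting has set in
    return max_epochs

# Simulated validation losses: improve, then degrade (the overfitting regime).
losses = iter([0.9, 0.7, 0.6, 0.62, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9])
stopped_at = train_with_early_stopping(lambda: None, lambda: next(losses),
                                       max_epochs=10, patience=3)
```

In practice you would also restore the weights from the best-scoring epoch, which most libraries offer as an option.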
Regularization in large language models
LLMs use regularization extensively during training. Dropout is common in transformer architectures. The massive scale of training data itself acts as a form of regularization: with billions of diverse examples, the model is less likely to memorise quirks from any single source.
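Dropout itself is simple enough to sketch in a few lines. This is the standard "inverted dropout" formulation; the function and values here are illustrative, not taken from any particular framework.

```python
import random

def dropout(activations, p_drop, training=True):
    """Inverted dropout: zero each unit with probability p_drop during training,
    scaling survivors by 1/(1 - p_drop) so the expected activation is unchanged.
    At inference time the layer is a no-op.
    """
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if random.random() < keep else 0.0
            for a in activations]

random.seed(0)  # fixed seed so the random mask is reproducible
out = dropout([1.0, 1.0, 1.0, 1.0], p_drop=0.5)
# Each unit is either zeroed or scaled up to 2.0; which units survive is random.
```

Because the network cannot rely on any single unit being present, it is pushed toward redundant, distributed representations, which is the regularizing effect described above.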
The underfitting counterpart
Too much regularization causes the opposite problem, underfitting, where the model is too constrained to learn the real patterns in the data. Good model development involves finding the right balance.
Why This Matters
Regularization is what separates a model that works in the lab from one that works in production. When evaluating AI solutions or reviewing model performance reports, understanding regularization helps you ask the right questions about whether a model will generalise to your real-world data or merely perform well on test benchmarks.
Continue learning in Advanced
This topic is covered in our lesson: How AI Models Learn and Generalise