Early Stopping
A regularisation technique in which training is halted when performance on validation data stops improving, preventing the model from overfitting to the training data.
Early stopping is one of the simplest and most effective techniques for preventing overfitting in machine learning. The idea is straightforward: monitor the model's performance on a validation set during training, and stop training when that performance stops improving, even if the model could still improve on the training data.
The overfitting curve
During training, a model's performance on the training data typically improves continuously: given enough time, it will memorise the training set perfectly. But performance on unseen data follows a different curve: it improves initially, reaches a peak, and then starts to decline as the model begins memorising rather than generalising.
Early stopping identifies this inflection point and halts training there, capturing the model at its most generalisable state.
How it works in practice
- Split your data into training, validation, and test sets.
- Train the model on the training set.
- After each training epoch (pass through the full training data), evaluate performance on the validation set.
- Track whether validation performance is improving.
- If validation performance has not improved for a specified number of epochs (the "patience" parameter), stop training.
- Restore the model weights from the epoch with the best validation performance.
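The steps above can be sketched in a framework-agnostic way. The `train_one_epoch` and `validate` callables and the dict-based model below are hypothetical stand-ins for whatever your training stack provides, not any particular library's API:

```python
import copy

def train_with_early_stopping(model, train_one_epoch, validate,
                              max_epochs=100, patience=5):
    """Train until validation loss stops improving for `patience` epochs.

    model           -- any object whose state can be deep-copied (hypothetical)
    train_one_epoch -- callable that runs one pass over the training set
    validate        -- callable that returns the current validation loss
    """
    best_loss = float("inf")
    best_model = copy.deepcopy(model)
    best_epoch = 0
    stale = 0  # epochs since the last improvement

    for epoch in range(max_epochs):
        train_one_epoch(model)
        val_loss = validate(model)
        if val_loss < best_loss:
            # New best: snapshot the weights and reset the patience counter.
            best_loss, best_epoch = val_loss, epoch
            best_model = copy.deepcopy(model)
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` epochs: stop

    # Step 6: restore the weights from the best epoch, not the last one.
    return best_model, best_loss, best_epoch
```

In a real framework the deep copy would typically be a checkpoint to disk (large models rarely fit twice in memory), but the control flow is the same.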
The patience parameter
Patience determines how many epochs of no improvement to tolerate before stopping. Too little patience (e.g., 1 epoch) may stop training prematurely during a temporary plateau. Too much patience wastes compute on training that is not helping. Typical values range from 5 to 20 epochs, depending on the task.
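The effect of the patience value can be seen on a synthetic validation-loss curve with a temporary plateau. The helper below is illustrative only; it reports the epoch at which training would halt:

```python
def stopping_epoch(val_losses, patience):
    """Return the epoch at which early stopping would halt (illustrative)."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1  # ran to the end without triggering

# A curve that plateaus at 0.8 before improving again to 0.45:
curve = [1.0, 0.8, 0.8, 0.8, 0.5, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7]
print(stopping_epoch(curve, patience=1))  # stops at epoch 2, during the plateau
print(stopping_epoch(curve, patience=5))  # trains past the plateau, finds 0.45
```

With patience of 1, training halts on the plateau and never reaches the true minimum at epoch 5; with patience of 5, it rides out the plateau and stops only once the loss has genuinely turned upward.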
Advantages of early stopping
- Simplicity: It requires no changes to the model architecture or training procedure, just a monitoring loop and a stopping criterion.
- Computational savings: Training stops when it is no longer productive, saving time and compute costs.
- Minimal hyperparameter tuning: Unlike other regularisation methods that need careful calibration (dropout rate, weight decay strength), early stopping has essentially one knob, the patience parameter.
- Universal applicability: It works with any model type and any training procedure.
Limitations
- Early stopping assumes that a single validation metric captures what you care about. In practice, different metrics may peak at different times.
- The choice of validation set matters: a poorly constructed validation set can lead to misleading stopping decisions.
- For very expensive training runs (like pre-training a large language model), early stopping decisions have enormous financial implications.
Relation to other regularisation methods
Early stopping is often used alongside other techniques like dropout, weight decay, and data augmentation. These approaches are complementary: they address overfitting through different mechanisms, and combining them typically produces better results than any single technique alone.
Why This Matters
Early stopping is a reminder that more training is not always better. For organisations investing in AI model development, understanding when to stop training saves compute costs and produces models that actually work well on new data rather than just performing brilliantly on the data they have already seen.