Learning Rate
A hyperparameter that controls how much an AI model adjusts its internal weights in response to each error during training: too high causes instability, too low causes slow learning.
The learning rate is arguably the single most important hyperparameter in training neural networks. It controls the size of the steps the model takes when adjusting its weights during training. Too large a learning rate causes the model to overshoot optimal solutions. Too small and the model learns painfully slowly or gets stuck.
How the learning rate works
During training, the model makes predictions, measures how wrong they are (using a loss function), and then adjusts its weights to reduce the error. The learning rate determines the magnitude of these adjustments:
- High learning rate (e.g., 0.1): Large weight adjustments. Training is fast but unstable; the model may bounce around and never converge on a good solution.
- Low learning rate (e.g., 0.00001): Tiny weight adjustments. Training is stable but extremely slow. The model might also get trapped in a poor local minimum.
- Good learning rate (e.g., 0.001): A balance between speed and stability. The model makes meaningful progress without overshooting.
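The effect of these three regimes can be sketched with plain gradient descent on a toy problem. This is a minimal illustration, not from the original: the quadratic loss f(w) = w² and the specific rates are assumptions chosen to make the behavior visible.

```python
import numpy as np

def sgd_step(weights, gradients, learning_rate):
    """One plain gradient-descent update: step against the gradient,
    scaled by the learning rate."""
    return weights - learning_rate * gradients

# Toy loss f(w) = w^2, whose gradient is 2w; the minimum is at w = 0.
w = np.array([1.0])
for _ in range(50):
    w = sgd_step(w, 2.0 * w, learning_rate=0.1)  # moderate rate: converges

w_big = np.array([1.0])
for _ in range(50):
    # Too-high rate: each step overshoots the minimum by more than it
    # gained, so |w| grows instead of shrinking.
    w_big = sgd_step(w_big, 2.0 * w_big, learning_rate=1.1)
```

With the moderate rate, each update shrinks the weight toward the minimum; with the too-high rate, the same update rule sends it further away every step.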
The landscape analogy
Imagine you are hiking down a foggy mountain, trying to find the lowest valley. The learning rate is your step size:
- Giant steps get you downhill quickly but you might leap right over the valley and end up on another peak.
- Tiny steps ensure you never miss a valley but might take days to reach it, or you might get stuck in a small dip that is not actually the lowest point.
- Appropriately sized steps let you descend efficiently while still following the terrain.
Learning rate schedules
Modern training typically does not use a fixed learning rate. Instead, the rate changes during training according to a schedule:
- Warmup: Start with a very low learning rate and gradually increase it. This helps the model establish stable initial weights before making larger adjustments.
- Decay: Gradually reduce the learning rate during training. Large steps help in the early stages; smaller steps help fine-tune in the later stages.
- Cosine annealing: The learning rate decays along a cosine curve; the warm-restarts variant periodically resets it to a high value, which can help escape poor local minima.
- Cyclical learning rates: The rate oscillates between upper and lower bounds, which research has shown can improve both training speed and final performance.
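The first two schedules above are often combined into one recipe: linear warmup followed by cosine decay. The sketch below is an illustrative assumption; the specific parameter values and the decay-to-zero choice are examples, not a prescribed standard.

```python
import math

def lr_schedule(step, total_steps, base_lr=1e-3, warmup_steps=100):
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        # Warmup: ramp linearly from near zero up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Decay: follow half a cosine wave from base_lr down to zero.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

A training loop would call `lr_schedule(step, total_steps)` once per step and feed the result to the optimizer, so the rate rises during warmup and then tapers smoothly for fine-tuning.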
Finding the right learning rate
The learning rate range test, proposed by Leslie Smith, involves training for a few hundred steps while gradually increasing the learning rate from very small to very large. Plotting loss against learning rate reveals the sweet spot: the rate at which loss decreases most rapidly, just before training becomes unstable.
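The range test can be sketched as follows. Everything here is an illustrative assumption: `train_step` is a hypothetical callback that runs one training step at a given rate and returns the loss, and the exponential sweep and 4x divergence threshold are common but not canonical choices.

```python
def lr_range_test(train_step, min_lr=1e-6, max_lr=1.0, num_steps=200):
    """Sweep the learning rate exponentially from min_lr to max_lr,
    recording (lr, loss) pairs until the loss clearly diverges."""
    ratio = (max_lr / min_lr) ** (1.0 / (num_steps - 1))
    history = []
    lr = min_lr
    for _ in range(num_steps):
        loss = train_step(lr)
        history.append((lr, loss))
        if loss > 4.0 * min(h[1] for h in history):
            break  # loss has blown past its best value: stop the sweep
        lr *= ratio
    return history

# Synthetic check: a quadratic loss f(w) = w^2, one step per rate.
state = {"w": 1.0}
def quadratic_step(lr):
    loss = state["w"] ** 2
    state["w"] -= lr * 2.0 * state["w"]
    return loss

history = lr_range_test(quadratic_step)
```

In practice one would plot `history` on a log-x axis and read off the rate where the loss curve falls steepest.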
Practical impact
A wrong learning rate can make the difference between a model that learns in hours and one that never converges at all. When an AI training run produces unexpectedly poor results, the learning rate is one of the first things to check.
Why This Matters
The learning rate is the most frequently tuned setting in AI model training. Understanding it helps you appreciate why training AI models requires experimentation and expertise, and why identical architectures can produce wildly different results depending on how they are trained.