Learning Rate
A hyperparameter that controls how much an AI model adjusts its internal weights in response to each error during training: too high causes instability, too low causes slow learning.
The learning rate is arguably the single most important hyperparameter in training neural networks. It controls the size of the steps the model takes when adjusting its weights during training. Too large a learning rate causes the model to overshoot optimal solutions. Too small and the model learns painfully slowly or gets stuck.
How the learning rate works
During training, the model makes predictions, measures how wrong they are (using a loss function), and then adjusts its weights to reduce the error. The learning rate determines the magnitude of these adjustments:
- High learning rate (e.g., 0.1): Large weight adjustments. Training is fast but unstable; the model may bounce around and never converge on a good solution.
- Low learning rate (e.g., 0.00001): Tiny weight adjustments. Training is stable but extremely slow. The model might also get trapped in a poor local minimum.
- Good learning rate (e.g., 0.001): A balance between speed and stability. The model makes meaningful progress without overshooting.
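The effect of these three regimes can be sketched with plain gradient descent on a toy problem. This is a minimal illustration, not from the original: the quadratic loss f(w) = w² and the specific rates are assumptions chosen to make the behavior visible.

```python
import numpy as np

def sgd_step(weights, gradients, learning_rate):
    """One plain gradient-descent update: step against the gradient,
    scaled by the learning rate."""
    return weights - learning_rate * gradients

# Toy loss f(w) = w^2, whose gradient is 2w; the minimum is at w = 0.
w = np.array([1.0])
for _ in range(50):
    w = sgd_step(w, 2.0 * w, learning_rate=0.1)  # moderate rate: converges

w_big = np.array([1.0])
for _ in range(50):
    # Too-high rate: each step overshoots the minimum by more than it
    # gained, so |w| grows instead of shrinking.
    w_big = sgd_step(w_big, 2.0 * w_big, learning_rate=1.1)
```

With the moderate rate, each update shrinks the weight toward the minimum; with the too-high rate, the same update rule sends it further away every step.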
The landscape analogy
Imagine you are hiking down a foggy mountain, trying to find the lowest valley. The learning rate is your step size:
- Giant steps get you downhill quickly but you might leap right over the valley and end up on another peak.
- Tiny steps ensure you never miss a valley but might take days to reach it, or you might get stuck in a small dip that is not actually the lowest point.
- Appropriately sized steps let you descend efficiently while still following the terrain.
Learning rate schedules
Modern training typically does not use a fixed learning rate. Instead, the rate changes during training according to a schedule:
- Warmup: Start with a very low learning rate and gradually increase it. This helps the model establish stable initial weights before making larger adjustments.
- Decay: Gradually reduce the learning rate during training. Large steps help in the early stages; smaller steps help fine-tune in the later stages.
- Cosine annealing: The learning rate decays along a cosine curve; the warm-restarts variant periodically resets it to a high value, which can help escape poor local minima.
- Cyclical learning rates: The rate oscillates between upper and lower bounds, which research has shown can improve both training speed and final performance.
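The first two schedules above are often combined into one recipe: linear warmup followed by cosine decay. The sketch below is an illustrative assumption; the specific parameter values and the decay-to-zero choice are examples, not a prescribed standard.

```python
import math

def lr_schedule(step, total_steps, base_lr=1e-3, warmup_steps=100):
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        # Warmup: ramp linearly from near zero up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Decay: follow half a cosine wave from base_lr down to zero.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

A training loop would call `lr_schedule(step, total_steps)` once per step and feed the result to the optimizer, so the rate rises during warmup and then tapers smoothly for fine-tuning.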
Finding the right learning rate
The learning rate range test, proposed by Leslie Smith, involves training for a few hundred steps while gradually increasing the learning rate from very small to very large. Plotting loss against learning rate reveals the sweet spot: the rate at which loss decreases most rapidly, just before training becomes unstable.
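The range test can be sketched as follows. Everything here is an illustrative assumption: `train_step` is a hypothetical callback that runs one training step at a given rate and returns the loss, and the exponential sweep and 4x divergence threshold are common but not canonical choices.

```python
def lr_range_test(train_step, min_lr=1e-6, max_lr=1.0, num_steps=200):
    """Sweep the learning rate exponentially from min_lr to max_lr,
    recording (lr, loss) pairs until the loss clearly diverges."""
    ratio = (max_lr / min_lr) ** (1.0 / (num_steps - 1))
    history = []
    lr = min_lr
    for _ in range(num_steps):
        loss = train_step(lr)
        history.append((lr, loss))
        if loss > 4.0 * min(h[1] for h in history):
            break  # loss has blown past its best value: stop the sweep
        lr *= ratio
    return history

# Synthetic check: a quadratic loss f(w) = w^2, one step per rate.
state = {"w": 1.0}
def quadratic_step(lr):
    loss = state["w"] ** 2
    state["w"] -= lr * 2.0 * state["w"]
    return loss

history = lr_range_test(quadratic_step)
```

In practice one would plot `history` on a log-x axis and read off the rate where the loss curve falls steepest.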
Practical impact
A wrong learning rate can make the difference between a model that learns in hours and one that never converges at all. When an AI training run produces unexpectedly poor results, the learning rate is one of the first things to check.
Why This Matters
The learning rate is the most frequently tuned setting in AI model training. Understanding it helps you appreciate why training AI models requires experimentation and expertise, and why identical architectures can produce wildly different results depending on how they are trained.