Curriculum Learning
A training strategy where a model learns from easy examples first and progressively encounters more difficult ones, mirroring how humans learn.
Curriculum learning is a machine learning training strategy inspired by human education. Instead of presenting training data in random order, examples are organised from simple to complex, allowing the model to build foundational understanding before tackling harder cases.
The intuition
Humans do not learn calculus before arithmetic. We start with counting, move to addition, then multiplication, and gradually build toward advanced mathematics. Each stage provides the foundation for the next. Curriculum learning applies this principle to model training.
How it works
The training process has three key components. First, a difficulty measure scores how "easy" or "hard" each training example is; this might be based on sentence length, label ambiguity, or the loss from an initial model pass. Second, a pacing function determines how quickly harder examples are introduced. Third, the training loop feeds examples to the model in order of increasing difficulty.
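The three components can be sketched in a few lines. This is a minimal, illustrative example, not a standard library API: the difficulty measure is word count, and the pacing function linearly grows the fraction of the sorted dataset available each epoch.

```python
def difficulty(example: str) -> int:
    """Proxy difficulty measure: longer sentences are treated as harder."""
    return len(example.split())

def pacing(epoch: int, total_epochs: int, start_frac: float = 0.25) -> float:
    """Linearly grow the usable fraction of data from start_frac to 1.0."""
    frac = start_frac + (1.0 - start_frac) * epoch / max(total_epochs - 1, 1)
    return min(frac, 1.0)

def curriculum_subsets(dataset, total_epochs: int):
    """Yield, per epoch, the easiest slice of the dataset allowed by pacing."""
    ordered = sorted(dataset, key=difficulty)  # easy -> hard
    for epoch in range(total_epochs):
        cutoff = max(1, int(pacing(epoch, total_epochs) * len(ordered)))
        yield epoch, ordered[:cutoff]

data = [
    "cats sleep",
    "the cat sat on the mat",
    "short",
    "curriculum learning orders examples from simple to complex",
]
for epoch, subset in curriculum_subsets(data, total_epochs=3):
    print(epoch, len(subset))  # the model sees more (and harder) data each epoch
```

In a real setup the slice would be shuffled and batched before each epoch, and the difficulty measure would come from the task (e.g. per-example loss) rather than raw length.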
Benefits of curriculum learning
- Faster convergence: Models often reach good performance more quickly because they build stable foundations before encountering confusing edge cases.
- Better final performance: On some tasks, curriculum learning produces models that generalise better than those trained on randomly ordered data.
- Improved stability: Training is less likely to diverge or get stuck in poor local minima.
Challenges
Defining "difficulty" is not always straightforward: what is easy for a model may not align with human intuition. The pacing function also requires tuning; introducing hard examples too slowly wastes training time, while introducing them too quickly loses the benefit of curriculum ordering.
Curriculum learning in LLM training
Large language model training incorporates curriculum-like ideas. Pre-training often starts with cleaner, more structured text before introducing noisier web data. Fine-tuning stages are themselves a form of curriculum: the model first learns general language patterns, then specific task behaviours, then safety constraints.
Anti-curriculum learning
Interestingly, some research shows that starting with the hardest examples can also be effective for certain tasks, a strategy called anti-curriculum or self-paced hard-example mining. The optimal approach depends on the specific task and dataset.
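A hard-first strategy inverts the ordering: rather than ranking examples easy-to-hard in advance, each step can select the examples the current model finds hardest. The sketch below is hypothetical; the function name and the toy stand-in for per-example loss are illustrative, not from any library.

```python
def hard_example_mining(examples, loss_fn, k):
    """Return the k examples with the highest current loss (hardest first)."""
    return sorted(examples, key=loss_fn, reverse=True)[:k]

# Toy stand-in for a model's per-example loss: longer strings score higher.
toy_loss = len

batch = hard_example_mining(["aa", "aaaa", "a", "aaa"], toy_loss, k=2)
print(batch)  # the two hardest examples under the toy loss
```

In practice the loss would be recomputed periodically as the model improves, so the "hard" set shifts over training.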
Why this matters
Curriculum learning illustrates an important principle: the order in which an AI model encounters information affects what it learns. This concept helps you understand why training strategies matter and why the same data can produce very different models depending on how it is presented.