Core AI

Active Learning

Last reviewed: April 2026

A machine learning approach in which the model requests labels for the most informative data points rather than learning from a randomly chosen set of labelled examples.

Active learning is a machine learning strategy where the model identifies which unlabelled data points would be most useful to learn from and asks a human annotator to label only those examples. This can substantially reduce the amount of labelled data needed to train an effective model.

The labelling bottleneck

Training supervised machine learning models requires labelled data — examples paired with correct answers. Labelling data is expensive and time-consuming. A medical imaging model might need thousands of X-rays labelled by radiologists. A document classification system might need tens of thousands of documents tagged by domain experts. Active learning addresses this bottleneck.

How active learning works

The process follows a cycle. First, the model is trained on a small initial set of labelled data. Then it examines a pool of unlabelled data and identifies the examples it is most uncertain about; for a classifier, these are typically the ones near its decision boundary. It presents these examples to a human annotator for labelling. The newly labelled examples are added to the training set, and the model retrains. This cycle repeats until the model reaches acceptable performance or the labelling budget runs out.
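The cycle above can be sketched in a few lines of Python. This is a toy illustration, not a production setup: the data is one-dimensional, the "model" is a single threshold, and the `oracle` function, the value `0.62`, and all other names are hypothetical stand-ins (the oracle plays the role of the human annotator).

```python
import random

def oracle(x, true_threshold=0.62):
    """Stand-in for the human annotator (hypothetical): returns the true label."""
    return 1 if x >= true_threshold else 0

def fit_threshold(labelled):
    """Toy 1-D classifier: boundary is the midpoint between the two classes."""
    negatives = [x for x, y in labelled if y == 0]
    positives = [x for x, y in labelled if y == 1]
    return (max(negatives) + min(positives)) / 2

def active_learning(pool, labelled, rounds):
    pool = list(pool)
    labelled = list(labelled)
    for _ in range(rounds):
        if not pool:
            break
        # 1. Train on the labels collected so far.
        threshold = fit_threshold(labelled)
        # 2. Pick the pool example closest to the decision boundary,
        #    i.e. the one this toy model is least sure about.
        query = min(pool, key=lambda x: abs(x - threshold))
        # 3. Ask the annotator for its label and add it to the training set.
        pool.remove(query)
        labelled.append((query, oracle(query)))
    # 4. Retrain one last time on everything labelled.
    return fit_threshold(labelled), labelled

rng = random.Random(0)
pool = [rng.random() for _ in range(1000)]
# Small initial labelled set containing one example of each class.
seed = [(0.0, oracle(0.0)), (1.0, oracle(1.0))]

estimate, labelled = active_learning(pool, seed, rounds=15)
print(f"estimated boundary: {estimate:.3f} using {len(labelled)} labels")
```

In this sketch only 17 of the 1,000 available points are ever labelled. Real projects would swap the threshold for an actual model and the oracle for a labelling tool, and would usually stop on a validation metric or a budget rather than a fixed round count.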

Selection strategies

Uncertainty sampling: The model picks examples where it is least confident about the correct label.
Query-by-committee: Several models vote on each example, and the examples on which they disagree most are selected.
Expected model change: The model selects the examples whose labels, once known, are expected to change its parameters the most.
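As a rough sketch of how the first two strategies turn into scores, the snippet below ranks a handful of examples by least-confidence uncertainty and by committee vote entropy. The document names, probabilities, and votes are made up for illustration, and the function names are not taken from any particular library.

```python
import math
from collections import Counter

def least_confidence(probs):
    """Uncertainty sampling score: 1 minus the top class probability."""
    return 1.0 - max(probs)

def vote_entropy(votes):
    """Query-by-committee score: entropy of the committee's label votes."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical class probabilities from a single model.
predicted_probs = {
    "doc_a": [0.95, 0.05],
    "doc_b": [0.55, 0.45],
    "doc_c": [0.70, 0.30],
}

# Hypothetical labels predicted by a three-model committee.
committee_votes = {
    "doc_a": ["spam", "spam", "spam"],
    "doc_b": ["spam", "ham", "spam"],
    "doc_c": ["ham", "ham", "ham"],
}

# The example to send to the annotator is the one with the highest score.
most_uncertain = max(predicted_probs,
                     key=lambda d: least_confidence(predicted_probs[d]))
most_disputed = max(committee_votes,
                    key=lambda d: vote_entropy(committee_votes[d]))

print("uncertainty sampling picks:", most_uncertain)
print("query-by-committee picks:", most_disputed)
```

Expected model change is left out here because it needs access to the model's gradients or a retraining step for each candidate, which makes it noticeably more expensive to compute than the two scores shown.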

Real-world applications

Active learning is widely used in domains where labelling is costly. Medical diagnosis, legal document review, content moderation, and manufacturing defect detection all benefit because expert annotators can focus their time on the most impactful examples.

Connection to modern AI

Reinforcement learning from human feedback (RLHF), used to train models like ChatGPT and Claude, is a different technique but shares a principle with active learning: human effort is spent on a limited set of examples rather than on everything. Humans provide feedback on selected model outputs, typically by rating or comparing them, and the model improves from those targeted signals rather than requiring feedback on every possible response.

Why This Matters

Active learning matters because labelled data is often one of the most expensive ingredients in an AI project. Understanding this technique helps you plan realistic AI projects: by choosing strategically what to annotate, you can often build an effective model with far less labelled data.

Related Terms

Machine Learning (ML)

A type of AI where systems learn patterns from data instead of following explicitly programmed rules. The system improves its performance through experience.

Training Data

The dataset used to teach an AI model. The quality, size, and composition of training data directly determine what the AI can and cannot do well.

Reinforcement Learning

A machine learning approach where an AI learns by trial and error, receiving rewards for good outcomes and penalties for bad ones. Used to train game-playing AI and to fine-tune LLMs.

Deep Learning

A subset of machine learning that uses neural networks with many layers to learn complex patterns. The 'deep' refers to the number of layers, not the depth of understanding.
