Label (Machine Learning)
The known correct answer attached to a training example in supervised learning, which the AI model learns to predict, such as 'spam' for an email or 'cat' for an image.
In machine learning, a label is the known correct answer associated with a piece of training data. It is the "right answer" that the model learns to predict. In an email spam filter, labels are "spam" or "not spam." In an image classifier, labels might be "cat," "dog," or "bird." In a churn prediction model, labels might be "churned" or "retained."
Labels in supervised learning
Labels are the foundation of supervised learning, the most common approach to building AI models. The process works like a teacher grading practice tests:
- Collect data (emails, images, customer records)
- Attach labels to each example (spam/not spam, cat/dog, churned/retained)
- Train the model on the labelled data
- The model learns to predict labels for new, unseen data
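The four steps above can be sketched end to end. This is a deliberately tiny, illustrative example, not a real spam filter: the feature function, keyword list, and threshold rule are all hypothetical choices made for clarity.

```python
# Minimal sketch of the supervised-learning loop, using a toy
# keyword-count "spam" classifier (illustrative, not production code).

def featurise(text):
    """Count a few hand-picked spam-indicator words (hypothetical features)."""
    spam_words = {"free", "winner", "prize", "urgent"}
    return sum(1 for w in text.lower().split() if w in spam_words)

# Steps 1 and 2: collect data and attach a label to each example.
training_data = [
    ("free prize winner click now", "spam"),
    ("urgent free offer act now", "spam"),
    ("meeting moved to tuesday", "not spam"),
    ("lunch at noon tomorrow", "not spam"),
]

# Step 3: "train" by learning a score threshold that separates the labels.
spam_scores = [featurise(t) for t, y in training_data if y == "spam"]
ham_scores = [featurise(t) for t, y in training_data if y == "not spam"]
threshold = (min(spam_scores) + max(ham_scores)) / 2

# Step 4: predict labels for new, unseen data.
def predict(text):
    return "spam" if featurise(text) > threshold else "not spam"

print(predict("you are a winner claim your free prize"))  # spam
print(predict("see you at the meeting"))                  # not spam
```

Real systems replace the keyword counter with learned features and a proper training algorithm, but the shape of the loop (labelled examples in, a predictor of labels out) is the same.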
The quality of labels directly determines the quality of the model. A model trained on incorrectly labelled data will learn the wrong patterns and make unreliable predictions.
The labelling challenge
For many real-world applications, obtaining high-quality labels is the hardest and most expensive part of building an AI system. Consider these scenarios:
- Medical imaging: Labelling X-rays as "healthy" or "disease present" requires expert radiologists, who are expensive and in short supply.
- Sentiment analysis: Labelling customer reviews as positive, negative, or neutral requires judgement, and different people may disagree.
- Object detection: Drawing bounding boxes around every object in thousands of images is tedious and time-consuming.
Label quality issues
- Noisy labels: Some labels are simply wrong: a spam email accidentally labelled as legitimate, or a blurry image categorised incorrectly.
- Ambiguous labels: Some examples genuinely fall between categories. Is a mildly critical review positive or negative?
- Inconsistent labels: Different labellers apply different standards, creating contradictions in the dataset.
- Label imbalance: When one category vastly outnumbers others (e.g., 99% of transactions are legitimate, 1% are fraud), the model may learn to always predict the majority class.
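The imbalance problem is easy to demonstrate with synthetic numbers: a degenerate model that always predicts the majority class scores 99% accuracy while catching zero fraud. The dataset below is made up purely to illustrate the arithmetic.

```python
# Sketch: why label imbalance makes accuracy misleading (synthetic data).
# 99% of transactions are legitimate, 1% are fraud.
labels = ["legitimate"] * 990 + ["fraud"] * 10

# A degenerate "model" that always predicts the majority class.
predictions = ["legitimate"] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
fraud_caught = sum(1 for p, y in zip(predictions, labels)
                   if y == "fraud" and p == "fraud")

print(f"accuracy: {accuracy:.1%}")            # 99.0%, yet the model is useless
print(f"fraud caught: {fraud_caught} of 10")  # 0 of 10
```

This is why imbalanced problems are evaluated with metrics such as precision and recall on the minority class rather than raw accuracy.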
Approaches to efficient labelling
- Active learning: The model identifies the examples it is most uncertain about and requests labels only for those, minimising the total labelling effort.
- Semi-supervised learning: Using a small set of labelled examples alongside a large set of unlabelled data, letting the model leverage patterns in the unlabelled data.
- Weak supervision: Using heuristic rules, noisy labellers, or existing databases to generate approximate labels at scale.
- AI-assisted labelling: Using a pre-trained model to suggest labels that human reviewers then verify, which is faster than labelling from scratch.
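Of these approaches, active learning is the simplest to sketch. In the toy example below, the model's confidence scores are invented for illustration; the idea is just to rank unlabelled examples by how close the model's predicted probability is to 0.5 (total uncertainty) and send only the top few to a human labeller.

```python
# Sketch of active learning via uncertainty sampling.
# The confidence scores below are hypothetical model outputs:
# probability that each unlabelled example is "spam"
# (0.5 = maximally uncertain; near 0 or 1 = confident).
unlabelled = {
    "free prize inside": 0.95,
    "quarterly report attached": 0.04,
    "win a meeting today": 0.52,
    "urgent lunch offer": 0.48,
}

def uncertainty_rank(prob):
    """Distance from 0.5; smaller means the model is less certain."""
    return abs(prob - 0.5)

# Request human labels only for the k most uncertain examples.
k = 2
to_label = sorted(unlabelled, key=lambda t: uncertainty_rank(unlabelled[t]))[:k]
print(to_label)  # the two examples with probabilities nearest 0.5
```

The confidently scored examples are skipped, so labelling effort concentrates where it improves the model most.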
Why This Matters
Labels are often the bottleneck in AI projects. Understanding the cost and complexity of labelling helps you estimate realistic timelines and budgets for AI initiatives, and appreciate why "we have lots of data" does not automatically mean you can build a good model: labelled data is what matters.
Continue learning in Essentials
This topic is covered in our lesson: The AI Landscape: Models, Tools, and Players