Label (Machine Learning)
The known correct answer attached to a training example in supervised learning, which the AI model learns to predict, such as 'spam' for an email or 'cat' for an image.
In machine learning, a label is the known correct answer associated with a piece of training data. It is the "right answer" that the model learns to predict. In an email spam filter, labels are "spam" or "not spam." In an image classifier, labels might be "cat," "dog," or "bird." In a churn prediction model, labels might be "churned" or "retained."
Labels in supervised learning
Labels are the foundation of supervised learning, the most common approach to building AI models. The process works like a teacher grading practice tests:
- Collect data (emails, images, customer records)
- Attach labels to each example (spam/not spam, cat/dog, churned/retained)
- Train the model on the labelled data
- The model learns to predict labels for new, unseen data
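The four steps above can be sketched end to end. This is a deliberately tiny, illustrative example, not a real spam filter: the feature function, keyword list, and threshold rule are all hypothetical choices made for clarity.

```python
# Minimal sketch of the supervised-learning loop, using a toy
# keyword-count "spam" classifier (illustrative, not production code).

def featurise(text):
    """Count a few hand-picked spam-indicator words (hypothetical features)."""
    spam_words = {"free", "winner", "prize", "urgent"}
    return sum(1 for w in text.lower().split() if w in spam_words)

# Steps 1 and 2: collect data and attach a label to each example.
training_data = [
    ("free prize winner click now", "spam"),
    ("urgent free offer act now", "spam"),
    ("meeting moved to tuesday", "not spam"),
    ("lunch at noon tomorrow", "not spam"),
]

# Step 3: "train" by learning a score threshold that separates the labels.
spam_scores = [featurise(t) for t, y in training_data if y == "spam"]
ham_scores = [featurise(t) for t, y in training_data if y == "not spam"]
threshold = (min(spam_scores) + max(ham_scores)) / 2

# Step 4: predict labels for new, unseen data.
def predict(text):
    return "spam" if featurise(text) > threshold else "not spam"

print(predict("you are a winner claim your free prize"))  # spam
print(predict("see you at the meeting"))                  # not spam
```

Real systems replace the keyword counter with learned features and a proper training algorithm, but the shape of the loop (labelled examples in, a predictor of labels out) is the same.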
The quality of labels directly determines the quality of the model. A model trained on incorrectly labelled data will learn the wrong patterns and make unreliable predictions.
The labelling challenge
For many real-world applications, obtaining high-quality labels is the hardest and most expensive part of building an AI system. Consider these scenarios:
- Medical imaging: Labelling X-rays as "healthy" or "disease present" requires expert radiologists, who are expensive and in short supply.
- Sentiment analysis: Labelling customer reviews as positive, negative, or neutral requires judgement, and different people may disagree.
- Object detection: Drawing bounding boxes around every object in thousands of images is tedious and time-consuming.
Label quality issues
- Noisy labels: Some labels are simply wrong: a spam email accidentally labelled as legitimate, or a blurry image categorised incorrectly.
- Ambiguous labels: Some examples genuinely fall between categories. Is a mildly critical review positive or negative?
- Inconsistent labels: Different labellers apply different standards, creating contradictions in the dataset.
- Label imbalance: When one category vastly outnumbers others (e.g., 99% of transactions are legitimate, 1% are fraud), the model may learn to always predict the majority class.
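The imbalance problem is easy to demonstrate with synthetic numbers: a degenerate model that always predicts the majority class scores 99% accuracy while catching zero fraud. The dataset below is made up purely to illustrate the arithmetic.

```python
# Sketch: why label imbalance makes accuracy misleading (synthetic data).
# 99% of transactions are legitimate, 1% are fraud.
labels = ["legitimate"] * 990 + ["fraud"] * 10

# A degenerate "model" that always predicts the majority class.
predictions = ["legitimate"] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
fraud_caught = sum(1 for p, y in zip(predictions, labels)
                   if y == "fraud" and p == "fraud")

print(f"accuracy: {accuracy:.1%}")            # 99.0%, yet the model is useless
print(f"fraud caught: {fraud_caught} of 10")  # 0 of 10
```

This is why imbalanced problems are evaluated with metrics such as precision and recall on the minority class rather than raw accuracy.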
Approaches to efficient labelling
- Active learning: The model identifies the examples it is most uncertain about and requests labels only for those, minimising the total labelling effort.
- Semi-supervised learning: Using a small set of labelled examples alongside a large set of unlabelled data, letting the model leverage patterns in the unlabelled data.
- Weak supervision: Using heuristic rules, noisy labellers, or existing databases to generate approximate labels at scale.
- AI-assisted labelling: Using a pre-trained model to suggest labels that human reviewers then verify, which is faster than labelling from scratch.
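Of these approaches, active learning is the simplest to sketch. In the toy example below, the model's confidence scores are invented for illustration; the idea is just to rank unlabelled examples by how close the model's predicted probability is to 0.5 (total uncertainty) and send only the top few to a human labeller.

```python
# Sketch of active learning via uncertainty sampling.
# The confidence scores below are hypothetical model outputs:
# probability that each unlabelled example is "spam"
# (0.5 = maximally uncertain; near 0 or 1 = confident).
unlabelled = {
    "free prize inside": 0.95,
    "quarterly report attached": 0.04,
    "win a meeting today": 0.52,
    "urgent lunch offer": 0.48,
}

def uncertainty_rank(prob):
    """Distance from 0.5; smaller means the model is less certain."""
    return abs(prob - 0.5)

# Request human labels only for the k most uncertain examples.
k = 2
to_label = sorted(unlabelled, key=lambda t: uncertainty_rank(unlabelled[t]))[:k]
print(to_label)  # the two examples with probabilities nearest 0.5
```

The confidently scored examples are skipped, so labelling effort concentrates where it improves the model most.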
Why This Matters
Labels are often the bottleneck in AI projects. Understanding the cost and complexity of labelling helps you estimate realistic timelines and budgets for AI initiatives, and appreciate why "we have lots of data" does not automatically mean you can build a good model: labelled data is what matters.
Continue learning in Essentials
This topic is covered in our lesson: The AI Landscape: Models, Tools, and Players