Ground Truth
The verified, correct answer or label for a data point, used as the standard against which AI model predictions are measured.
Ground truth is the known correct answer for a piece of data. It is the benchmark against which you measure your AI model's predictions. Without ground truth, you cannot train supervised models or evaluate how well any model performs.
Where ground truth comes from
- Human annotation: experts or trained labellers manually classify, tag, or score each data point
- Verified records: historical outcomes that are known to be correct (did the customer actually churn? was the transaction actually fraudulent?)
- Sensor measurements: physical measurements from calibrated instruments (actual temperature, real GPS coordinates)
- Consensus: multiple annotators agree on the correct label, with disagreements resolved by experts
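The consensus approach above can be sketched as a simple majority vote. This is an illustrative fragment, not a real annotation tool; the function name and the two-annotator threshold are assumptions for the example.

```python
from collections import Counter

def consensus_label(annotations, min_agreement=2):
    """Return the majority label if enough annotators agree,
    otherwise None to flag the item for expert review."""
    label, count = Counter(annotations).most_common(1)[0]
    return label if count >= min_agreement else None

# Two of three annotators agree, so a consensus label is produced
print(consensus_label(["spam", "spam", "not_spam"]))   # spam
# No two annotators agree, so the item is escalated to an expert
print(consensus_label(["spam", "not_spam", "unsure"]))  # None
```

In practice the escalation path matters as much as the vote: items returning None are exactly the ambiguous cases where expert adjudication earns its cost.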
Ground truth in training
In supervised learning, every training example is a pair: the input and its ground truth label. The model learns by comparing its predictions to these ground truth labels and adjusting to reduce the gap.
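That predict-compare-adjust loop can be shown with a deliberately tiny model. This is a minimal sketch, not a production training loop: the dataset, learning rate, and single-weight linear model are all assumptions chosen to keep the example readable.

```python
# Each training example pairs an input x with its ground truth label y.
# The model adjusts its weight to shrink the gap between its
# prediction and the ground truth.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # ground truth: y = 2x

w = 0.0  # single weight; prediction = w * x
for epoch in range(100):
    for x, y in data:
        pred = w * x
        error = pred - y       # gap between prediction and ground truth
        w -= 0.05 * error * x  # gradient step to reduce the gap

print(round(w, 2))  # converges toward 2.0
```

Real frameworks automate the gradient computation, but the core idea is unchanged: without ground truth labels there is no error signal to learn from.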
Ground truth in evaluation
To evaluate a model, you compare its predictions against ground truth on a held-out test set that the model has never seen. This estimates how well the model will generalise to new, unseen data.
The problem with imperfect ground truth
Ground truth is often messier than it sounds:
- Subjective tasks: reasonable people disagree on whether a review is positive or neutral. The ground truth reflects the annotator's judgement, not objective reality.
- Expensive to obtain: medical diagnoses may require specialist doctors; legal classifications may require lawyers
- Delayed: you may not know the ground truth for months (did the loan default? did the patient recover?)
- Noisy: annotation errors introduce incorrect ground truth labels, confusing the model during training
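One way to gauge how subjective or noisy a labelling task is: measure how often annotators agree with each other. This is an illustrative sketch of a simple pairwise agreement rate, not a standard metric implementation (practitioners often use Cohen's or Fleiss' kappa instead); the review labels are invented.

```python
from itertools import combinations

def pairwise_agreement(label_sets):
    """Average fraction of annotator pairs that agree per item -
    a rough proxy for task subjectivity and label noise."""
    rates = []
    for labels in label_sets:
        pairs = list(combinations(labels, 2))
        rates.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(rates) / len(rates)

# Three annotators per review; only one pair agrees on each item
reviews = [
    ["positive", "positive", "neutral"],
    ["positive", "neutral", "neutral"],
]
print(round(pairwise_agreement(reviews), 2))  # 0.33
```

A low agreement rate is a warning that the "ground truth" for this task encodes judgement calls, and that a model scored against it has a ceiling on measurable accuracy.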
Ground truth and AI in production
Once a model is deployed, you need ongoing ground truth to monitor performance. Models degrade over time as the real world changes (data drift). Regular comparison against fresh ground truth lets you detect and address this degradation.
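The monitoring loop above can be sketched as a threshold check on accuracy measured against freshly labelled data. The function name, baseline, tolerance, and weekly numbers are all assumptions for illustration, not a real monitoring API.

```python
def detect_degradation(weekly_accuracy, baseline, tolerance=0.05):
    """Flag weeks where accuracy against fresh ground truth dropped
    more than `tolerance` below the deployment baseline."""
    return [week for week, acc in enumerate(weekly_accuracy)
            if acc < baseline - tolerance]

# Accuracy measured each week against newly labelled ground truth
weekly = [0.91, 0.90, 0.88, 0.84, 0.82]
print(detect_degradation(weekly, baseline=0.90))  # [3, 4]
```

Production systems add alerting and retraining triggers on top, but the prerequisite is the same: a steady supply of fresh, trustworthy ground truth to compare against.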
Why This Matters
Every AI evaluation depends on ground truth quality. If your ground truth is wrong, your accuracy metrics are meaningless: the model might be performing well but scoring poorly against incorrect labels, or appearing accurate while learning the wrong patterns. Investing in high-quality ground truth is the foundation of trustworthy AI.