
Data Drift

Last reviewed: April 2026

A gradual change in the statistical properties of the data an AI model encounters in production, compared to the data it was trained on.

Data drift occurs when the data a deployed AI model receives in the real world changes over time, diverging from the data it was originally trained on. This causes the model's performance to degrade, often silently, because the patterns it learned no longer accurately represent reality.

Why data drifts

The world changes. Customer preferences evolve, market conditions shift, new products launch, regulations change, and external events alter behaviour patterns. A model trained on pre-pandemic shopping data would perform poorly on pandemic-era shopping patterns. A fraud detection model trained on last year's tactics will miss this year's new fraud methods.

Types of drift

  • Feature drift (covariate shift): The distribution of input data changes. If a model was trained on data from urban customers and starts receiving rural customer data, the inputs look different even though the underlying relationships may be similar.
  • Concept drift: The relationship between inputs and outputs changes. What constituted "spam" five years ago is different from today's spam.
  • Label drift: The distribution of outcomes changes. If the economy shifts and default rates double, a credit model's assumptions about baseline risk become wrong.
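The three types can be illustrated with a toy credit example. This is a minimal sketch with made-up numbers: the income thresholds, distributions, and the `default` rule are all hypothetical, chosen only to make each drift type visible.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000

# Training-time world: an income feature and a default outcome
income_train = rng.normal(50_000, 10_000, n)
default_train = (income_train < 40_000).astype(int)

# Feature drift (covariate shift): the inputs shift (rural incomes),
# but the underlying rule relating income to default is unchanged
income_rural = rng.normal(35_000, 8_000, n)
default_rural = (income_rural < 40_000).astype(int)

# Concept drift: the inputs look the same, but the rule itself moved
income_later = rng.normal(50_000, 10_000, n)
default_later = (income_later < 45_000).astype(int)

# Label drift shows up as a changed base rate of the outcome
print(f"train default rate: {default_train.mean():.2f}")
print(f"rural default rate: {default_rural.mean():.2f}")
print(f"later default rate: {default_later.mean():.2f}")
```

A model trained on the first population would see unfamiliar inputs in the second case and a stale decision boundary in the third, even though its inputs there look perfectly familiar.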

Detecting data drift

Drift detection involves monitoring the statistical properties of incoming data and comparing them to the training data distribution. Common approaches include tracking feature distributions, monitoring prediction confidence, measuring accuracy against ground truth when it becomes available, and alerting on distribution divergence metrics such as the Population Stability Index (PSI) or the Kolmogorov-Smirnov (KS) statistic.
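One widely used divergence metric is the Population Stability Index. The sketch below is a minimal NumPy implementation for a single feature; the bin count, the synthetic distributions, and the thresholds quoted in the docstring are conventional rules of thumb, not fixed standards.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index of one feature: compares a
    production sample (`actual`) against the training-time
    reference (`expected`). Rule of thumb: < 0.1 stable,
    > 0.25 significant drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Per-bin fractions, clipped so log(0) never occurs; production
    # values outside the reference range are ignored in this sketch
    e = np.clip(np.histogram(expected, bins=edges)[0] / len(expected), 1e-6, None)
    a = np.clip(np.histogram(actual, bins=edges)[0] / len(actual), 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(42)
train = rng.normal(0.0, 1.0, 10_000)    # feature at training time
stable = rng.normal(0.0, 1.0, 10_000)   # production, no drift
shifted = rng.normal(1.0, 1.2, 10_000)  # production, clearly drifted

print(f"no drift: PSI = {psi(train, stable):.3f}")
print(f"drifted:  PSI = {psi(train, shifted):.3f}")
```

In practice a monitoring system would compute this per feature on a schedule and raise an alert when the index crosses the chosen threshold.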

Mitigating data drift

  • Regular retraining: Periodically retraining the model on recent data.
  • Online learning: Continuously updating the model as new data arrives.
  • Monitoring dashboards: Tracking model performance metrics and data distributions in real time.
  • Human review: Regularly auditing model outputs to catch quality degradation.
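Retraining and monitoring can be combined: retrain only when a drift metric crosses an alert threshold. The following is an illustrative simulation, not a production system; the weekly cadence, the 0.1 threshold, and the drift rate are all invented for the demo, and "retraining" is stood in for by resetting the reference sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def ks_statistic(ref, cur):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical CDFs (0 = identical distributions)."""
    ref, cur = np.sort(ref), np.sort(cur)
    grid = np.concatenate([ref, cur])
    cdf_ref = np.searchsorted(ref, grid, side="right") / len(ref)
    cdf_cur = np.searchsorted(cur, grid, side="right") / len(cur)
    return float(np.max(np.abs(cdf_ref - cdf_cur)))

THRESHOLD = 0.1  # illustrative alert level; tune per feature
reference = rng.normal(0.0, 1.0, 5_000)  # feature snapshot at training time

retrained_at = []
for week in range(12):
    # Simulate gradual feature drift: the mean creeps up each week
    batch = rng.normal(0.08 * week, 1.0, 5_000)
    if ks_statistic(reference, batch) > THRESHOLD:
        retrained_at.append(week)
        reference = batch  # stand-in for retraining on recent data

print(f"retraining triggered in weeks: {retrained_at}")
```

The design choice here is drift-triggered retraining rather than a fixed schedule: it avoids needless retraining when the data is stable, but reacts within a batch or two when it is not.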

The business impact

Data drift is one of the most common reasons AI projects fail after initial deployment. A model that works brilliantly in testing can degrade within months in production. Understanding data drift is essential for maintaining AI systems over time.


Why This Matters

Data drift is a leading cause of AI model failure in production. Organisations that do not monitor for drift risk relying on models that silently become less accurate, leading to poor decisions and eroded trust in AI systems.
