Data Poisoning
A type of attack where an adversary deliberately corrupts training data to make an AI model learn incorrect patterns or behaviours.
Data poisoning is an adversarial attack on the training process of an AI model. Instead of attacking the model at inference time, the attacker corrupts the training data so the model learns harmful, biased, or incorrect patterns from the start.
How data poisoning works
An attacker introduces malicious examples into a training dataset. These examples might be mislabelled (a photo of a cat labelled as a dog), subtly modified (adding an imperceptible watermark that the model associates with a particular class), or strategically crafted to create a backdoor (the model behaves normally on clean inputs but produces a specific wrong output when it sees a particular trigger pattern).
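The two mechanisms described above can be sketched on a toy dataset. This is a minimal illustration, not any real attack tooling: the function names (flip_labels, add_trigger) and the 2x2 "sticker" patch are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 100 samples of 8x8 "images" with binary labels.
X = rng.random((100, 8, 8))
y = rng.integers(0, 2, size=100)

def flip_labels(y, fraction, rng):
    """Label-flipping attack: invert the label on a random fraction of samples."""
    y_poisoned = y.copy()
    n_flip = int(fraction * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned, idx

def add_trigger(X, y, idx, target_label):
    """Backdoor attack: stamp a small bright patch and relabel to the target class."""
    X_poisoned = X.copy()
    y_poisoned = y.copy()
    X_poisoned[idx, -2:, -2:] = 1.0   # 2x2 "sticker" in the bottom-right corner
    y_poisoned[idx] = target_label    # the model learns: patch => target_label
    return X_poisoned, y_poisoned

y_flipped, flipped_idx = flip_labels(y, fraction=0.1, rng=rng)
X_bd, y_bd = add_trigger(X, y, idx=flipped_idx, target_label=1)

print(len(flipped_idx))   # 10 samples poisoned
```

A model trained on (X_bd, y_bd) would tend to behave normally on clean inputs but output the target class whenever the corner patch appears.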
Types of data poisoning
- Label flipping: Changing the labels on training examples. If enough "safe" emails are labelled as "spam" and vice versa, the spam filter learns the wrong associations.
- Backdoor attacks: Inserting a hidden trigger pattern. The model learns to associate this trigger with a specific output. For example, a self-driving vehicle's vision model could be poisoned so that it misclassifies any stop sign bearing a particular small sticker.
- Influence attacks: Adding carefully crafted examples that shift the model's decision boundary so it misclassifies specific targets.
- Model poisoning in federated learning: In distributed training setups where multiple parties contribute model updates, a malicious participant can send corrupted updates.
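The federated-learning variant is easy to see in miniature. In the sketch below, a single malicious participant submits a boosted update that dominates a plain federated average; the fed_avg helper and the toy weight vectors are illustrative, not from any federated-learning framework.

```python
import numpy as np

def fed_avg(updates):
    """Plain federated averaging: the global update is the mean of client updates."""
    return np.mean(updates, axis=0)

# Nine honest clients agree on a small update; one attacker sends a scaled one.
honest = [np.array([0.1, 0.1]) for _ in range(9)]
malicious = np.array([10.0, -10.0])   # boosted update crafted to dominate the mean

global_update = fed_avg(honest + [malicious])
print(global_update)   # prints [ 1.09 -0.91]: far from the honest consensus of [0.1, 0.1]
```

Because averaging weights every client equally, a single large update can drag the global model arbitrarily far, which is why robust aggregation rules (e.g. coordinate-wise medians or update clipping) are studied as countermeasures.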
Why data poisoning is concerning
Modern AI models are trained on massive datasets scraped from the internet, making them difficult to audit. If someone plants malicious content on websites that will be crawled for training data, the poisoned examples may be incorporated without detection. The scale of modern training makes manual inspection impractical.
Defences against data poisoning
- Data curation: Carefully vetting training data sources and filtering suspicious examples.
- Anomaly detection: Identifying training examples that deviate significantly from expected patterns.
- Robust training: Using training methods that are less sensitive to outliers and corrupted examples.
- Provenance tracking: Maintaining records of where training data originated.
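The anomaly-detection defence can be sketched with a simple distance-based outlier filter: flag any training point whose distance to its class centroid is anomalously large. The function name, the z-score threshold of 3.0, and the planted offset are all illustrative choices, not a production defence.

```python
import numpy as np

rng = np.random.default_rng(1)

# Clean data: 200 five-dimensional points from a standard normal distribution.
X = rng.normal(0.0, 1.0, size=(200, 5))
y = np.zeros(200, dtype=int)
X[:5] += 8.0   # five poisoned points planted far from the clean distribution

def flag_outliers(X, y, z_thresh=3.0):
    """Flag points whose distance to their class centroid has a high z-score."""
    flags = np.zeros(len(X), dtype=bool)
    for c in np.unique(y):
        mask = y == c
        dists = np.linalg.norm(X[mask] - X[mask].mean(axis=0), axis=1)
        z = (dists - dists.mean()) / dists.std()
        flags[np.where(mask)[0][z > z_thresh]] = True
    return flags

flags = flag_outliers(X, y)
print(flags[:5].all())   # the planted points are flagged
```

Filters like this catch crude poisoning but not carefully crafted examples that sit inside the clean distribution, which is why curation, robust training, and provenance tracking are used in combination rather than relying on any single defence.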
Why this matters
Data poisoning represents a fundamental vulnerability in AI systems because it corrupts the learning process itself. Understanding this threat is essential for anyone responsible for building or procuring AI systems, as it highlights the critical importance of training data quality and provenance.