Data Poisoning
A type of attack where an adversary deliberately corrupts training data to make an AI model learn incorrect patterns or behaviours.
Data poisoning is an adversarial attack on the training process of an AI model. Instead of attacking the model at inference time, the attacker corrupts the training data so the model learns harmful, biased, or incorrect patterns from the start.
How data poisoning works
An attacker introduces malicious examples into a training dataset. These examples might be mislabelled (a photo of a cat labelled as a dog), subtly modified (adding an imperceptible watermark that the model associates with a particular class), or strategically crafted to create a backdoor (the model behaves normally on clean inputs but produces a specific wrong output when it sees a particular trigger pattern).
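The two mechanisms described above can be sketched on a toy dataset. This is a minimal illustration, not any real attack tooling: the function names (flip_labels, add_trigger) and the 2x2 "sticker" patch are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 100 samples of 8x8 "images" with binary labels.
X = rng.random((100, 8, 8))
y = rng.integers(0, 2, size=100)

def flip_labels(y, fraction, rng):
    """Label-flipping attack: invert the label on a random fraction of samples."""
    y_poisoned = y.copy()
    n_flip = int(fraction * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned, idx

def add_trigger(X, y, idx, target_label):
    """Backdoor attack: stamp a small bright patch and relabel to the target class."""
    X_poisoned = X.copy()
    y_poisoned = y.copy()
    X_poisoned[idx, -2:, -2:] = 1.0   # 2x2 "sticker" in the bottom-right corner
    y_poisoned[idx] = target_label    # the model learns: patch => target_label
    return X_poisoned, y_poisoned

y_flipped, flipped_idx = flip_labels(y, fraction=0.1, rng=rng)
X_bd, y_bd = add_trigger(X, y, idx=flipped_idx, target_label=1)

print(len(flipped_idx))   # 10 samples poisoned
```

A model trained on (X_bd, y_bd) would tend to behave normally on clean inputs but output the target class whenever the corner patch appears.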
Types of data poisoning
- Label flipping: Changing the labels on training examples. If enough "safe" emails are labelled as "spam" and vice versa, the spam filter learns the wrong associations.
- Backdoor attacks: Inserting a hidden trigger pattern. The model learns to associate this trigger with a specific output. For example, a self-driving vehicle's vision model could be poisoned so that it misclassifies any stop sign bearing a particular small sticker.
- Influence attacks: Adding carefully crafted examples that shift the model's decision boundary so it misclassifies specific targets.
- Model poisoning in federated learning: In distributed training setups where multiple parties contribute model updates, a malicious participant can send corrupted updates.
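The federated-learning variant is easy to see in miniature. In the sketch below, a single malicious participant submits a boosted update that dominates a plain federated average; the fed_avg helper and the toy weight vectors are illustrative, not from any federated-learning framework.

```python
import numpy as np

def fed_avg(updates):
    """Plain federated averaging: the global update is the mean of client updates."""
    return np.mean(updates, axis=0)

# Nine honest clients agree on a small update; one attacker sends a scaled one.
honest = [np.array([0.1, 0.1]) for _ in range(9)]
malicious = np.array([10.0, -10.0])   # boosted update crafted to dominate the mean

global_update = fed_avg(honest + [malicious])
print(global_update)   # prints [ 1.09 -0.91]: far from the honest consensus of [0.1, 0.1]
```

Because averaging weights every client equally, a single large update can drag the global model arbitrarily far, which is why robust aggregation rules (e.g. coordinate-wise medians or update clipping) are studied as countermeasures.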
Why data poisoning is concerning
Modern AI models are trained on massive datasets scraped from the internet, making them difficult to audit. If someone plants malicious content on websites that will be crawled for training data, the poisoned examples may be incorporated without detection. The scale of modern training makes manual inspection impractical.
Defences against data poisoning
- Data curation: Carefully vetting training data sources and filtering suspicious examples.
- Anomaly detection: Identifying training examples that deviate significantly from expected patterns.
- Robust training: Using training methods that are less sensitive to outliers and corrupted examples.
- Provenance tracking: Maintaining records of where training data originated.
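The anomaly-detection defence can be sketched with a simple distance-based outlier filter: flag any training point whose distance to its class centroid is anomalously large. The function name, the z-score threshold of 3.0, and the planted offset are all illustrative choices, not a production defence.

```python
import numpy as np

rng = np.random.default_rng(1)

# Clean data: 200 five-dimensional points from a standard normal distribution.
X = rng.normal(0.0, 1.0, size=(200, 5))
y = np.zeros(200, dtype=int)
X[:5] += 8.0   # five poisoned points planted far from the clean distribution

def flag_outliers(X, y, z_thresh=3.0):
    """Flag points whose distance to their class centroid has a high z-score."""
    flags = np.zeros(len(X), dtype=bool)
    for c in np.unique(y):
        mask = y == c
        dists = np.linalg.norm(X[mask] - X[mask].mean(axis=0), axis=1)
        z = (dists - dists.mean()) / dists.std()
        flags[np.where(mask)[0][z > z_thresh]] = True
    return flags

flags = flag_outliers(X, y)
print(flags[:5].all())   # the planted points are flagged
```

Filters like this catch crude poisoning but not carefully crafted examples that sit inside the clean distribution, which is why curation, robust training, and provenance tracking are used in combination rather than relying on any single defence.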
Why this matters
Data poisoning represents a fundamental vulnerability in AI systems because it corrupts the learning process itself. Understanding this threat is essential for anyone responsible for building or procuring AI systems, as it highlights the critical importance of training data quality and provenance.