Adversarial Examples
Inputs deliberately crafted to fool an AI model into making incorrect predictions, often by making tiny changes that are imperceptible to humans but catastrophic for the model.
Adversarial examples are inputs intentionally designed to cause an AI model to make mistakes. The most striking examples come from computer vision: adding carefully calculated noise to an image, invisible to the human eye, can cause a model to confidently classify a cat as a toaster, or a stop sign as a speed limit sign.
How adversarial examples work
AI models learn decision boundaries: mathematical surfaces that separate one category from another. Adversarial examples are crafted to push an input just across one of these boundaries. The change is tiny in human-perceptible terms but mathematically significant to the model.
For image classifiers, this typically involves:
- Computing the gradient, the direction in which small pixel changes would most affect the model's output
- Applying a tiny perturbation in that direction
- The modified image looks identical to humans but fools the model completely
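The gradient-then-perturb recipe above is the core of the fast gradient sign method (FGSM). As a minimal sketch, the toy logistic-regression classifier below has an input gradient that can be written down analytically, so no deep-learning framework is needed; in practice the gradient would come from autograd. All weights and inputs here are made-up illustration values, not a real model:

```python
# FGSM sketch on a toy linear classifier. The weights, bias, and input
# below are made up for illustration; a real attack would use a trained
# network and a framework's automatic differentiation.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability that x belongs to class 1 under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """One FGSM step: x_adv = x + eps * sign(dLoss/dx)."""
    p = predict(w, b, x)
    # For cross-entropy loss on a linear model, dLoss/dx_i = (p - y) * w_i
    grad = [(p - y) * wi for wi in w]
    sign = [1.0 if g > 0 else -1.0 if g < 0 else 0.0 for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]

w, b = [2.0, -1.5, 0.5], 0.1      # hypothetical model parameters
x, y = [0.4, -0.2, 0.3], 1        # input correctly classified as class 1
x_adv = fgsm(w, b, x, y, eps=0.3)

print(predict(w, b, x))           # high confidence on the clean input
print(predict(w, b, x_adv))       # confidence drops after the attack
```

With a budget of 0.3 per feature, the perturbed input still sits close to the original, yet the model's confidence in the correct class falls noticeably; on high-dimensional image inputs the same effect appears at perturbation sizes far below human perception.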
For text models, adversarial examples might involve swapping characters, inserting invisible Unicode characters, or rephrasing text in ways that preserve meaning for humans but confuse the model.
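The invisible-Unicode trick can be shown in a few lines. The `naive_filter` below is a hypothetical stand-in for a keyword-based moderation check; inserting a zero-width space leaves the text visually unchanged for humans but defeats exact string matching:

```python
# Sketch: a zero-width space defeats a naive keyword filter while the
# text renders identically for a human reader. naive_filter is a
# hypothetical stand-in for a simple moderation check.
ZWSP = "\u200b"  # zero-width space: invisible when displayed

def naive_filter(text, blocklist):
    """Return True if any blocked word appears verbatim in the text."""
    return any(word in text for word in blocklist)

blocklist = {"badword"}
clean = "this message contains badword right here"
adv = clean.replace("badword", "bad" + ZWSP + "word")

print(naive_filter(clean, blocklist))  # True: the clean text is caught
print(naive_filter(adv, blocklist))    # False: the attack slips through
print(adv == clean)                    # False, despite looking identical
```

Real text models are attacked the same way at the tokenizer level: the inserted character changes how the input is split into tokens, so the model processes something quite different from what the reader sees.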
Why this matters for real-world systems
Adversarial examples are not merely academic curiosities. They have practical security implications:
- Autonomous vehicles: Researchers have demonstrated that stickers placed on stop signs can cause self-driving car systems to misread them.
- Content moderation: Adversarial text modifications can bypass automated filters designed to catch hate speech or misinformation.
- Fraud detection: Subtle modifications to transaction data could evade AI-based fraud detection systems.
- Biometric security: Adversarial patterns on clothing or accessories can fool facial recognition systems.
Defences against adversarial attacks
- Adversarial training: Including adversarial examples in the training data so the model learns to handle them. This is the most effective but computationally expensive approach.
- Input preprocessing: Applying transformations (compression, smoothing) to inputs before feeding them to the model, which can destroy adversarial perturbations.
- Ensemble methods: Using multiple models and requiring consensus, since adversarial examples crafted for one model rarely transfer perfectly to others.
- Certified defences: Mathematical guarantees that the model's prediction will not change for perturbations below a certain size.
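The input-preprocessing defence can be sketched with one of its simplest forms, sometimes called feature squeezing: quantising pixel values to a coarse grid so that perturbations smaller than the grid spacing are erased. The values below are illustrative, not drawn from any real attack:

```python
# Sketch of input preprocessing ("feature squeezing"): rounding each
# pixel to a coarse set of levels can erase small adversarial noise.
# The pixel values and noise magnitude below are illustrative.
def squeeze(pixels, levels=8):
    """Round each pixel in [0, 1] to the nearest of `levels` values."""
    step = 1.0 / (levels - 1)
    return [round(p / step) * step for p in pixels]

pixels = [0.00, 0.25, 0.50, 0.75, 1.00]
# A small adversarial perturbation, clipped back into the valid range
perturbed = [min(1.0, max(0.0, p + 0.03)) for p in pixels]

print(squeeze(pixels) == squeeze(perturbed))  # the perturbation is removed
```

The trade-off is typical of preprocessing defences: aggressive squeezing also discards legitimate detail, and an attacker aware of the defence can craft perturbations large enough to survive it, which is why preprocessing is usually layered with other defences rather than used alone.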
The broader lesson
Adversarial examples reveal a fundamental truth about AI: models do not see the world the way humans do. They rely on statistical patterns that may be entirely different from the features humans consider important. A model might classify images of dogs partly based on background grass, not the dog itself. Adversarial examples exploit these misaligned features.
Why This Matters
Adversarial examples highlight the gap between AI confidence and AI reliability. For any business deploying AI in security-sensitive contexts such as authentication, fraud detection, and content moderation, understanding these vulnerabilities is essential for setting appropriate trust levels and implementing defensive measures.