
Data Augmentation

Last reviewed: April 2026

Techniques for artificially expanding a training dataset by creating modified versions of existing data, improving model performance without collecting new data.

Data augmentation increases the size and diversity of a training dataset by applying transformations to examples you already have, rather than gathering new ones. It is often one of the most cost-effective ways to improve model performance.

Why augmentation matters

Machine learning models generally perform better when trained on more, and more varied, data, but collecting and annotating new data is expensive and slow. Augmentation lets you multiply your existing data by creating plausible variations, giving the model more examples to learn from without the cost of new data collection.

Image augmentation techniques

  • Geometric transformations: rotating, flipping, cropping, or scaling images
  • Colour adjustments: changing brightness, contrast, saturation, or adding colour jitter
  • Noise injection: adding random noise to make the model robust to imperfect inputs
  • Cutout and mixup: masking portions of an image (cutout) or blending two images, along with their labels, into one training example (mixup)
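
As a rough illustration of the techniques listed above, the following dependency-free Python sketch treats a greyscale image as a nested list of pixel values between 0 and 1. The function names and parameters are illustrative assumptions, not any library's API, and a real project would normally rely on an image or array library instead.

```python
import random


def _clamp(value):
    """Keep a pixel value inside the valid [0.0, 1.0] range."""
    return min(1.0, max(0.0, value))


def flip_horizontal(img):
    """Geometric transformation: mirror the image left to right."""
    return [row[::-1] for row in img]


def adjust_brightness(img, delta):
    """Colour adjustment: add delta to every pixel, then clamp."""
    return [[_clamp(p + delta) for p in row] for row in img]


def add_noise(img, sigma, rng):
    """Noise injection: add zero-mean Gaussian noise with std dev sigma."""
    return [[_clamp(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def cutout(img, top, left, size):
    """Cutout: zero a size x size square whose top-left corner is (top, left)."""
    out = [row[:] for row in img]
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[r]))):
            out[r][c] = 0.0
    return out


def mixup(img_a, img_b, lam):
    """Mixup: blend two same-sized images as lam * a + (1 - lam) * b.

    The labels of the two images should be blended with the same lam.
    """
    return [
        [lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


# Example: chain several augmentations on a tiny 2 x 3 image.
rng = random.Random(0)  # fixed seed so the run is repeatable
image = [[0.0, 0.5, 1.0], [0.2, 0.4, 0.6]]
augmented = add_noise(adjust_brightness(flip_horizontal(image), 0.1), 0.05, rng)
```

Each function returns a new image and leaves its input untouched, so the original example stays available for producing further variations.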

Text augmentation techniques

  • Synonym replacement: swapping words with synonyms while preserving meaning
  • Back-translation: translating text to another language and back, producing natural paraphrases
  • Random insertion, deletion, or swap: minor perturbations that teach robustness
  • LLM-based augmentation: using a language model to generate paraphrases or entirely new examples in the same style
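
The simpler word-level techniques above can be sketched in a few lines of standard-library Python. The synonym table here is a hypothetical, hand-written stand-in for a real thesaurus, and the function names and probabilities are assumptions chosen for illustration.

```python
import random

# Hypothetical synonym table for illustration only; a real system would
# draw on a proper lexical resource.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "cheerful"],
}


def synonym_replace(words, rng, p=0.5):
    """Replace each word that has a known synonym with probability p."""
    return [
        rng.choice(SYNONYMS[w]) if w in SYNONYMS and rng.random() < p else w
        for w in words
    ]


def random_delete(words, rng, p=0.1):
    """Drop each word with probability p, but never return an empty list."""
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def random_swap(words, rng):
    """Swap the words at two randomly chosen positions."""
    out = list(words)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


# Example: produce three variants of one sentence.
rng = random.Random(0)  # fixed seed so the run is repeatable
sentence = "the quick dog looks happy".split()
variants = [
    " ".join(synonym_replace(sentence, rng)),
    " ".join(random_delete(sentence, rng)),
    " ".join(random_swap(sentence, rng)),
]
```

Random deletion and swapping can change the meaning of short texts, so it is worth reading a sample of the output before training on it.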

Audio augmentation

  • Adding background noise, changing speed or pitch, or time-shifting the signal, all of which help make speech recognition models robust to real-world conditions
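
A minimal sketch of these audio operations, assuming a mono clip stored as a plain list of float samples. The function names are illustrative, and the speed change uses naive linear interpolation, which alters pitch along with speed; dedicated audio libraries offer ways to change one without the other.

```python
import random


def add_background_noise(samples, noise_level, rng):
    """Add zero-mean Gaussian noise with standard deviation noise_level."""
    return [s + rng.gauss(0.0, noise_level) for s in samples]


def time_shift(samples, offset):
    """Shift the clip right by offset samples (left if negative).

    The clip keeps its length; the vacated positions are filled with silence.
    """
    n = len(samples)
    if offset >= 0:
        return [0.0] * min(offset, n) + samples[: max(n - offset, 0)]
    k = min(-offset, n)
    return samples[k:] + [0.0] * k


def change_speed(samples, factor):
    """Resample by linear interpolation; factor > 1 shortens the clip."""
    if not samples or factor <= 0:
        raise ValueError("need a non-empty clip and a positive factor")
    last = len(samples) - 1
    n_out = max(1, int(len(samples) / factor))
    out = []
    for i in range(n_out):
        pos = i * factor              # fractional position in the input
        lo = min(int(pos), last)      # sample just before that position
        hi = min(lo + 1, last)        # sample just after it
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# Example: shift a short clip, speed it up slightly, then add noise.
rng = random.Random(0)  # fixed seed so the run is repeatable
clip = [0.0, 0.2, 0.4, 0.2, 0.0, -0.2, -0.4, -0.2]
augmented = add_background_noise(change_speed(time_shift(clip, 1), 1.25), 0.01, rng)
```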

Best practices

  • Augmented data should be plausible: extreme distortions can hurt rather than help
  • Augmentation should preserve labels: a horizontally flipped cat is still a cat, but a "6" rotated by 180 degrees reads as a "9"
  • Use augmentation to address class imbalance by generating more examples of underrepresented categories
  • Validate that augmentation actually improves performance by comparing against an unaugmented baseline on a held-out test set of real, unaugmented data
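
One way to apply the class-imbalance advice above is to top up each minority class with augmented copies until it matches the largest class. The sketch below assumes examples are (item, label) pairs and that `augment` is a caller-supplied, label-preserving function; both names are hypothetical.

```python
import random
from collections import Counter


def balance_with_augmentation(examples, augment, rng):
    """Return examples plus augmented copies so every class is equally sized.

    examples: list of (item, label) pairs from the TRAINING split only.
    augment:  hypothetical label-preserving callable, augment(item, rng).
    """
    if not examples:
        return []
    by_label = {}
    for item, label in examples:
        by_label.setdefault(label, []).append(item)
    target = max(len(items) for items in by_label.values())

    balanced = list(examples)  # keep every original example
    for label, items in by_label.items():
        for _ in range(target - len(items)):
            source = rng.choice(items)  # pick an original to vary
            balanced.append((augment(source, rng), label))
    return balanced


# Example with a stand-in augmentation that just tags the text.
rng = random.Random(0)  # fixed seed so the run is repeatable
train = [("ok 1", "normal"), ("ok 2", "normal"), ("ok 3", "normal"), ("bad 1", "fault")]
balanced = balance_with_augmentation(train, lambda item, r: item + " (aug)", rng)
counts = Counter(label for _, label in balanced)
```

Splitting the data before augmenting keeps variants of the same source example out of the held-out set, so the evaluation still reflects performance on unseen, unaugmented data.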

Why This Matters

Data augmentation can save your organisation significant time and money in AI projects. Instead of spending months collecting more training data, well-chosen augmentation can often deliver comparable improvements at a fraction of the cost. It is particularly valuable for specialised domains where labelled data is scarce.
