Upsampling
A technique that increases the representation of a minority class in a dataset to help AI models learn balanced patterns and avoid bias toward the majority class.
Upsampling is a technique used to address class imbalance in training data, where one category has far fewer examples than another. By increasing the representation of the minority class, upsampling helps models learn more balanced and accurate patterns.
The class imbalance problem
Real-world datasets are often heavily imbalanced:
- In fraud detection, legitimate transactions outnumber fraudulent ones by 1000:1
- In medical diagnosis, healthy patients outnumber those with a rare condition by 100:1
- In manufacturing, good products outnumber defective ones by 50:1
A model trained on imbalanced data tends to predict the majority class almost every time. A fraud detection model might achieve 99.9 percent accuracy by simply labelling everything as legitimate, missing every fraudulent transaction.
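The accuracy trap is easy to demonstrate on toy data. This is a minimal sketch (the labels are made up for illustration): a "model" that predicts the majority class for every transaction scores high accuracy while catching zero fraud.

```python
import numpy as np

# Toy labels: 990 legitimate (0) and 10 fraudulent (1) transactions.
y_true = np.array([0] * 990 + [1] * 10)

# A degenerate "model" that labels every transaction as legitimate.
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()  # 0.99 -- looks impressive
recall = y_pred[y_true == 1].mean()   # 0.0  -- catches no fraud at all

print(f"accuracy={accuracy:.2f}, fraud recall={recall:.2f}")
```

High overall accuracy says nothing about performance on the rare class, which is exactly why resampling and imbalance-aware metrics matter.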
Upsampling techniques
- Random oversampling: Duplicate random examples from the minority class until the classes are balanced. Simple but can lead to overfitting since the model sees the same examples repeatedly.
- SMOTE (Synthetic Minority Over-sampling Technique): Creates new synthetic examples by interpolating between existing minority class examples. Produces more diverse training data than simple duplication.
- ADASYN: An adaptive version of SMOTE that generates more synthetic examples in regions where the minority class is harder to learn.
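The first two techniques can be sketched in plain NumPy. This is a simplified illustration on synthetic data: the SMOTE step here interpolates with a randomly chosen minority partner rather than a true k-nearest neighbour, which is the part real SMOTE implementations add.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced dataset: 95 majority vs 5 minority samples.
X_maj = rng.normal(0.0, 1.0, size=(95, 2))
X_min = rng.normal(3.0, 1.0, size=(5, 2))

# Random oversampling: draw minority rows with replacement until balanced.
idx = rng.integers(0, len(X_min), size=len(X_maj))
X_min_up = X_min[idx]

# SMOTE-style interpolation (simplified: random partner instead of a
# k-nearest neighbour): each synthetic point lies on the line segment
# between two existing minority examples.
partners = X_min[rng.integers(0, len(X_min), size=len(X_maj))]
t = rng.random((len(X_maj), 1))
X_min_smote = X_min_up + t * (partners - X_min_up)

print(X_min_up.shape, X_min_smote.shape)  # (95, 2) (95, 2)
```

In practice you would reach for a maintained implementation such as the imbalanced-learn library rather than hand-rolling this, but the core idea is just duplication versus interpolation.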
Upsampling vs downsampling
The opposite approach, downsampling, reduces the majority class to match the minority class. This avoids overfitting but discards potentially valuable data. The choice depends on your dataset size and the cost of misclassification.
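Downsampling is even simpler to sketch: keep a random subset of the majority class, without replacement, sized to match the minority class (the dataset below is synthetic).

```python
import numpy as np

rng = np.random.default_rng(42)
X_maj = rng.normal(size=(1000, 3))  # majority class: 1000 samples
X_min = rng.normal(size=(20, 3))    # minority class: 20 samples

# Random undersampling: sample majority indices without replacement.
keep = rng.choice(len(X_maj), size=len(X_min), replace=False)
X_maj_down = X_maj[keep]

print(X_maj_down.shape)  # (20, 3) -- now balanced with the minority class
```

Note that 980 of the 1000 majority examples are thrown away, which is why downsampling is usually reserved for large datasets.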
In image processing
Upsampling has a different meaning in computer vision: increasing the resolution or spatial dimensions of an image or feature map. This is used in:
- Image super-resolution (enhancing low-resolution images)
- Semantic segmentation (producing pixel-level predictions)
- Generative models (progressively increasing image detail)
Techniques include bilinear interpolation, transposed convolutions, and pixel shuffle.
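The simplest image-upsampling variant, nearest-neighbour interpolation, fits in a few lines: each pixel is repeated along both spatial axes. (Bilinear interpolation and transposed convolutions follow the same idea of expanding the spatial grid, but fill in the new pixels more smoothly or learnably.)

```python
import numpy as np

# A 2x2 "image".
img = np.array([[1, 2],
                [3, 4]])

# Nearest-neighbour upsampling by a factor of 2: repeat every pixel
# twice along the height axis, then twice along the width axis.
up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

print(up)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```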
Practical considerations
When addressing class imbalance:
- Start by understanding the business cost of each type of error (missing fraud vs false alarm)
- Try multiple approaches (upsampling, downsampling, class weights) and compare results
- Evaluate using metrics that account for imbalance (precision, recall, F1, AUC-ROC) rather than accuracy
- Monitor production performance, as real-world class ratios may differ from training data
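The metrics point deserves a concrete illustration. On a hypothetical test set with 95 negatives and 5 positives, a model that finds 3 of the 5 positives and raises 1 false alarm still posts 97 percent accuracy; precision and recall tell the real story.

```python
import numpy as np

# 95 negatives, 5 positives; the model catches 3 positives
# and raises 1 false alarm (illustrative data).
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)
y_pred[0] = 1       # one false positive
y_pred[95:98] = 1   # three true positives

tp = ((y_pred == 1) & (y_true == 1)).sum()  # 3
fp = ((y_pred == 1) & (y_true == 0)).sum()  # 1
fn = ((y_pred == 0) & (y_true == 1)).sum()  # 2

precision = tp / (tp + fp)                        # 0.75
recall = tp / (tp + fn)                           # 0.60
f1 = 2 * precision * recall / (precision + recall)
accuracy = (y_pred == y_true).mean()              # 0.97 despite 2 missed positives

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
```

Libraries such as scikit-learn provide these metrics (and AUC-ROC) out of the box; the arithmetic above just shows why they diverge from accuracy on imbalanced data.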
Why This Matters
Class imbalance is one of the most common pitfalls in business AI projects. Understanding upsampling and related techniques helps you ensure AI models perform well on the cases that matter most: the rare events like fraud, defects, and at-risk customers that often drive the most business value.
Continue learning in Advanced
This topic is covered in our lesson: How AI Models Learn and Generalise