Upsampling
A technique that increases the representation of a minority class in a dataset to help AI models learn balanced patterns and avoid bias toward the majority class.
Upsampling is a technique used to address class imbalance in training data, where one category has far fewer examples than another. By increasing the representation of the minority class, upsampling helps models learn more balanced and accurate patterns.
The class imbalance problem
Real-world datasets are often heavily imbalanced:
- In fraud detection, legitimate transactions outnumber fraudulent ones by 1000:1
- In medical diagnosis, healthy patients outnumber those with a rare condition by 100:1
- In manufacturing, good products outnumber defective ones by 50:1
A model trained on imbalanced data tends to predict the majority class almost every time. A fraud detection model might achieve 99.9 percent accuracy by simply labelling everything as legitimate, missing every fraudulent transaction.
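The accuracy trap is easy to demonstrate on toy data. This is a minimal sketch (the labels are made up for illustration): a "model" that predicts the majority class for every transaction scores high accuracy while catching zero fraud.

```python
import numpy as np

# Toy labels: 990 legitimate (0) and 10 fraudulent (1) transactions.
y_true = np.array([0] * 990 + [1] * 10)

# A degenerate "model" that labels every transaction as legitimate.
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()  # 0.99 -- looks impressive
recall = y_pred[y_true == 1].mean()   # 0.0  -- catches no fraud at all

print(f"accuracy={accuracy:.2f}, fraud recall={recall:.2f}")
```

High overall accuracy says nothing about performance on the rare class, which is exactly why resampling and imbalance-aware metrics matter.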
Upsampling techniques
- Random oversampling: Duplicate random examples from the minority class until the classes are balanced. Simple but can lead to overfitting since the model sees the same examples repeatedly.
- SMOTE (Synthetic Minority Over-sampling Technique): Creates new synthetic examples by interpolating between existing minority class examples. Produces more diverse training data than simple duplication.
- ADASYN: An adaptive version of SMOTE that generates more synthetic examples in regions where the minority class is harder to learn.
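The first two techniques can be sketched in plain NumPy. This is a simplified illustration on synthetic data: the SMOTE step here interpolates with a randomly chosen minority partner rather than a true k-nearest neighbour, which is the part real SMOTE implementations add.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced dataset: 95 majority vs 5 minority samples.
X_maj = rng.normal(0.0, 1.0, size=(95, 2))
X_min = rng.normal(3.0, 1.0, size=(5, 2))

# Random oversampling: draw minority rows with replacement until balanced.
idx = rng.integers(0, len(X_min), size=len(X_maj))
X_min_up = X_min[idx]

# SMOTE-style interpolation (simplified: random partner instead of a
# k-nearest neighbour): each synthetic point lies on the line segment
# between two existing minority examples.
partners = X_min[rng.integers(0, len(X_min), size=len(X_maj))]
t = rng.random((len(X_maj), 1))
X_min_smote = X_min_up + t * (partners - X_min_up)

print(X_min_up.shape, X_min_smote.shape)  # (95, 2) (95, 2)
```

In practice you would reach for a maintained implementation such as the imbalanced-learn library rather than hand-rolling this, but the core idea is just duplication versus interpolation.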
Upsampling vs downsampling
The opposite approach, downsampling, reduces the majority class to match the minority class. This avoids overfitting but discards potentially valuable data. The choice depends on your dataset size and the cost of misclassification.
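Downsampling is even simpler to sketch: keep a random subset of the majority class, without replacement, sized to match the minority class (the dataset below is synthetic).

```python
import numpy as np

rng = np.random.default_rng(42)
X_maj = rng.normal(size=(1000, 3))  # majority class: 1000 samples
X_min = rng.normal(size=(20, 3))    # minority class: 20 samples

# Random undersampling: sample majority indices without replacement.
keep = rng.choice(len(X_maj), size=len(X_min), replace=False)
X_maj_down = X_maj[keep]

print(X_maj_down.shape)  # (20, 3) -- now balanced with the minority class
```

Note that 980 of the 1000 majority examples are thrown away, which is why downsampling is usually reserved for large datasets.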
In image processing
Upsampling has a different meaning in computer vision: increasing the resolution or spatial dimensions of an image or feature map. This is used in:
- Image super-resolution (enhancing low-resolution images)
- Semantic segmentation (producing pixel-level predictions)
- Generative models (progressively increasing image detail)
Techniques include bilinear interpolation, transposed convolutions, and pixel shuffle.
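The simplest image-upsampling variant, nearest-neighbour interpolation, fits in a few lines: each pixel is repeated along both spatial axes. (Bilinear interpolation and transposed convolutions follow the same idea of expanding the spatial grid, but fill in the new pixels more smoothly or learnably.)

```python
import numpy as np

# A 2x2 "image".
img = np.array([[1, 2],
                [3, 4]])

# Nearest-neighbour upsampling by a factor of 2: repeat every pixel
# twice along the height axis, then twice along the width axis.
up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

print(up)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```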
Practical considerations
When addressing class imbalance:
- Start by understanding the business cost of each type of error (missing fraud vs false alarm)
- Try multiple approaches (upsampling, downsampling, class weights) and compare results
- Evaluate using metrics that account for imbalance (precision, recall, F1, AUC-ROC) rather than accuracy
- Monitor production performance, as real-world class ratios may differ from training data
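The metrics point deserves a concrete illustration. On a hypothetical test set with 95 negatives and 5 positives, a model that finds 3 of the 5 positives and raises 1 false alarm still posts 97 percent accuracy; precision and recall tell the real story.

```python
import numpy as np

# 95 negatives, 5 positives; the model catches 3 positives
# and raises 1 false alarm (illustrative data).
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)
y_pred[0] = 1       # one false positive
y_pred[95:98] = 1   # three true positives

tp = ((y_pred == 1) & (y_true == 1)).sum()  # 3
fp = ((y_pred == 1) & (y_true == 0)).sum()  # 1
fn = ((y_pred == 0) & (y_true == 1)).sum()  # 2

precision = tp / (tp + fp)                        # 0.75
recall = tp / (tp + fn)                           # 0.60
f1 = 2 * precision * recall / (precision + recall)
accuracy = (y_pred == y_true).mean()              # 0.97 despite 2 missed positives

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")
```

Libraries such as scikit-learn provide these metrics (and AUC-ROC) out of the box; the arithmetic above just shows why they diverge from accuracy on imbalanced data.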
Why This Matters
Class imbalance is one of the most common pitfalls in business AI projects. Understanding upsampling and related techniques helps you ensure AI models perform well on the cases that matter most: the rare events like fraud, defects, and at-risk customers that often drive the most business value.
Continue learning in Advanced
This topic is covered in our lesson: How AI Models Learn and Generalise