Dimensionality Reduction
Techniques that reduce the number of features in a dataset while preserving the most important patterns, making data easier to visualise, process, and model.
Dimensionality reduction is the process of reducing the number of features (variables, columns) in a dataset while retaining as much useful information as possible. It is essential when dealing with high-dimensional data: datasets with hundreds or thousands of features.
Why reduce dimensions
- Visualisation – humans can see two or three dimensions. Reducing a dataset to two dimensions lets you plot and visually inspect clusters, outliers, and patterns.
- Performance – many algorithms slow down dramatically or break entirely with too many features. Fewer dimensions mean faster training and prediction.
- Noise reduction – some features contain more noise than signal. Removing them improves model performance.
- Curse of dimensionality – as dimensions increase, data becomes increasingly sparse. Models need exponentially more data to perform well in high-dimensional spaces.
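The curse of dimensionality can be seen directly by measuring distances between random points as the dimension grows. The sketch below (illustrative data, NumPy only) shows that the relative spread between the nearest and farthest neighbour shrinks in high dimensions, which is what makes distance-based methods struggle:

```python
import numpy as np

rng = np.random.default_rng(0)

for dims in (2, 10, 100, 1000):
    points = rng.random((500, dims))  # 500 random points in the unit hypercube
    # Distances from the first point to all the others
    dists = np.linalg.norm(points - points[0], axis=1)[1:]
    # Relative spread: how much farther the farthest point is than the nearest
    ratio = (dists.max() - dists.min()) / dists.min()
    print(f"{dims:>4} dims: relative spread of distances = {ratio:.3f}")
```

As the dimension rises, the printed ratio falls toward zero: every point ends up roughly equally far from every other, so "nearest neighbour" carries less and less information.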
Common techniques
- PCA (Principal Component Analysis) – finds the directions of maximum variance in the data and projects it onto those axes. The most widely used linear technique.
- t-SNE – a non-linear technique that excels at visualisation, preserving local structure. Popular for exploring clusters in high-dimensional data.
- UMAP – similar to t-SNE but faster and better at preserving global structure. Increasingly preferred for both visualisation and preprocessing.
- Autoencoders – neural networks that learn a compressed representation. More flexible than linear methods but harder to interpret.
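Of these, PCA is the simplest to try first. A minimal sketch using scikit-learn (assumed installed; the synthetic dataset and the choice of two components are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# 200 samples with 50 features that secretly depend on only 5 latent factors
base = rng.normal(size=(200, 5))
X = base @ rng.normal(size=(5, 50)) + 0.01 * rng.normal(size=(200, 50))

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)            # project onto the two highest-variance axes

print(X_2d.shape)                      # (200, 2)
print(pca.explained_variance_ratio_)   # share of total variance each axis retains
```

The `explained_variance_ratio_` attribute is a useful diagnostic: if the first few components capture most of the variance, the data is effectively low-dimensional and the reduction loses little information.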
Feature selection vs. feature extraction
- Feature selection keeps a subset of original features, discarding the rest. You can still interpret what each remaining feature means.
- Feature extraction (PCA, t-SNE, UMAP) creates new features that combine the originals. The new features are more compact but harder to interpret.
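The difference is easy to see side by side. In this sketch (illustrative data, scikit-learn assumed installed), selection returns indices of original columns you can still name, while extraction returns new columns that are blends of all the originals:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # label driven by the first two features

# Feature selection: keeps 2 of the 20 original columns, which stay interpretable
selector = SelectKBest(f_classif, k=2).fit(X, y)
kept = selector.get_support(indices=True)
print(kept)                                 # indices of the retained original features

# Feature extraction: builds 2 brand-new columns combining all 20 originals
X_new = PCA(n_components=2).fit_transform(X)
print(X_new.shape)                          # (100, 2)
```

With selection you can report "the model uses columns `kept`"; with extraction you can only say "the model uses two linear combinations of everything", which is more compact but harder to explain.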
Practical considerations
Always fit the dimensionality reduction on the training set only, then apply that same fitted transformation to the test set. Fitting on the test set causes data leakage and inflated performance estimates.
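In scikit-learn terms, this means calling `fit_transform` on the training split and only `transform` on the test split. A minimal sketch (illustrative data; split sizes are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 30))
X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)

pca = PCA(n_components=5)
X_train_red = pca.fit_transform(X_train)  # learn the axes from training data only
X_test_red = pca.transform(X_test)        # reuse those axes; never fit on test data

print(X_train_red.shape, X_test_red.shape)  # (240, 5) (60, 5)
```

Wrapping the reducer and the downstream model in a scikit-learn `Pipeline` enforces this automatically, since the pipeline only ever fits on the data passed to its own `fit` call.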
Why This Matters
Dimensionality reduction is a practical tool that makes AI projects feasible when you have wide datasets with many features. It improves model performance, reduces computing costs, and enables visual exploration of complex data. Understanding it helps you recognise when a project is struggling with too many features rather than too little data.