Clustering

Last reviewed: April 2026

An unsupervised machine learning technique that groups similar data points together without predefined labels, revealing natural patterns in data.

Clustering is a type of unsupervised machine learning that automatically groups similar data points together. Unlike classification, where you tell the model what categories exist, clustering discovers the categories on its own by finding natural patterns in the data.

How clustering works

Clustering algorithms measure similarity between data points, typically with a distance metric such as Euclidean distance, and group the most similar points together. The algorithm does not know what the groups represent; it only knows which data points resemble each other.
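
To make "similarity as distance" concrete, here is a minimal sketch in Python. The customer names and the two features (monthly visits, average basket value) are invented for illustration, and Euclidean distance is just one possible metric.

```python
import math

# Hypothetical customers described by (monthly visits, average basket value).
customers = {
    "A": (2.0, 15.0),
    "B": (3.0, 14.0),
    "C": (20.0, 90.0),
}

# Euclidean distance between every pair: a small value means "similar".
names = sorted(customers)
distances = {
    (a, b): math.dist(customers[a], customers[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

# The pair with the smallest distance is the most natural candidate to group.
closest_pair = min(distances, key=distances.get)
print(closest_pair)
```

In this toy data, customers A and B sit close together while C is far from both, so a clustering algorithm would tend to group A with B. Note that the features here are on different scales; in practice features are often rescaled first so that one of them does not dominate the distance.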

Common clustering algorithms

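K-means — you specify the number of clusters (K), and the algorithm alternates between assigning each point to the nearest cluster centre and moving each centre to the mean of its assigned points, until the assignments stop changing. Simple, fast, and widely used.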
Hierarchical clustering — builds a tree of nested clusters, allowing you to choose the level of granularity. Useful when you do not know how many clusters to expect.
DBSCAN — identifies clusters based on density, handling irregular shapes and automatically detecting outliers. Good for spatial data.
Gaussian Mixture Models — assumes data comes from a mixture of Gaussian (normal) distributions, allowing soft cluster assignments where a point can partially belong to multiple clusters.
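
As an illustration of the first algorithm in the list, the following is a minimal sketch of K-means in plain Python. The function name `kmeans`, the sample points, and the choice of starting centres are assumptions made for this example; production libraries use more careful initialisation and faster numerics.

```python
import math

def kmeans(points, centres, max_iter=100):
    """Alternate assignment and centre-update steps until centres stop moving."""
    centres = [tuple(c) for c in centres]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [
            min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its assigned points.
        new_centres = []
        for j, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centres.append(mean)
            else:
                new_centres.append(old)  # an empty cluster keeps its old centre
        if new_centres == centres:
            break  # converged: nothing moved
        centres = new_centres
    return labels, centres

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
# Assumption: the first and last points serve as the K = 2 starting centres.
labels, centres = kmeans(points, centres=[points[0], points[-1]])
print(labels)
```

With these two well-separated groups, the first three points end up in one cluster and the last three in the other. On less tidy data the result can depend on the starting centres, which is why K-means is commonly run several times with different initialisations.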

Business applications

Customer segmentation — grouping customers by purchasing behaviour, engagement patterns, or demographics
Anomaly detection — identifying data points that do not fit any cluster, flagging potential fraud or errors
Document organisation — automatically grouping similar documents, emails, or support tickets
Market research — discovering natural segments in survey data without imposing predefined categories
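
The anomaly detection use case can be sketched as follows: once cluster centres are known, points that lie far from every centre are flagged. The function `flag_anomalies`, the centres, the sample data, and the threshold of 2.0 are all hypothetical values chosen for illustration.

```python
import math

def flag_anomalies(points, centres, threshold):
    """Return points whose distance to the nearest centre exceeds `threshold`."""
    return [
        p for p in points
        if min(math.dist(p, c) for c in centres) > threshold
    ]

# Hypothetical cluster centres, for example from an earlier clustering run.
centres = [(1.0, 1.0), (8.0, 8.0)]
transactions = [(1.2, 0.9), (7.8, 8.3), (4.5, 15.0), (0.8, 1.1)]

# Assumption: a fixed distance threshold; real systems need to tune this.
outliers = flag_anomalies(transactions, centres, threshold=2.0)
print(outliers)
```

Only the third transaction is far from both centres, so it is the one flagged for review. A flagged point is a candidate for investigation, not proof of fraud or error.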

Challenges

Choosing the right number of clusters is often subjective
Results depend heavily on how similarity is measured and which features are included
Clusters must be interpreted by humans — the algorithm groups data but does not explain why
High-dimensional data can make distance metrics unreliable, because distances between points tend to become nearly equal as the number of features grows
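
One common way to make the choice of cluster count less subjective is the silhouette score, which compares how close each point is to its own cluster versus the nearest other cluster. The sketch below is a plain-Python illustration with invented sample data; the function name `silhouette_score` is used here for readability and is not tied to any particular library.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (from -1 to 1, higher is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # by convention a single-point cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])  # matches the visible groups
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])   # arbitrary split
print(good > bad)
```

A typical workflow runs the clustering for several candidate cluster counts and compares the scores. The score is a guide rather than a verdict: a grouping with a slightly lower score may still be the more useful one for the business question at hand.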

Why This Matters

Clustering reveals patterns in your data that you might never find manually. Customer segmentation, anomaly detection, and content organisation are high-value use cases accessible to most organisations. Understanding clustering helps you identify opportunities where unsupervised learning can deliver business insights without the cost of labelled training data.

Related Terms

Unsupervised Learning

A machine learning approach where the model finds patterns in data without being given correct answers. Used for discovering hidden structure, grouping similar items, and detecting anomalies.

Machine Learning (ML)

A type of AI where systems learn patterns from data instead of following explicitly programmed rules. The system improves its performance through experience.

Artificial Intelligence (AI)

Software that can perform tasks that normally require human intelligence, such as understanding language, recognising patterns, and making decisions.
