Contrastive Learning
A training approach where a model learns to represent data by distinguishing similar examples from dissimilar ones.
Contrastive learning is a machine learning technique where a model learns useful representations of data by being trained to pull similar items closer together and push dissimilar items apart in its internal representation space.
The core idea
Imagine teaching someone to recognise dog breeds without labels. You show them two photos of golden retrievers and one of a poodle, and say "these two are more similar to each other than to this one." Over many such comparisons, they develop an understanding of what makes breeds different (fur texture, body shape, size) without ever being told the breed names.
Contrastive learning works the same way. The model learns a representation space where similar items cluster together and dissimilar items are far apart.
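"Close" and "far apart" in a representation space are usually measured with cosine similarity. The sketch below uses made-up toy vectors (the numbers are illustrative, not from any real model) to show what a well-trained space looks like: same-breed photos score higher than cross-breed pairs.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical learned representations, for illustration only:
golden_retriever_1 = [0.90, 0.80, 0.10]
golden_retriever_2 = [0.85, 0.75, 0.15]
poodle            = [0.10, 0.20, 0.90]

# In a well-trained space, same-breed pairs are more similar than cross-breed pairs.
assert cosine_similarity(golden_retriever_1, golden_retriever_2) > \
       cosine_similarity(golden_retriever_1, poodle)
```

Training nudges the model's parameters until comparisons like these come out the right way round.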
How it works
The training process creates pairs or groups of examples. Positive pairs are examples that should be similar: two augmented versions of the same image, or a sentence and its paraphrase. Negative pairs are examples that should be different, often just other items sampled from the same training batch. The model is trained to produce similar representations for positive pairs and dissimilar representations for negative pairs.
The loss function (the mathematical measure of error) penalises the model when a positive pair is represented far apart or a negative pair is represented close together. Common choices include the triplet loss and the InfoNCE loss.
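An InfoNCE-style loss can be sketched in a few lines of plain Python. The vectors below are toy values for illustration; real training would use a neural network's outputs and batched tensor operations.

```python
import math

def cos(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    # InfoNCE: cross-entropy over similarity scores. The loss is low when
    # the anchor is close to its positive and far from all negatives.
    pos_score = math.exp(cos(anchor, positive) / temperature)
    neg_scores = sum(math.exp(cos(anchor, n) / temperature) for n in negatives)
    return -math.log(pos_score / (pos_score + neg_scores))

# Toy vectors: the anchor points the same way as its positive,
# and is orthogonal to the negative.
anchor   = [1.0, 0.0]
positive = [0.9, 0.1]
negative = [0.0, 1.0]

good = info_nce_loss(anchor, positive, [negative])
bad  = info_nce_loss(anchor, negative, [positive])  # pairs swapped: high loss
assert good < bad
```

The `temperature` parameter controls how sharply the loss focuses on the hardest negatives; small values make the comparison more discriminating.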
Key applications
- Embedding models: The sentence and text embedding models used in semantic search and RAG are often trained using contrastive learning. They learn to represent similar texts with similar vectors.
- Image recognition: Models like CLIP learn to associate images with their text descriptions through contrastive learning.
- Self-supervised learning: Contrastive learning enables training on unlabelled data, since the model creates its own supervision signal from data augmentation.
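The self-supervised case can be made concrete with a minimal sketch. Word dropout is used here as one simple text augmentation among many possible schemes (the function name and parameters are illustrative, not from any particular library): two augmented views of the same sentence form a positive pair, while views of different sentences would serve as negatives.

```python
import random

def augment(text, drop_prob=0.2, seed=None):
    # Word-dropout augmentation: randomly remove words to create a
    # distorted "view" of the same underlying text.
    rng = random.Random(seed)
    words = text.split()
    kept = [w for w in words if rng.random() > drop_prob]
    return " ".join(kept) if kept else text

sentence = "contrastive learning builds useful representations from unlabelled data"

# Two views of the same sentence: a positive pair needing no human labels.
view_a = augment(sentence, seed=1)
view_b = augment(sentence, seed=2)
positive_pair = (view_a, view_b)
```

Because the pairs are generated automatically, the model can be trained on large unlabelled corpora with no manual annotation.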
Why it matters for modern AI
Contrastive learning powers the embedding models that make semantic search, retrieval-augmented generation, and recommendation systems work. When you search for "how to fix a leaking tap" and find results about "repairing a dripping faucet," contrastive learning is what taught the embedding model that these phrases mean the same thing.
Why This Matters
Contrastive learning underpins the embedding models used in semantic search, RAG systems, and recommendation engines. Understanding it helps you appreciate why these systems can find conceptually related content even when the exact words differ.