Cosine Similarity
A mathematical measure of how similar two vectors are by calculating the cosine of the angle between them, widely used in AI to compare documents, images, and search queries.
Cosine similarity is a mathematical measure that quantifies how similar two vectors are by computing the cosine of the angle between them. In AI, it is the standard method for comparing embeddings, the numerical representations that models use to encode the meaning of text, images, and other data.
How it works
Two vectors can point in similar or different directions, and cosine similarity quantifies how closely those directions align:
- A cosine similarity of 1 means the vectors point in exactly the same direction: the items are maximally similar.
- A cosine similarity of 0 means the vectors are perpendicular: the items are unrelated.
- A cosine similarity of -1 means the vectors point in opposite directions: the items are maximally dissimilar.
The beauty of cosine similarity is that it ignores magnitude (the length of the vectors) and focuses purely on direction. A short document and a long document about the same topic will have similar directions even though their magnitudes differ.
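This definition can be sketched in a few lines of NumPy. The toy vectors below are illustrative, not real embeddings; the point is that scaling a vector changes its magnitude but not its cosine similarity to anything else.

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])    # same direction, twice the magnitude
c = np.array([-1.0, -2.0, -3.0])  # exactly opposite direction

print(cosine_similarity(a, b))  # 1.0: doubling the length changes nothing
print(cosine_similarity(a, c))  # -1.0: opposite directions
```

The "long document" here is just the "short document" scaled up, and the score stays at 1.0, which is exactly the magnitude-invariance described above.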
Why AI uses cosine similarity
When an AI model converts text into an embedding vector, semantically similar texts end up with similar vectors. Cosine similarity provides a fast, efficient way to measure that similarity numerically.
Common applications include:
- Semantic search: When you search a knowledge base, your query is embedded into a vector, and cosine similarity finds the documents with the most similar vectors.
- Recommendation systems: Finding products, articles, or content similar to what a user has previously engaged with.
- Duplicate detection: Identifying near-duplicate documents, support tickets, or customer enquiries.
- Clustering: Grouping similar items together based on their vector representations.
- RAG (Retrieval Augmented Generation): The retrieval step in RAG typically uses cosine similarity to find the most relevant documents to include in the AI's context.
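The retrieval pattern behind several of these applications, semantic search and RAG in particular, reduces to "score every document against the query, return the top k". A minimal sketch, assuming the documents and query have already been embedded into a NumPy matrix and vector (the function name `top_k` is illustrative):

```python
import numpy as np

def top_k(query_vec, doc_matrix, k=3):
    """Return indices of the k documents most similar to the query."""
    # Normalising every vector to unit length means a plain dot
    # product between rows and the query IS the cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ q                    # one cosine score per document
    return np.argsort(scores)[::-1][:k]  # indices of highest scores first

# Toy 2-D "embeddings": document 0 matches the query exactly,
# document 2 is close, document 1 is orthogonal (unrelated).
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
query = np.array([1.0, 0.0])
print(top_k(query, docs, k=2))  # [0 2]
```

Real systems replace the brute-force scan with an approximate nearest-neighbour index, but the scoring step is the same computation.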
Cosine similarity versus other distance metrics
- Euclidean distance measures the straight-line distance between two points. It is sensitive to magnitude, which can be a disadvantage when comparing documents of different lengths.
- Dot product is faster to compute but conflates similarity with magnitude. Useful when magnitude carries meaning (e.g., document importance).
- Cosine similarity is generally preferred for text and semantic comparisons because normalising for magnitude usually produces more meaningful similarity scores.
Practical considerations
When building semantic search or recommendation systems, the choice of similarity metric can significantly affect results. Cosine similarity is the default choice for most text-based applications, but the best metric depends on how the embeddings were trained. Always check the model documentation for the recommended similarity measure.
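One quick sanity check worth running alongside the documentation: many embedding models return unit-normalised vectors, in which case the dot product and cosine similarity give identical rankings. A small helper (the name `is_unit_normalised` is our own) can confirm this for any batch of embeddings:

```python
import numpy as np

def is_unit_normalised(embeddings, tol=1e-3):
    # If every row has norm ~1, dot product and cosine similarity coincide.
    norms = np.linalg.norm(embeddings, axis=1)
    return bool(np.all(np.abs(norms - 1.0) < tol))

print(is_unit_normalised(np.array([[0.6, 0.8], [1.0, 0.0]])))  # True
print(is_unit_normalised(np.array([[2.0, 0.0]])))              # False
```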
Why This Matters
Cosine similarity is the engine behind semantic search, document matching, and AI-powered recommendations. Understanding it helps you evaluate and troubleshoot these systems: when search results seem off, the issue often lies in how similarity is being measured.