
Embedding

Last reviewed: April 2026

A numerical representation of text (or images, audio, etc.) that captures its meaning. Embeddings let AI measure how similar two pieces of content are.

An embedding is a way of representing text, images, or other data as a list of numbers — a vector — that captures the meaning of that content. Two pieces of text with similar meanings will have similar embeddings, even if they use completely different words.

The key insight

The breakthrough idea behind embeddings is that meaning can be represented mathematically. The sentence "The CEO approved the budget" and "The chief executive signed off on the financial plan" use entirely different words but have similar meanings. Their embeddings — the lists of numbers representing each sentence — would be very close to each other in mathematical space.

Conversely, "The bank was steep and muddy" and "The bank approved the loan" use the same word "bank" but have very different meanings. Their embeddings would be far apart.
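The standard way to measure this closeness is cosine similarity, which scores how closely two vectors point in the same direction (1.0 = identical direction, near 0 = unrelated). The sketch below uses tiny hand-made 4-number vectors as stand-ins for real embeddings, which have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors -- real embedding models would produce these.
ceo_approved = [0.9, 0.1, 0.8, 0.2]  # "The CEO approved the budget"
exec_signed  = [0.8, 0.2, 0.9, 0.1]  # "The chief executive signed off..."
muddy_bank   = [0.1, 0.9, 0.0, 0.7]  # "The bank was steep and muddy"

print(cosine_similarity(ceo_approved, exec_signed))  # high: close in meaning
print(cosine_similarity(ceo_approved, muddy_bank))   # low: far apart
```

With these made-up vectors, the two business sentences score about 0.99 while the riverbank sentence scores about 0.23 against either of them, mirroring how real embeddings behave.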

How embeddings work

An embedding model converts text into a fixed-length vector — a list of, say, 1,536 numbers. Each number is one coordinate in a high-dimensional space; individually, the numbers are not interpretable, but collectively they capture the semantic content of the text.

Think of it as coordinates. Just as GPS coordinates (latitude, longitude) represent a physical location in two dimensions, an embedding represents a meaning location in many dimensions. Similar meanings have nearby coordinates.

What embeddings enable

Embeddings power several critical AI capabilities:

  • Semantic search: Find documents by meaning, not just keywords. A search for "employee compensation" would also surface documents about "salary," "pay," "remuneration," and "wages."
  • RAG (Retrieval-Augmented Generation): When you ask an AI a question about your documents, embeddings are how the system finds the most relevant document chunks to include in the AI's context.
  • Recommendation systems: "People who read this article also liked..." works by finding articles with similar embeddings.
  • Clustering: Group similar documents, support tickets, or customer feedback automatically.
  • Anomaly detection: Identify content that is semantically unusual compared to a baseline.
  • Classification: Determine the category or topic of text by comparing its embedding to known category embeddings.
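Semantic search, the first capability above, reduces to ranking stored vectors by their similarity to a query vector. This sketch uses hand-assigned 3-number vectors (in a real system, an embedding model would produce them) to show why "employee compensation" surfaces a salary document despite sharing no keywords with it:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hand-made stand-in vectors; a real system gets these from an embedding model.
documents = {
    "Salary bands and pay review cycle": [0.9, 0.1, 0.1],
    "Office wifi troubleshooting guide": [0.1, 0.9, 0.1],
    "Annual leave and public holidays":  [0.2, 0.1, 0.9],
}

def semantic_search(query_vector, docs, top_k=2):
    """Rank document titles by cosine similarity to the query vector."""
    ranked = sorted(docs, key=lambda title: cosine(query_vector, docs[title]), reverse=True)
    return ranked[:top_k]

# Query "employee compensation" shares no words with the salary document,
# but its (hypothetical) vector points the same way.
query = [0.8, 0.05, 0.2]
print(semantic_search(query, documents))
```

The salary document ranks first because its vector is nearest the query's, which is exactly the mechanism behind matching "compensation" to "pay" or "wages".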

Embeddings in practice

A typical RAG workflow using embeddings:

  1. Your 500-page employee handbook is split into paragraphs.
  2. Each paragraph is converted to an embedding using an embedding model.
  3. These embeddings are stored in a vector database.
  4. An employee asks: "What's our parental leave policy?"
  5. The question is converted to an embedding.
  6. The vector database finds the paragraphs whose embeddings are most similar to the question embedding.
  7. Those paragraphs are passed to the LLM along with the question.
  8. The LLM generates an answer based on the actual policy text.
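The steps above can be sketched end to end. The toy embedder here (a simple word-count vector over a fixed vocabulary) and the plain Python list stand in for a real embedding model and a vector database, and the sample chunks are invented; only the shape of the workflow matches a production RAG system:

```python
import math

VOCAB = ["parental", "leave", "policy", "weeks", "expense", "travel", "security", "password"]

def toy_embed(text):
    """Stand-in for an embedding model: a word-count vector over VOCAB.
    A real model also captures synonyms and context, not just shared words."""
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-3: split the handbook, embed each chunk, store the vectors
# (a plain list stands in for a vector database here).
chunks = [
    "parental leave policy employees receive sixteen weeks paid leave",
    "travel expense claims must be filed within thirty days",
    "password and security requirements for company laptops",
]
index = [(chunk, toy_embed(chunk)) for chunk in chunks]

# Steps 4-6: embed the question and retrieve the most similar chunk.
question = "what is our parental leave policy"
q_vec = toy_embed(question)
retrieved = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:1]

# Step 7: the retrieved text goes into the LLM prompt alongside the question.
prompt = f"Context: {retrieved[0][0]}\n\nQuestion: {question}"
print(prompt)
```

Step 8, generating the answer, would be a call to an LLM with this prompt; it is omitted here to keep the sketch self-contained.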

Embedding models

Embedding models are different from text generation models. They are designed specifically to produce high-quality numerical representations. Common embedding models include OpenAI's text-embedding-3 series, Cohere's embed models, and open-source options like BAAI's bge series.

Choosing embedding dimensions

Embedding models output vectors of different sizes (dimensions). Larger dimensions capture more nuance but require more storage and computation:

  • 256 dimensions: Fast and lightweight, suitable for basic similarity
  • 1,536 dimensions: Good balance of quality and efficiency (OpenAI's default)
  • 3,072 dimensions: Higher quality, more storage

For most business applications, 1,536 dimensions provides excellent results without excessive cost.
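Some newer models (for example, OpenAI's text-embedding-3 series via its `dimensions` parameter) let you request a shorter vector directly. Conceptually this amounts to keeping the leading numbers and rescaling the result to unit length so similarity scores still behave sensibly — a minimal sketch of that idea, with an invented 6-number vector standing in for a model's full-size output:

```python
import math

def shorten(vector, dims):
    """Keep the first `dims` numbers, then rescale to unit length so
    cosine similarity on the shortened vector still behaves as expected."""
    truncated = vector[:dims]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

full = [0.5, -0.3, 0.8, 0.1, -0.2, 0.4]  # pretend this is a full-size embedding
small = shorten(full, 3)

print(len(small))  # 3 dimensions instead of 6
```

Shorter vectors mean less storage and faster similarity comparisons, at some cost in nuance.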


Why This Matters

Embeddings are the invisible infrastructure behind AI search and knowledge retrieval. If your organisation is considering building an AI-powered knowledge base, customer support system, or document search tool, embeddings are the technology that makes it work. Understanding them helps you evaluate vendors, ask the right questions about implementation, and set realistic expectations about semantic search quality.
