
Word Embedding

Last reviewed: April 2026

A technique that represents words as numerical vectors in a multi-dimensional space, where similar words are positioned close together.

A word embedding is a way of representing a word as a list of numbers (a vector) in a multi-dimensional space. Words with similar meanings end up close together in this space. "King" and "queen" are near each other. "Cat" and "dog" are near each other. "Cat" and "economics" are far apart.

Why words need to become numbers

AI models process numbers, not words. Before any language model can work with text, words must be converted into numerical representations. The simplest approach, assigning each word a unique number (cat=1, dog=2, king=3), loses all information about meaning and relationships. Word embeddings solve this by mapping words into a space where geometric relationships capture semantic relationships.
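One way to see the difference geometrically: with embeddings, "how similar are these words?" becomes a vector calculation, usually cosine similarity. The sketch below uses tiny hand-made 3-dimensional vectors purely for illustration; real embeddings have hundreds of dimensions and are learned from data, not written by hand.

```python
import numpy as np

# Toy 3-dimensional embeddings (hypothetical values for illustration;
# real embeddings are high-dimensional and learned from large corpora).
embeddings = {
    "cat":       np.array([0.9, 0.1, 0.0]),
    "dog":       np.array([0.8, 0.2, 0.1]),
    "economics": np.array([0.0, 0.1, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_econ = cosine_similarity(embeddings["cat"], embeddings["economics"])
print(sim_cat_dog > sim_cat_econ)  # True: related words score higher
```

A plain integer encoding (cat=1, dog=2, economics=3) offers no comparable similarity measure: the "distance" between IDs is meaningless.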

The famous example

Word embeddings capture relationships so well that you can do arithmetic with meaning:

  • King - Man + Woman ≈ Queen
  • Paris - France + Italy ≈ Rome
  • Walking - Walk + Swim ≈ Swimming

These vector arithmetic properties emerge naturally from training, without anyone programming them explicitly.
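The arithmetic can be sketched directly. The vectors below are hand-built so that one dimension loosely encodes "royalty" and the other "gender"; this is a deliberately idealised toy, whereas learned embeddings only approximate such clean structure.

```python
import numpy as np

# Hand-built toy vectors (hypothetical): dimension 0 ~ "royalty",
# dimension 1 ~ "gender". Real learned embeddings are only approximate.
vectors = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

result = vectors["king"] - vectors["man"] + vectors["woman"]

def nearest(query, vocab, exclude=()):
    """Return the word whose vector has highest cosine similarity to query."""
    best, best_sim = None, -2.0
    for word, vec in vocab.items():
        if word in exclude:
            continue
        sim = np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

answer = nearest(result, vectors, exclude={"king", "man", "woman"})
print(answer)  # queen
```

Excluding the input words from the search mirrors how such analogies are usually evaluated in practice, since the nearest vector to the result is often one of the inputs themselves.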

How embeddings are trained

The most influential methods include:

  • Word2Vec (2013, Google): Trains by predicting a word from its context (CBOW) or predicting context from a word (Skip-gram). Simple and effective.
  • GloVe (2014, Stanford): Learns from global word co-occurrence statistics across the entire corpus.
  • FastText (2016, Facebook): Extends Word2Vec by learning embeddings for sub-word units, handling rare words and morphology better.
  • Contextual embeddings (2018+, BERT and beyond): Produce different vectors for the same word depending on context. "Bank" in "river bank" and "bank account" gets different embeddings.

The evolution to contextual embeddings

Early word embeddings assigned one fixed vector per word regardless of context. This meant "bank" had the same representation whether it meant a financial institution or a riverbank. Modern language models produce contextual embeddings: representations that change based on surrounding text. This was a major step toward better language understanding.
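The difference can be illustrated with a deliberately simplified sketch: a "contextual" vector built by blending a word's static vector with a uniform average of its neighbours' vectors. Real models such as BERT use learned attention over many layers; the vectors and the blending rule below are hypothetical, chosen only to show that the same word can come out with different representations.

```python
import numpy as np

# Illustrative sketch only (not BERT): blend a word's static vector with
# the average of its context words. All toy vectors are hypothetical.
static = {
    "bank":    np.array([0.5, 0.5]),
    "river":   np.array([1.0, 0.0]),
    "account": np.array([0.0, 1.0]),
}

def contextual(word, sentence):
    """Mix the word's static vector with its context words' vectors."""
    context = [static[w] for w in sentence if w != word]
    return 0.5 * static[word] + 0.5 * np.mean(context, axis=0)

v1 = contextual("bank", ["river", "bank"])
v2 = contextual("bank", ["bank", "account"])
print(np.allclose(v1, v2))  # False: same word, different vectors
```

A static embedding table would return the same `static["bank"]` vector in both sentences; the contextual version is pulled toward "river" in one and "account" in the other.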

Embeddings beyond words

The embedding concept extends far beyond words:

  • Sentence embeddings: Represent entire sentences or paragraphs as vectors
  • Image embeddings: Represent images as vectors (enabling image search by text)
  • Product embeddings: Represent products for recommendation systems
  • User embeddings: Represent user preferences for personalisation

Applications

  • Semantic search (finding similar documents by meaning)
  • Recommendation systems
  • Text classification
  • Machine translation
  • Clustering and topic modelling
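The first application, semantic search, reduces to embedding the query and every document, then ranking documents by similarity. The sketch below reuses the averaging trick with hypothetical toy vectors; a production system would use a trained embedding model and an approximate nearest-neighbour index rather than a sorted list.

```python
import numpy as np

# Semantic search sketch: rank documents by cosine similarity to a query.
# Toy word vectors (hypothetical); real systems use learned embeddings.
word_vecs = {
    "feline": np.array([0.9, 0.1, 0.0]),
    "cat":    np.array([0.9, 0.2, 0.0]),
    "care":   np.array([0.3, 0.3, 0.3]),
    "guide":  np.array([0.2, 0.2, 0.2]),
    "tax":    np.array([0.0, 0.1, 0.9]),
    "filing": np.array([0.1, 0.0, 0.8]),
}

def embed(text):
    return np.mean([word_vecs[w] for w in text.split()], axis=0)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

docs = ["feline care guide", "tax filing guide"]
query = "cat care"
ranked = sorted(docs, key=lambda d: cos(embed(query), embed(d)), reverse=True)
print(ranked[0])  # "feline care guide": matched by meaning, not keywords
```

Note that the top result shares no words with the query; a keyword search would have missed it, which is exactly the gap semantic search closes.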

Why This Matters

Word embeddings are the foundation of how AI understands language. They explain why AI can find similar documents, power search engines, and understand that synonyms mean the same thing. Understanding embeddings helps you evaluate AI-powered search, recommendation, and classification tools.
