
Word Embedding

Last reviewed: April 2026

A technique that represents words as numerical vectors in a multi-dimensional space, where similar words are positioned close together.

A word embedding is a way of representing a word as a list of numbers (a vector) in a multi-dimensional space. Words with similar meanings end up close together in this space. "King" and "queen" are near each other. "Cat" and "dog" are near each other. "Cat" and "economics" are far apart.

Why words need to become numbers

AI models process numbers, not words. Before any language model can work with text, words must be converted into numerical representations. The simplest approach, assigning each word a unique number (cat=1, dog=2, king=3), loses all information about meaning and relationships. Word embeddings solve this by mapping words into a space where geometric relationships capture semantic relationships.
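One way to see the difference geometrically: with embeddings, "how similar are these words?" becomes a vector calculation, usually cosine similarity. The sketch below uses tiny hand-made 3-dimensional vectors purely for illustration; real embeddings have hundreds of dimensions and are learned from data, not written by hand.

```python
import numpy as np

# Toy 3-dimensional embeddings (hypothetical values for illustration;
# real embeddings are high-dimensional and learned from large corpora).
embeddings = {
    "cat":       np.array([0.9, 0.1, 0.0]),
    "dog":       np.array([0.8, 0.2, 0.1]),
    "economics": np.array([0.0, 0.1, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_econ = cosine_similarity(embeddings["cat"], embeddings["economics"])
print(sim_cat_dog > sim_cat_econ)  # True: related words score higher
```

A plain integer encoding (cat=1, dog=2, economics=3) offers no comparable similarity measure: the "distance" between IDs is meaningless.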

The famous example

Word embeddings capture relationships so well that you can do arithmetic with meaning:

  • King - Man + Woman ≈ Queen
  • Paris - France + Italy ≈ Rome
  • Walking - Walk + Swim ≈ Swimming

These vector arithmetic properties emerge naturally from training, without anyone programming them explicitly.
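The arithmetic can be sketched directly. The vectors below are hand-built so that one dimension loosely encodes "royalty" and the other "gender"; this is a deliberately idealised toy, whereas learned embeddings only approximate such clean structure.

```python
import numpy as np

# Hand-built toy vectors (hypothetical): dimension 0 ~ "royalty",
# dimension 1 ~ "gender". Real learned embeddings are only approximate.
vectors = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

result = vectors["king"] - vectors["man"] + vectors["woman"]

def nearest(query, vocab, exclude=()):
    """Return the word whose vector has highest cosine similarity to query."""
    best, best_sim = None, -2.0
    for word, vec in vocab.items():
        if word in exclude:
            continue
        sim = np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

answer = nearest(result, vectors, exclude={"king", "man", "woman"})
print(answer)  # queen
```

Excluding the input words from the search mirrors how such analogies are usually evaluated in practice, since the nearest vector to the result is often one of the inputs themselves.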

How embeddings are trained

The most influential methods include:

  • Word2Vec (2013, Google): Trains by predicting a word from its context (CBOW) or predicting context from a word (Skip-gram). Simple and effective.
  • GloVe (2014, Stanford): Learns from global word co-occurrence statistics across the entire corpus.
  • FastText (2016, Facebook): Extends Word2Vec by learning embeddings for sub-word units, handling rare words and morphology better.
  • Contextual embeddings (2018+, BERT and beyond): Produce different vectors for the same word depending on context. "Bank" in "river bank" and "bank account" gets different embeddings.

The evolution to contextual embeddings

Early word embeddings assigned one fixed vector per word regardless of context. This meant "bank" had the same representation whether it meant a financial institution or a riverbank. Modern language models produce contextual embeddings: representations that change based on surrounding text. This was a major step toward better language understanding.
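The difference can be illustrated with a deliberately simplified sketch: a "contextual" vector built by blending a word's static vector with a uniform average of its neighbours' vectors. Real models such as BERT use learned attention over many layers; the vectors and the blending rule below are hypothetical, chosen only to show that the same word can come out with different representations.

```python
import numpy as np

# Illustrative sketch only (not BERT): blend a word's static vector with
# the average of its context words. All toy vectors are hypothetical.
static = {
    "bank":    np.array([0.5, 0.5]),
    "river":   np.array([1.0, 0.0]),
    "account": np.array([0.0, 1.0]),
}

def contextual(word, sentence):
    """Mix the word's static vector with its context words' vectors."""
    context = [static[w] for w in sentence if w != word]
    return 0.5 * static[word] + 0.5 * np.mean(context, axis=0)

v1 = contextual("bank", ["river", "bank"])
v2 = contextual("bank", ["bank", "account"])
print(np.allclose(v1, v2))  # False: same word, different vectors
```

A static embedding table would return the same `static["bank"]` vector in both sentences; the contextual version is pulled toward "river" in one and "account" in the other.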

Embeddings beyond words

The embedding concept extends far beyond words:

  • Sentence embeddings: Represent entire sentences or paragraphs as vectors
  • Image embeddings: Represent images as vectors (enabling image search by text)
  • Product embeddings: Represent products for recommendation systems
  • User embeddings: Represent user preferences for personalisation

Applications

  • Semantic search (finding similar documents by meaning)
  • Recommendation systems
  • Text classification
  • Machine translation
  • Clustering and topic modelling
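The first application, semantic search, reduces to embedding the query and every document, then ranking documents by similarity. The sketch below reuses the averaging trick with hypothetical toy vectors; a production system would use a trained embedding model and an approximate nearest-neighbour index rather than a sorted list.

```python
import numpy as np

# Semantic search sketch: rank documents by cosine similarity to a query.
# Toy word vectors (hypothetical); real systems use learned embeddings.
word_vecs = {
    "feline": np.array([0.9, 0.1, 0.0]),
    "cat":    np.array([0.9, 0.2, 0.0]),
    "care":   np.array([0.3, 0.3, 0.3]),
    "guide":  np.array([0.2, 0.2, 0.2]),
    "tax":    np.array([0.0, 0.1, 0.9]),
    "filing": np.array([0.1, 0.0, 0.8]),
}

def embed(text):
    return np.mean([word_vecs[w] for w in text.split()], axis=0)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

docs = ["feline care guide", "tax filing guide"]
query = "cat care"
ranked = sorted(docs, key=lambda d: cos(embed(query), embed(d)), reverse=True)
print(ranked[0])  # "feline care guide": matched by meaning, not keywords
```

Note that the top result shares no words with the query; a keyword search would have missed it, which is exactly the gap semantic search closes.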

Why This Matters

Word embeddings are the foundation of how AI understands language. They explain why AI can find similar documents, power search engines, and understand that synonyms mean the same thing. Understanding embeddings helps you evaluate AI-powered search, recommendation, and classification tools.
