Encoder-Only Model
A transformer architecture that processes input text bidirectionally to produce rich representations, used for classification, search, and understanding tasks rather than text generation.
An encoder-only model is a transformer architecture that reads and processes text in both directions simultaneously, producing dense numerical representations (embeddings) that capture the meaning of the input. BERT is the most well-known example.
How encoder-only models work
Unlike decoder-only models, which read left to right, encoder-only models use bidirectional attention: each token can attend to every other token in the input, both before and after it. This means the representation of any word is informed by the full surrounding context, making these models excellent at understanding text.
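The contrast can be sketched with attention masks. This is a toy illustration in plain Python (the sequence length and token positions are arbitrary), not code from any real model: an encoder's mask allows every position to see every other, while a decoder's causal mask only allows each position to see itself and earlier positions.

```python
n = 4  # toy sequence length

# Encoder (bidirectional): every token may attend to every other token.
bidirectional_mask = [[True] * n for _ in range(n)]

# Decoder (causal): token i may only attend to positions j <= i.
causal_mask = [[j <= i for j in range(n)] for i in range(n)]

# Under bidirectional attention the first token can "see" the last one;
# under causal attention it cannot.
print(bidirectional_mask[0][3])  # True
print(causal_mask[0][3])         # False
```

In a real transformer these masks are applied inside the attention computation, zeroing out (masking) the disallowed token pairs before the softmax.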
The BERT revolution
BERT (Bidirectional Encoder Representations from Transformers), released by Google in 2018, demonstrated that pre-training a bidirectional encoder on masked language modelling (hiding random words and predicting them from context) produced representations useful for almost any language understanding task. Fine-tuned BERT models set new records on question answering, sentiment analysis, and text classification.
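The masked-language-modelling objective can be sketched in a few lines. This is a deliberately minimal toy (a hand-picked mask position rather than BERT's random ~15% masking, and words rather than subword tokens): the training signal is simply "recover the hidden token from its context".

```python
tokens = ["the", "cat", "sat", "on", "the", "mat"]
mask_index = 2  # position chosen for illustration; BERT masks ~15% of tokens at random

masked = list(tokens)
target = masked[mask_index]       # "sat" -- what the model must predict
masked[mask_index] = "[MASK]"

print(masked)  # ['the', 'cat', '[MASK]', 'on', 'the', 'mat']
print(target)  # sat
```

Because the model sees both "the cat" before the mask and "on the mat" after it, predicting the hidden word forces it to use context from both directions, which is exactly what the bidirectional encoder provides.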
What encoder-only models are good at
- Text classification: Determining sentiment, topic, intent, or category of text.
- Named entity recognition: Identifying and classifying names, dates, organisations, and other entities in text.
- Semantic similarity: Determining how similar two pieces of text are in meaning.
- Search and retrieval: Creating embeddings that power semantic search systems.
- Token classification: Tasks like part-of-speech tagging or information extraction.
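Semantic similarity and search both reduce to comparing embedding vectors, usually with cosine similarity. The sketch below uses hand-picked 3-dimensional vectors as stand-ins; in practice each vector would come from an encoder model and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative embeddings (in reality, produced by an encoder model):
emb_dog   = [0.9, 0.1, 0.2]   # "a dog barked"
emb_puppy = [0.8, 0.2, 0.3]   # "a puppy yapped"
emb_tax   = [0.1, 0.9, 0.7]   # "file your tax return"

# Sentences with similar meanings get closer vectors:
assert cosine_similarity(emb_dog, emb_puppy) > cosine_similarity(emb_dog, emb_tax)
```

A semantic search system applies the same idea at scale: embed every document once, embed the query at search time, and return the documents whose vectors score highest against the query vector.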
What they cannot do
Encoder-only models are not designed for text generation. They produce representations of input text but do not generate new text token by token. This is why chatbots and writing assistants use decoder-only models (GPT, Claude) rather than encoder-only models (BERT).
Encoder-only models today
While decoder-only LLMs dominate headlines, encoder-only models remain widely deployed in production. They power search engines, recommendation systems, spam filters, content moderation, and countless classification tasks. They are smaller, faster, and cheaper to run than large generative models, making them ideal when you need understanding rather than generation.
Why This Matters
Encoder-only models power the search, classification, and embedding systems that run behind the scenes in most AI applications. Understanding the distinction between encoder and decoder architectures helps you choose the right tool for specific tasks.