Encoder-Only Model
A transformer architecture that processes input text bidirectionally to produce rich representations, used for classification, search, and understanding tasks rather than text generation.
An encoder-only model is a transformer architecture that reads and processes text in both directions simultaneously, producing dense numerical representations (embeddings) that capture the meaning of the input. BERT is the most well-known example.
How encoder-only models work
Unlike decoder-only models, which read left to right, encoder-only models use bidirectional attention: each token can attend to every other token in the input, both before and after it. This means the representation of any word is informed by the full surrounding context, making these models excellent at understanding text.
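The contrast can be sketched with attention masks. This is a toy illustration in plain Python (the sequence length and token positions are arbitrary), not code from any real model: an encoder's mask allows every position to see every other, while a decoder's causal mask only allows each position to see itself and earlier positions.

```python
n = 4  # toy sequence length

# Encoder (bidirectional): every token may attend to every other token.
bidirectional_mask = [[True] * n for _ in range(n)]

# Decoder (causal): token i may only attend to positions j <= i.
causal_mask = [[j <= i for j in range(n)] for i in range(n)]

# Under bidirectional attention the first token can "see" the last one;
# under causal attention it cannot.
print(bidirectional_mask[0][3])  # True
print(causal_mask[0][3])         # False
```

In a real transformer these masks are applied inside the attention computation, zeroing out (masking) the disallowed token pairs before the softmax.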
The BERT revolution
BERT (Bidirectional Encoder Representations from Transformers), released by Google in 2018, demonstrated that pre-training a bidirectional encoder on masked language modelling (hiding random words and predicting them from context) produced representations useful for almost any language understanding task. Fine-tuned BERT models set new records on question answering, sentiment analysis, and text classification.
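The masked-language-modelling objective can be sketched in a few lines. This is a deliberately minimal toy (a hand-picked mask position rather than BERT's random ~15% masking, and words rather than subword tokens): the training signal is simply "recover the hidden token from its context".

```python
tokens = ["the", "cat", "sat", "on", "the", "mat"]
mask_index = 2  # position chosen for illustration; BERT masks ~15% of tokens at random

masked = list(tokens)
target = masked[mask_index]       # "sat" -- what the model must predict
masked[mask_index] = "[MASK]"

print(masked)  # ['the', 'cat', '[MASK]', 'on', 'the', 'mat']
print(target)  # sat
```

Because the model sees both "the cat" before the mask and "on the mat" after it, predicting the hidden word forces it to use context from both directions, which is exactly what the bidirectional encoder provides.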
What encoder-only models are good at
- Text classification: Determining sentiment, topic, intent, or category of text.
- Named entity recognition: Identifying and classifying names, dates, organisations, and other entities in text.
- Semantic similarity: Determining how similar two pieces of text are in meaning.
- Search and retrieval: Creating embeddings that power semantic search systems.
- Token classification: Tasks like part-of-speech tagging or information extraction.
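Semantic similarity and search both reduce to comparing embedding vectors, usually with cosine similarity. The sketch below uses hand-picked 3-dimensional vectors as stand-ins; in practice each vector would come from an encoder model and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative embeddings (in reality, produced by an encoder model):
emb_dog   = [0.9, 0.1, 0.2]   # "a dog barked"
emb_puppy = [0.8, 0.2, 0.3]   # "a puppy yapped"
emb_tax   = [0.1, 0.9, 0.7]   # "file your tax return"

# Sentences with similar meanings get closer vectors:
assert cosine_similarity(emb_dog, emb_puppy) > cosine_similarity(emb_dog, emb_tax)
```

A semantic search system applies the same idea at scale: embed every document once, embed the query at search time, and return the documents whose vectors score highest against the query vector.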
What they cannot do
Encoder-only models are not designed for text generation. They produce representations of input text but do not generate new text token by token. This is why chatbots and writing assistants use decoder-only models (GPT, Claude) rather than encoder-only models (BERT).
Encoder-only models today
While decoder-only LLMs dominate headlines, encoder-only models remain widely deployed in production. They power search engines, recommendation systems, spam filters, content moderation, and countless classification tasks. They are smaller, faster, and cheaper to run than large generative models, making them ideal when you need understanding rather than generation.
Why This Matters
Encoder-only models power the search, classification, and embedding systems that run behind the scenes in most AI applications. Understanding the distinction between encoder and decoder architectures helps you choose the right tool for specific tasks.