Encoder-Decoder
A neural network architecture where an encoder compresses input into a representation and a decoder generates output from that representation, used in translation, summarisation, and generation tasks.
The encoder-decoder architecture is a neural network design pattern where one component (the encoder) processes the input and compresses it into a representation, and another component (the decoder) uses that representation to generate the output.
How it works
Think of it as a two-step process:
- Encoder – reads the input (a sentence, an image, an audio clip) and compresses it into an internal representation that captures its meaning (classically a single fixed-size vector; transformer encoders produce one vector per input token)
- Decoder – takes that representation and generates the output one piece at a time (a translated sentence, a caption, a response)
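The two-step process above can be sketched in a few lines of Python. This is a toy illustration, not a trained model: the vocabulary, embeddings, and decoding rule are all made up for demonstration, whereas real encoder-decoder models learn them from data.

```python
# Toy encoder-decoder sketch (illustration only, not a trained model).
import numpy as np

VOCAB = {"hello": 0, "world": 1, "<eos>": 2}
EMBED = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # one row per word

def encode(tokens):
    """Compress a variable-length input into one fixed-size vector."""
    vectors = [EMBED[VOCAB[t]] for t in tokens]
    return np.mean(vectors, axis=0)  # same size regardless of input length

def decode(representation, max_steps=3):
    """Generate output one token at a time from the representation."""
    out, state = [], representation.copy()
    for _ in range(max_steps):
        scores = EMBED @ state            # score each vocabulary word
        token_id = int(np.argmax(scores))
        word = next(w for w, i in VOCAB.items() if i == token_id)
        out.append(word)
        if word == "<eos>":
            break
        # condition the next step on what was just generated
        state = 0.5 * state + 0.5 * EMBED[token_id]
    return out

rep = encode(["hello", "world"])
print(rep.shape)    # fixed-size representation: (2,)
print(decode(rep))  # a short generated sequence
```

Note that the input can be any length, but `encode` always returns a vector of the same shape; the decoder then works entirely from that vector.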
Where encoder-decoder is used
- Machine translation – the encoder processes the source language, the decoder generates the target language (this was the original application)
- Text summarisation – the encoder processes a long document, the decoder produces a concise summary
- Image captioning – an image encoder creates a representation, a text decoder describes what it sees
- Speech recognition – an audio encoder processes sound, a text decoder produces the transcription
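What unites these applications is that only the encoder changes with the input modality; the pipeline is the same. A minimal sketch of that shared interface (all names here are illustrative, not any real library's API):

```python
# Sketch of the shared encoder-decoder pipeline across modalities.
# All names are illustrative; no real library's API is being quoted.
from typing import Callable, List

# An encoder maps some input (text, pixels, audio samples) to a
# fixed-size representation; a decoder maps that to an output sequence.
Encoder = Callable[[object], List[float]]
Decoder = Callable[[List[float]], List[str]]

def run_seq2seq(encoder: Encoder, decoder: Decoder, source: object) -> List[str]:
    """The same two-step pipeline, whatever the modality."""
    representation = encoder(source)   # comprehension step
    return decoder(representation)     # generation step

# Trivial example: the encoder counts characters, the decoder
# reports the count as words.
char_count_encoder: Encoder = lambda text: [float(len(text))]
count_decoder: Decoder = lambda rep: ["length", str(int(rep[0]))]

print(run_seq2seq(char_count_encoder, count_decoder, "hola"))  # ['length', '4']
```

Swapping in an image encoder or an audio encoder changes nothing downstream, which is why the same decoder designs reappear across captioning, translation, and transcription.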
Encoder-decoder in transformers
The original transformer architecture from "Attention Is All You Need" used a full encoder-decoder design. Since then, the field has diverged:
- Encoder-only models (like BERT) – excel at understanding tasks: classification, entity extraction, sentiment analysis. They process input but do not generate.
- Decoder-only models (like GPT, Claude) – generate text autoregressively, one token at a time. They are the dominant architecture for chatbots and text generation.
- Full encoder-decoder models (like T5, BART) – maintain the original design. Still used for translation and summarisation where having a separate comprehension step helps.
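One concrete way to see the encoder/decoder split is in the attention pattern each component uses: encoder layers attend in both directions, while decoder layers use a causal mask so each position only sees earlier positions. A simplified single-head sketch in NumPy (real models add learned projections, multiple heads, and stacked layers, and a full encoder-decoder model adds cross-attention from decoder to encoder outputs):

```python
import numpy as np

def attention(x, causal=False):
    """Single-head self-attention over a sequence x of shape (T, d).
    causal=False: bidirectional, as in encoder-only models (BERT-style).
    causal=True: each position attends only to itself and earlier
    positions, as in decoder-only models (GPT-style)."""
    T, d = x.shape
    scores = (x @ x.T) / np.sqrt(d)
    if causal:
        # mask out future positions: position i may not attend to j > i
        scores = np.where(np.tril(np.ones((T, T))) == 1, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

x = np.random.randn(4, 8)
enc_out = attention(x)               # bidirectional: sees the whole input
dec_out = attention(x, causal=True)  # causal: suitable for generation
```

With the causal mask, the first output position depends only on the first input position, which is what makes one-token-at-a-time generation possible.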
Why this matters for choosing models
The architecture determines what a model is good at. Encoder-only models are best for classification. Decoder-only models are best for generation. Full encoder-decoder models are best for sequence-to-sequence tasks like translation. Understanding this helps you pick the right model type for your use case.
Why This Matters
Knowing the encoder-decoder distinction helps you understand why different AI models excel at different tasks. When evaluating AI solutions, this knowledge helps you match the right architecture to your need – classification, generation, or transformation – rather than assuming one model fits all.
Continue learning in Advanced
This topic is covered in our lesson: How LLMs Actually Work