
Encoder-Decoder

Last reviewed: April 2026

A neural network architecture where an encoder compresses input into a representation and a decoder generates output from that representation, used in translation, summarisation, and generation tasks.

The encoder-decoder architecture is a neural network design pattern where one component (the encoder) processes the input and compresses it into a representation, and another component (the decoder) uses that representation to generate the output.

How it works

Think of it as a two-step process:

  1. Encoder: reads the input (a sentence, an image, an audio clip) and produces an internal representation that captures its meaning. In early RNN designs this was a single fixed-size vector; transformer encoders produce one vector per input token.
  2. Decoder: takes that representation and generates the output one piece at a time (a translated sentence, a caption, a response)
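The two steps above can be sketched in plain Python. This is a toy, non-neural illustration of the flow only: every function here is invented for this example, and a real model would learn its embeddings and scoring from data rather than using the hand-rolled arithmetic below.

```python
# Toy encoder-decoder flow: compress input to one vector, then
# generate output tokens one at a time from that vector.
# All functions are illustrative inventions, not a real model.

def embed(token, dim=4):
    # Deterministic toy "embedding": derive `dim` numbers from the token text.
    h = sum(ord(c) * (i + 1) for i, c in enumerate(token))
    return [((h >> i) % 100) / 100.0 for i in range(dim)]

def encode(tokens, dim=4):
    # Encoder: compress the whole input into one fixed-size vector
    # (here, the mean of the token embeddings, as in simple pooling encoders).
    vecs = [embed(t, dim) for t in tokens]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def decode(state, vocab, steps=3):
    # Decoder: emit output one token at a time, each step scoring every
    # candidate against the current state (a dot product here).
    out = []
    for _ in range(steps):
        scores = {w: sum(a * b for a, b in zip(state, embed(w))) for w in vocab}
        best = max(scores, key=scores.get)
        out.append(best)
        # Mix the chosen token back into the state so later steps differ.
        state = [(s + e) / 2 for s, e in zip(state, embed(best))]
    return out

tokens = decode(encode("the cat sat on the mat".split()), ["cat", "mat", "sat"])
print(tokens)  # three tokens picked greedily from the vocabulary
```

The key structural point survives even in this toy: the decoder never sees the raw input, only the encoder's compressed representation of it.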

Where encoder-decoder is used

  • Machine translation: the encoder processes the source language; the decoder generates the target language (this was the original application)
  • Text summarisation: the encoder processes a long document; the decoder produces a concise summary
  • Image captioning: an image encoder creates a representation; a text decoder describes what it sees
  • Speech recognition: an audio encoder processes sound; a text decoder produces the transcription

Encoder-decoder in transformers

The original transformer architecture from "Attention Is All You Need" used a full encoder-decoder design. Since then, the field has diverged:

  • Encoder-only models (like BERT): excel at understanding tasks such as classification, entity extraction, and sentiment analysis. They process input but do not generate.
  • Decoder-only models (like GPT, Claude): generate text autoregressively, one token at a time. This is the dominant architecture for chatbots and text generation.
  • Full encoder-decoder models (like T5, BART): keep the original design, and are still used for translation and summarisation, where a separate comprehension step helps.
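The mechanical difference between these three designs can be shown with their attention masks: encoder positions attend in both directions, decoder positions attend only to earlier positions (a causal mask), and a full encoder-decoder adds cross-attention from every decoder position to every encoder position. A minimal plain-Python sketch, with invented helper names:

```python
# Attention-mask shapes for the three architecture families.
# mask[i][j] == True means position i may attend to position j.

def encoder_mask(n):
    # Encoder-only (BERT-style): every position sees every other.
    return [[True] * n for _ in range(n)]

def decoder_mask(n):
    # Decoder-only (GPT-style): position i sees only positions <= i,
    # which is what makes one-token-at-a-time generation possible.
    return [[j <= i for j in range(n)] for i in range(n)]

def cross_attention_mask(n_tgt, n_src):
    # Full encoder-decoder (T5-style): each decoder position may attend
    # to every encoder position via cross-attention.
    return [[True] * n_src for _ in range(n_tgt)]

for row in decoder_mask(4):
    print(["x" if seen else "." for seen in row])
```

The triangular pattern printed for the decoder mask is exactly why decoder-only models generate left to right, while encoder-only models can use full context for understanding tasks.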

Why this matters for choosing models

The architecture determines what a model is good at: encoder-only models suit classification, decoder-only models suit generation, and full encoder-decoder models suit sequence-to-sequence tasks like translation. Understanding this helps you pick the right model type for your use case.
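The task-to-architecture mapping described above can be captured as a simple lookup. The task names, default choice, and helper function here are illustrative assumptions, not an exhaustive taxonomy:

```python
# Map task type to the architecture family that typically fits it,
# following the glossary's guidance. Illustrative only.

ARCHITECTURE_FOR_TASK = {
    "classification": "encoder-only",       # e.g. BERT
    "entity extraction": "encoder-only",
    "sentiment analysis": "encoder-only",
    "chat": "decoder-only",                 # e.g. GPT, Claude
    "text generation": "decoder-only",
    "translation": "encoder-decoder",       # e.g. T5, BART
    "summarisation": "encoder-decoder",
}

def suggest_architecture(task):
    # Fall back to decoder-only, the dominant general-purpose family.
    return ARCHITECTURE_FOR_TASK.get(task.lower(), "decoder-only")

print(suggest_architecture("Translation"))  # encoder-decoder
```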


Why This Matters

Knowing the encoder-decoder distinction helps you understand why different AI models excel at different tasks. When evaluating AI solutions, this knowledge helps you match the right architecture to your need (classification, generation, or transformation) rather than assuming one model fits all.


Continue learning in Advanced

This topic is covered in our lesson: How LLMs Actually Work