Encoder-Decoder
A neural network architecture where an encoder compresses input into a representation and a decoder generates output from that representation, used in translation, summarisation, and generation tasks.
The encoder-decoder architecture is a neural network design pattern where one component (the encoder) processes the input and compresses it into a representation, and another component (the decoder) uses that representation to generate the output.
How it works
Think of it as a two-step process:
- Encoder – reads the input (a sentence, an image, an audio clip) and compresses it into an internal representation that captures its meaning (classically a single fixed-size vector; transformer encoders produce one vector per input token)
- Decoder – takes that representation and generates the output one piece at a time (a translated sentence, a caption, a response)
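The two-step process above can be sketched in a few lines of Python. This is a toy illustration, not a trained model: the vocabulary, embeddings, and decoding rule are all made up for demonstration, whereas real encoder-decoder models learn them from data.

```python
# Toy encoder-decoder sketch (illustration only, not a trained model).
import numpy as np

VOCAB = {"hello": 0, "world": 1, "<eos>": 2}
EMBED = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # one row per word

def encode(tokens):
    """Compress a variable-length input into one fixed-size vector."""
    vectors = [EMBED[VOCAB[t]] for t in tokens]
    return np.mean(vectors, axis=0)  # same size regardless of input length

def decode(representation, max_steps=3):
    """Generate output one token at a time from the representation."""
    out, state = [], representation.copy()
    for _ in range(max_steps):
        scores = EMBED @ state            # score each vocabulary word
        token_id = int(np.argmax(scores))
        word = next(w for w, i in VOCAB.items() if i == token_id)
        out.append(word)
        if word == "<eos>":
            break
        # condition the next step on what was just generated
        state = 0.5 * state + 0.5 * EMBED[token_id]
    return out

rep = encode(["hello", "world"])
print(rep.shape)    # fixed-size representation: (2,)
print(decode(rep))  # a short generated sequence
```

Note that the input can be any length, but `encode` always returns a vector of the same shape; the decoder then works entirely from that vector.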
Where encoder-decoder is used
- Machine translation – the encoder processes the source language, the decoder generates the target language (this was the original application)
- Text summarisation – the encoder processes a long document, the decoder produces a concise summary
- Image captioning – an image encoder creates a representation, a text decoder describes what it sees
- Speech recognition – an audio encoder processes sound, a text decoder produces the transcription
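What unites these applications is that only the encoder changes with the input modality; the pipeline is the same. A minimal sketch of that shared interface (all names here are illustrative, not any real library's API):

```python
# Sketch of the shared encoder-decoder pipeline across modalities.
# All names are illustrative; no real library's API is being quoted.
from typing import Callable, List

# An encoder maps some input (text, pixels, audio samples) to a
# fixed-size representation; a decoder maps that to an output sequence.
Encoder = Callable[[object], List[float]]
Decoder = Callable[[List[float]], List[str]]

def run_seq2seq(encoder: Encoder, decoder: Decoder, source: object) -> List[str]:
    """The same two-step pipeline, whatever the modality."""
    representation = encoder(source)   # comprehension step
    return decoder(representation)     # generation step

# Trivial example: the encoder counts characters, the decoder
# reports the count as words.
char_count_encoder: Encoder = lambda text: [float(len(text))]
count_decoder: Decoder = lambda rep: ["length", str(int(rep[0]))]

print(run_seq2seq(char_count_encoder, count_decoder, "hola"))  # ['length', '4']
```

Swapping in an image encoder or an audio encoder changes nothing downstream, which is why the same decoder designs reappear across captioning, translation, and transcription.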
Encoder-decoder in transformers
The original transformer architecture from "Attention Is All You Need" used a full encoder-decoder design. Since then, the field has diverged:
- Encoder-only models (like BERT) – excel at understanding tasks: classification, entity extraction, sentiment analysis. They process input but do not generate.
- Decoder-only models (like GPT, Claude) – generate text autoregressively, one token at a time. They are the dominant architecture for chatbots and text generation.
- Full encoder-decoder models (like T5, BART) – maintain the original design. Still used for translation and summarisation where having a separate comprehension step helps.
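One concrete way to see the encoder/decoder split is in the attention pattern each component uses: encoder layers attend in both directions, while decoder layers use a causal mask so each position only sees earlier positions. A simplified single-head sketch in NumPy (real models add learned projections, multiple heads, and stacked layers, and a full encoder-decoder model adds cross-attention from decoder to encoder outputs):

```python
import numpy as np

def attention(x, causal=False):
    """Single-head self-attention over a sequence x of shape (T, d).
    causal=False: bidirectional, as in encoder-only models (BERT-style).
    causal=True: each position attends only to itself and earlier
    positions, as in decoder-only models (GPT-style)."""
    T, d = x.shape
    scores = (x @ x.T) / np.sqrt(d)
    if causal:
        # mask out future positions: position i may not attend to j > i
        scores = np.where(np.tril(np.ones((T, T))) == 1, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

x = np.random.randn(4, 8)
enc_out = attention(x)               # bidirectional: sees the whole input
dec_out = attention(x, causal=True)  # causal: suitable for generation
```

With the causal mask, the first output position depends only on the first input position, which is what makes one-token-at-a-time generation possible.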
Why this matters for choosing models
The architecture determines what a model is good at. Encoder-only models are best for classification. Decoder-only models are best for generation. Full encoder-decoder models are best for sequence-to-sequence tasks like translation. Understanding this helps you pick the right model type for your use case.
Why This Matters
Knowing the encoder-decoder distinction helps you understand why different AI models excel at different tasks. When evaluating AI solutions, this knowledge helps you match the right architecture to your need – classification, generation, or transformation – rather than assuming one model fits all.
Continue learning in Advanced
This topic is covered in our lesson: How LLMs Actually Work