Sequence-to-Sequence (Seq2Seq)
A model architecture that converts one sequence of data into another, originally designed for machine translation and now underlying many AI text tasks.
Sequence-to-sequence is a model architecture designed to transform one sequence into another. It takes an input sequence (like a sentence in English) and produces an output sequence (like the same sentence in French). This architecture was a breakthrough for machine translation and laid the groundwork for modern AI assistants.
How it works
A seq2seq model has two main components:
- Encoder: Reads the input sequence and compresses it into a fixed-length representation called a context vector. This vector captures the meaning of the entire input.
- Decoder: Takes the context vector and generates the output sequence one token at a time. At each step, it considers the context vector and all previously generated tokens.
The encoder processes the input; the decoder produces the output. This simple division of labour handles remarkably complex transformations.
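The division of labour can be sketched in a few lines. This is a toy illustration, not a trainable model: the weights are random, the dimensions are arbitrary assumptions, and plain RNN cells stand in for whatever the encoder and decoder actually use. The point is the flow, where the encoder compresses the input into one context vector and the decoder unrolls from it token by token.

```python
import numpy as np

rng = np.random.default_rng(0)
emb, hid, vocab = 8, 16, 10              # embedding, hidden, and vocab sizes (arbitrary)

E = rng.normal(0, 0.1, (vocab, emb))     # token embedding table
W_xh = rng.normal(0, 0.1, (emb, hid))    # input-to-hidden weights
W_hh = rng.normal(0, 0.1, (hid, hid))    # hidden-to-hidden weights
W_hy = rng.normal(0, 0.1, (hid, vocab))  # decoder projection to vocabulary

def encode(token_ids):
    """Read the input sequence; the final hidden state is the context vector."""
    h = np.zeros(hid)
    for t in token_ids:
        h = np.tanh(E[t] @ W_xh + h @ W_hh)
    return h

def decode(context, max_len=5, start_token=0):
    """Generate the output one token at a time, starting from the context."""
    h, tok, out = context, start_token, []
    for _ in range(max_len):
        h = np.tanh(E[tok] @ W_xh + h @ W_hh)
        tok = int(np.argmax(h @ W_hy))   # greedily pick the most likely next token
        out.append(tok)
    return out

context = encode([3, 1, 4, 1])           # encoder: whole input -> one vector
output = decode(context)                 # decoder: one vector -> output sequence
print(context.shape, output)
```

Note that the decoder sees the input only through `context`: everything the encoder read must squeeze through that single fixed-length vector, which is exactly the bottleneck discussed below.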
Applications
Seq2seq models power many AI capabilities:
- Machine translation: Converting text from one language to another
- Text summarisation: Converting a long document into a short summary
- Question answering: Converting a question into an answer
- Chatbots: Converting a user message into a response
- Code generation: Converting a natural language description into code
- Speech recognition: Converting audio sequences into text
Evolution of seq2seq
The original seq2seq models (2014) used recurrent neural networks, which processed input one token at a time. This worked but was slow and struggled with long sequences because the fixed-length context vector became a bottleneck.
The attention mechanism (2015) solved this by allowing the decoder to look back at all encoder positions, not just the compressed context vector. The decoder could focus on the most relevant parts of the input at each generation step.
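The core of attention fits in one function. A minimal sketch with dot-product scoring, using hand-built orthogonal encoder states so the behaviour is easy to predict; real models learn these vectors and often use a learned scoring function instead.

```python
import numpy as np

def attention(decoder_state, encoder_states):
    """Score every encoder position against the current decoder state,
    softmax the scores, and return the weighted average of encoder states."""
    scores = encoder_states @ decoder_state           # one score per input position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over positions
    return weights @ encoder_states, weights

enc = np.eye(4)               # 4 input positions with orthogonal toy states
dec = enc[2] * 5.0            # a decoder state aligned with position 2
context, w = attention(dec, enc)
print(int(w.argmax()))        # the decoder attends mostly to position 2
```

Because the context vector is recomputed at every decoding step, the decoder can focus on different input positions for different output tokens, rather than relying on one compressed summary.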
The transformer architecture (2017) took this further by removing the sequential processing entirely, using self-attention to process all positions simultaneously. Modern LLMs are essentially advanced seq2seq systems built entirely on transformer architecture.
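The "all positions simultaneously" idea is visible in the scaled dot-product self-attention formula. A minimal numpy sketch (random weights, arbitrary sizes, single head, no masking or multi-head machinery): every token's output is computed from every other token in one matrix product, with no sequential loop.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over all positions at once."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                  # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # all-pairs similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # mix values by attention

rng = np.random.default_rng(0)
d = 16
X = rng.normal(size=(6, d))                           # 6 tokens, processed in parallel
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                                      # one output vector per token
```

Contrast this with the RNN sketch above: there is no per-token loop, so all six positions can be computed in parallel on modern hardware, which is a large part of why transformers scaled where RNNs did not.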
Why the concept still matters
Even though the specific architectures have evolved, the seq2seq framework remains the mental model for understanding how AI transforms inputs into outputs. When you give Claude a prompt and receive a response, you are using a seq2seq system, just a vastly more powerful one than the original 2014 design.
Why This Matters
The seq2seq framework is the conceptual foundation for understanding how AI generates text, translates languages, and answers questions. Understanding this architecture helps you grasp why AI has certain capabilities and limitations, and how modern transformer-based models evolved from earlier designs.
Continue learning in Advanced
This topic is covered in our lesson: Neural Network Architectures Explained