Beam Search
A text generation strategy that explores multiple possible continuations simultaneously and selects the sequence with the highest overall probability.
Beam search is a decoding algorithm used in AI text generation that keeps track of multiple candidate sequences at each step and ultimately selects the one with the highest cumulative probability.
The generation problem
When a language model generates text, it predicts one token at a time. At each step, there are thousands of possible next tokens, each with a different probability. The challenge is choosing a path through these possibilities that produces coherent, high-quality text.
How beam search works
Beam search maintains a fixed number of candidate sequences; that number is called the "beam width." If the beam width is 3, the algorithm keeps the top 3 most probable sequences at each step.
At step one, it picks the 3 most likely first tokens. At step two, it expands each of those 3 sequences with all possible next tokens, scores the resulting combinations, and keeps only the top 3 overall. This continues until the sequences are complete.
By considering multiple paths simultaneously, beam search avoids committing to a locally good choice that leads to a poor overall sequence.
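The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production decoder: the bigram table stands in for a real language model, and the names (`beam_search`, `step_probs`, `BIGRAMS`) are invented for this example. Scores are summed log-probabilities, which is how real implementations avoid numerical underflow when multiplying many small probabilities.

```python
import math

def beam_search(step_probs, beam_width=3, max_len=3):
    """Keep the `beam_width` highest-scoring candidate sequences at each step.

    `step_probs(seq)` returns a dict of next-token probabilities given the
    sequence so far. Scores are cumulative log-probabilities.
    """
    beams = [((), 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            # Expand each kept sequence with every possible next token.
            for token, p in step_probs(seq).items():
                candidates.append((seq + (token,), score + math.log(p)))
        # Keep only the top `beam_width` sequences overall.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# A toy bigram table standing in for a real model (made up for
# illustration; None marks the start of a sequence).
BIGRAMS = {
    None:   {"a": 0.45, "the": 0.40, "dogs": 0.15},
    "a":    {"bit": 0.34, "lot": 0.33, "tad": 0.33},
    "the":  {"cat": 0.9, "dog": 0.1},
    "cat":  {"sat": 0.9, "ran": 0.1},
    "dogs": {"bark": 1.0},
    "bit":  {"more": 1.0}, "lot": {"more": 1.0}, "tad": {"more": 1.0},
    "bark": {"loudly": 1.0},
}

def step_probs(seq):
    return BIGRAMS[seq[-1] if seq else None]

best_seq, _ = beam_search(step_probs, beam_width=3, max_len=3)[0]
print(" ".join(best_seq))  # -> the cat sat
```

Note that greedy decoding would start with "a" (the single most probable first token, at 0.45) and end up with "a bit more" (overall probability 0.153), while beam search keeps "the" alive long enough to find "the cat sat" (0.324) — exactly the locally-good-but-globally-poor trap described above.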
Beam search vs other strategies
- Greedy decoding: Always picks the single most probable next token. Fast but often produces repetitive or suboptimal text.
- Beam search: Explores multiple paths. Better quality but more computationally expensive.
- Sampling with temperature: Randomly selects from probable tokens, introducing variety. Better for creative text.
- Top-k and top-p sampling: Constrained randomness that balances quality and diversity.
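To make the contrast with sampling concrete, here is a minimal sketch of temperature sampling over a single next-token distribution. The function name and the toy distribution are invented for this example; real libraries apply the same rescaling to model logits.

```python
import math, random

def sample_with_temperature(probs, temperature, rng=random):
    """Rescale a next-token distribution by `temperature`, then sample.

    temperature < 1 sharpens the distribution (closer to greedy);
    temperature > 1 flattens it (more variety).
    """
    logits = {t: math.log(p) / temperature for t, p in probs.items()}
    z = sum(math.exp(l) for l in logits.values())
    scaled = {t: math.exp(l) / z for t, l in logits.items()}
    tokens, weights = zip(*scaled.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

probs = {"the": 0.6, "a": 0.3, "cat": 0.1}
# At a very low temperature this behaves almost like greedy decoding,
# nearly always returning "the"; at a high temperature, "a" and "cat"
# are chosen much more often.
print(sample_with_temperature(probs, temperature=0.1))
```

Greedy decoding is the limiting case: `max(probs, key=probs.get)` with no randomness at all.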
When beam search is used
Beam search excels in tasks where there is a clearly "correct" output: machine translation, speech recognition, and structured data generation. For open-ended creative writing or conversation, sampling-based methods are generally preferred because beam search tends to produce safe, repetitive text.
Practical impact
Most users never configure beam search directly. But understanding it explains why AI-generated text sometimes feels "safe" or predictable: deterministic decoding strategies like beam search optimise for the most probable output rather than the most interesting one.
Why This Matters
Understanding beam search helps you grasp why AI models sometimes produce bland or repetitive text and why adjusting generation settings like temperature can dramatically change output quality. It is foundational knowledge for anyone tuning AI outputs for specific use cases.