Sampling
The process by which an AI model selects its next output token from a probability distribution, with settings like temperature controlling how random the selection is.
Sampling is the process an AI model uses to choose what to say next. When a large language model generates text, it does not simply pick the single most likely next word. Instead, it calculates a probability distribution over all possible next tokens and then samples from that distribution, introducing controlled randomness into its output.
How sampling works
At each step of text generation, the model produces a probability for every token in its vocabulary (typically 30,000-100,000 tokens). Given the prompt "The cat sat on the", it might assign high probabilities to "mat" (15%), "floor" (12%), "sofa" (8%), and so on. Sampling is the step that chooses one of these tokens as the actual output.
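The idea can be shown with a toy version of that distribution. This is a minimal sketch with made-up probabilities (a real model spreads the remaining mass over tens of thousands of other tokens, lumped here into a placeholder bucket):

```python
import random

# Toy next-token distribution for the prompt "The cat sat on the"
# (illustrative numbers, not from a real model)
probs = {"mat": 0.15, "floor": 0.12, "sofa": 0.08, "table": 0.05, "roof": 0.03}

# Lump the rest of the probability mass into one bucket so weights sum to 1.
probs["<other>"] = 1.0 - sum(probs.values())

# Sampling: draw one token in proportion to its probability.
token = random.choices(list(probs), weights=probs.values(), k=1)[0]
print(token)
```

Run it repeatedly and "mat" comes up most often, but any listed token can appear, which is exactly the controlled randomness described above.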
Key sampling parameters
- Temperature: Controls how random the selection is. Temperature 0 always picks the highest-probability token (deterministic). Temperature 1 samples according to the model's original probabilities. Higher temperatures flatten the distribution, making unlikely tokens more likely to be chosen and producing more creative but less predictable output.
- Top-k sampling: Only considers the top k most likely tokens, ignoring everything else. Top-k of 50 means the model chooses from the 50 most probable next tokens.
- Top-p (nucleus) sampling: Only considers the smallest set of tokens whose combined probability exceeds a threshold p. If p is 0.9, the model considers just enough tokens to cover 90 percent of the probability mass.
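All three parameters can be combined in one decoding step. The sketch below is a minimal pure-Python implementation of that pipeline; production inference stacks do the same arithmetic on GPU tensors, and the function name and argument defaults here are illustrative:

```python
import math
import random

def sample(logits, temperature=1.0, top_k=None, top_p=None):
    """Sample a token index from raw logits with temperature, top-k, and top-p."""
    if temperature == 0:  # greedy decoding: just take the argmax
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature: divide logits before the softmax. <1 sharpens, >1 flattens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])

    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        order = order[:top_k]

    # Top-p (nucleus): keep the smallest prefix of tokens whose
    # cumulative probability reaches p.
    if top_p is not None:
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        order = kept

    # Sample from the surviving candidates in proportion to their probability.
    weights = [probs[i] for i in order]
    return random.choices(order, weights=weights, k=1)[0]
```

For example, `sample([2.0, 1.0, 0.0], temperature=0)` always returns index 0, while `sample([2.0, 1.0, 0.0], temperature=1.2, top_k=2)` picks between the two strongest candidates.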
Why sampling matters
Without sampling, AI models would be completely deterministic: the same prompt would always produce the same response. Sampling introduces variety, which is essential for creative tasks, brainstorming, and generating multiple alternative responses.
Practical implications
Different tasks call for different sampling strategies:
- Factual answers, code, data extraction: Low temperature (0 to 0.3) for precision and consistency
- General conversation and writing: Medium temperature (0.5 to 0.7) for a balance of coherence and variety
- Creative writing, brainstorming: Higher temperature (0.7 to 1.0) for diversity and unexpected connections
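The effect behind these recommendations is easy to see numerically. Dividing logits by a low temperature before the softmax concentrates probability on the top token (precise, consistent), while a high temperature spreads it out (varied, surprising). A short sketch, using made-up logits:

```python
import math

def softmax_with_temperature(logits, t):
    """Softmax after dividing logits by temperature t (t > 0)."""
    scaled = [l / t for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # illustrative raw scores for three tokens
for t in (0.3, 0.7, 1.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At temperature 0.3 nearly all the mass lands on the first token; at 1.5 the distribution is much flatter, so the second and third tokens are sampled far more often.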
Greedy decoding vs sampling
When temperature is set to 0, the model uses greedy decoding, always choosing the single most probable token. This is fast and deterministic but can produce repetitive, bland text. Sampling with moderate temperature typically produces higher-quality writing.
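The contrast is visible even on a toy distribution (made-up numbers for illustration): greedy decoding returns the same token on every call, while sampling varies from run to run.

```python
import random

# Toy next-token distribution (illustrative probabilities)
probs = {"mat": 0.4, "floor": 0.35, "sofa": 0.25}

# Greedy decoding: always the argmax, identical on every call.
greedy = max(probs, key=probs.get)  # "mat", every time

# Sampling: five draws in proportion to probability; the mix varies per run.
samples = [random.choices(list(probs), weights=probs.values(), k=1)[0]
           for _ in range(5)]

print(greedy, samples)
```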
Understanding sampling helps you tune AI outputs for your specific needs. If responses feel too generic, increase temperature. If they feel too chaotic, decrease it.
Why This Matters
Sampling parameters are one of the most accessible ways to control AI output quality. Understanding how temperature, top-k, and top-p work lets you tune AI tools for specific tasks β making responses more creative when you need ideas and more precise when you need accuracy.
Continue learning in Essentials
This topic is covered in our lesson: Controlling AI Output Quality