Sampling
The process by which an AI model selects its next output token from a probability distribution, with settings like temperature controlling how random the selection is.
Sampling is the process an AI model uses to choose what to say next. When a large language model generates text, it does not simply pick the single most likely next word. Instead, it calculates a probability distribution over all possible next tokens and then samples from that distribution, introducing controlled randomness into its output.
How sampling works
At each step of text generation, the model produces a probability for every token in its vocabulary (typically 30,000-100,000 tokens). Given the prompt "The cat sat on the", it might assign high probabilities to "mat" (15%), "floor" (12%), "sofa" (8%), and so on. Sampling is the step that chooses one of these tokens as the actual output.
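The idea can be shown with a toy version of that distribution. This is a minimal sketch with made-up probabilities (a real model spreads the remaining mass over tens of thousands of other tokens, lumped here into a placeholder bucket):

```python
import random

# Toy next-token distribution for the prompt "The cat sat on the"
# (illustrative numbers, not from a real model)
probs = {"mat": 0.15, "floor": 0.12, "sofa": 0.08, "table": 0.05, "roof": 0.03}

# Lump the rest of the probability mass into one bucket so weights sum to 1.
probs["<other>"] = 1.0 - sum(probs.values())

# Sampling: draw one token in proportion to its probability.
token = random.choices(list(probs), weights=probs.values(), k=1)[0]
print(token)
```

Run it repeatedly and "mat" comes up most often, but any listed token can appear, which is exactly the controlled randomness described above.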
Key sampling parameters
- Temperature: Controls how random the selection is. Temperature 0 always picks the highest-probability token (deterministic). Temperature 1 samples according to the model's original probabilities. Higher temperatures flatten the distribution, making unlikely tokens more likely to be chosen and producing more creative but less predictable output.
- Top-k sampling: Only considers the top k most likely tokens, ignoring everything else. Top-k of 50 means the model chooses from the 50 most probable next tokens.
- Top-p (nucleus) sampling: Only considers the smallest set of tokens whose combined probability exceeds a threshold p. If p is 0.9, the model considers just enough tokens to cover 90 percent of the probability mass.
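All three parameters can be combined in one decoding step. The sketch below is a minimal pure-Python implementation of that pipeline; production inference stacks do the same arithmetic on GPU tensors, and the function name and argument defaults here are illustrative:

```python
import math
import random

def sample(logits, temperature=1.0, top_k=None, top_p=None):
    """Sample a token index from raw logits with temperature, top-k, and top-p."""
    if temperature == 0:  # greedy decoding: just take the argmax
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature: divide logits before the softmax. <1 sharpens, >1 flattens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])

    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        order = order[:top_k]

    # Top-p (nucleus): keep the smallest prefix of tokens whose
    # cumulative probability reaches p.
    if top_p is not None:
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        order = kept

    # Sample from the surviving candidates in proportion to their probability.
    weights = [probs[i] for i in order]
    return random.choices(order, weights=weights, k=1)[0]
```

For example, `sample([2.0, 1.0, 0.0], temperature=0)` always returns index 0, while `sample([2.0, 1.0, 0.0], temperature=1.2, top_k=2)` picks between the two strongest candidates.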
Why sampling matters
Without sampling, AI models would be completely deterministic: the same prompt would always produce the same response. Sampling introduces variety, which is essential for creative tasks, brainstorming, and generating multiple alternative responses.
Practical implications
Different tasks call for different sampling strategies:
- Factual answers, code, data extraction: Low temperature (0 to 0.3) for precision and consistency
- General conversation and writing: Medium temperature (0.5 to 0.7) for a balance of coherence and variety
- Creative writing, brainstorming: Higher temperature (0.7 to 1.0) for diversity and unexpected connections
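The effect behind these recommendations is easy to see numerically. Dividing logits by a low temperature before the softmax concentrates probability on the top token (precise, consistent), while a high temperature spreads it out (varied, surprising). A short sketch, using made-up logits:

```python
import math

def softmax_with_temperature(logits, t):
    """Softmax after dividing logits by temperature t (t > 0)."""
    scaled = [l / t for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # illustrative raw scores for three tokens
for t in (0.3, 0.7, 1.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At temperature 0.3 nearly all the mass lands on the first token; at 1.5 the distribution is much flatter, so the second and third tokens are sampled far more often.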
Greedy decoding vs sampling
When temperature is set to 0, the model uses greedy decoding, always choosing the single most probable token. This is fast and deterministic but can produce repetitive, bland text. Sampling with moderate temperature typically produces higher-quality writing.
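The contrast is visible even on a toy distribution (made-up numbers for illustration): greedy decoding returns the same token on every call, while sampling varies from run to run.

```python
import random

# Toy next-token distribution (illustrative probabilities)
probs = {"mat": 0.4, "floor": 0.35, "sofa": 0.25}

# Greedy decoding: always the argmax, identical on every call.
greedy = max(probs, key=probs.get)  # "mat", every time

# Sampling: five draws in proportion to probability; the mix varies per run.
samples = [random.choices(list(probs), weights=probs.values(), k=1)[0]
           for _ in range(5)]

print(greedy, samples)
```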
Understanding sampling helps you tune AI outputs for your specific needs. If responses feel too generic, increase temperature. If they feel too chaotic, decrease it.
Why This Matters
Sampling parameters are one of the most accessible ways to control AI output quality. Understanding how temperature, top-k, and top-p work lets you tune AI tools for specific tasks β making responses more creative when you need ideas and more precise when you need accuracy.
Continue learning in Essentials
This topic is covered in our lesson: Controlling AI Output Quality