Top-p Sampling (Nucleus Sampling)

Last reviewed: April 2026

A method for controlling AI output randomness by considering only the smallest set of tokens whose combined probability exceeds a threshold p.

Top-p sampling (also called nucleus sampling) is a method for controlling the randomness of AI text generation. Instead of considering every possible next token, the model considers only the smallest set of tokens whose cumulative probability exceeds a threshold p, and samples from that set.

How top-p works

When generating each token, the model calculates probabilities for every word in its vocabulary. Top-p filtering:

  1. Sorts tokens by probability from highest to lowest.
  2. Adds probabilities cumulatively until the running total exceeds p.
  3. Discards all remaining tokens.
  4. Samples randomly from the kept tokens (renormalised).

For example, with top-p = 0.9:

  • If the top 3 tokens have probabilities 0.5, 0.3, and 0.15 (cumulative: 0.95), only these three are considered.
  • The remaining thousands of tokens are excluded.
  • The model samples from these three based on their relative probabilities.
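The four steps above can be sketched in a few lines of Python. This is a toy illustration, not any provider's API: the `top_p_filter` and `sample` helpers and the four-token vocabulary are invented for the example.

```python
import random

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability exceeds p."""
    # 1. Sort tokens by probability, highest first.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        # 2. Accumulate probability mass until the running total reaches p.
        kept.append((token, prob))
        total += prob
        if total >= p:
            break  # 3. Everything after this point is discarded.
    # 4. Renormalise the kept probabilities so they sum to 1.
    z = sum(prob for _, prob in kept)
    return {token: prob / z for token, prob in kept}

def sample(probs, p=0.9):
    """Sample one token from the renormalised nucleus."""
    nucleus = top_p_filter(probs, p)
    tokens, weights = zip(*nucleus.items())
    return random.choices(tokens, weights=weights)[0]

# The worked example above: the top three tokens cover 0.95 > 0.9,
# so the fourth token is excluded from sampling.
probs = {"cat": 0.5, "dog": 0.3, "fox": 0.15, "zzz": 0.05}
print(sorted(top_p_filter(probs, p=0.9)))  # ['cat', 'dog', 'fox']
```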

Why top-p is useful

Top-p adapts dynamically to the model's confidence:

  • When the model is highly confident (one token has 95% probability), top-p restricts sampling to just that token, behaving like a low temperature.
  • When the model is uncertain (many tokens share similar probability), top-p allows more variety, preserving creative options.

This adaptive behaviour is why top-p often produces more natural text than temperature alone.
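One way to see this adaptivity is to count how many tokens survive filtering under a confident versus an uncertain distribution. A minimal sketch, with invented toy distributions and an illustrative `nucleus_size` helper:

```python
def nucleus_size(probs, p=0.9):
    """Count how many tokens survive top-p filtering at threshold p."""
    total, kept = 0.0, 0
    for prob in sorted(probs, reverse=True):
        kept += 1
        total += prob
        if total >= p:
            break
    return kept

confident = [0.95, 0.02, 0.01, 0.01, 0.01]  # one dominant token
uncertain = [0.2] * 5                        # five equally likely tokens

print(nucleus_size(confident))  # 1: behaves almost greedily
print(nucleus_size(uncertain))  # 5: all options stay in play
```

The same p = 0.9 yields a one-token nucleus in the first case and a five-token nucleus in the second; the filter widens or narrows on its own.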

Top-p vs temperature

  • Temperature scales the entire probability distribution. Low temperature sharpens all peaks; high temperature flattens everything.
  • Top-p removes the tail of unlikely tokens while preserving the relative probabilities of likely ones.

In practice:

  • Use temperature when you want uniform control over randomness.
  • Use top-p when you want the model to be creative where it is uncertain but decisive where it is confident.
  • Most practitioners adjust one and leave the other at default. Adjusting both simultaneously can produce unpredictable results.
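To make the contrast concrete, here is a minimal softmax-with-temperature sketch. The logit values are made up for illustration: dividing logits by the temperature sharpens or flattens the whole distribution, whereas top-p would instead cut off its tail.

```python
import math

def apply_temperature(logits, t):
    """Rescale logits by 1/t and apply softmax: low t sharpens, high t flattens."""
    scaled = [l / t for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.1]
print([round(p, 3) for p in apply_temperature(logits, 1.0)])  # baseline
print([round(p, 3) for p in apply_temperature(logits, 0.5)])  # sharper peak
print([round(p, 3) for p in apply_temperature(logits, 2.0)])  # flatter spread
```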

Common settings

  • Top-p = 1.0: No filtering. All tokens are considered. This is the default for most providers.
  • Top-p = 0.9: A common "quality" setting. Removes only the very unlikely tokens.
  • Top-p = 0.5: More focused. Only the most probable tokens are considered.
  • Top-p = 0.1: Very restrictive. Nearly deterministic output.
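A quick sweep shows how these settings differ in practice. The seven-token distribution below is invented for illustration, and `nucleus_count` is a hypothetical helper:

```python
def nucleus_count(probs, p):
    """How many of the sorted probabilities survive at threshold p."""
    total = 0.0
    for i, q in enumerate(probs, start=1):
        total += q
        if total >= p:
            return i
    return len(probs)

# A toy distribution over 7 tokens, already sorted highest first.
probs = [0.4, 0.25, 0.15, 0.1, 0.05, 0.03, 0.02]
for p in (1.0, 0.9, 0.5, 0.1):
    print(f"top-p = {p}: keeps {nucleus_count(probs, p)} of {len(probs)} tokens")
```

At p = 1.0 every token survives; at p = 0.1 the single most likely token always satisfies the threshold on its own, so sampling is effectively deterministic.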

Other sampling methods

  • Top-k sampling: Considers only the k most likely tokens regardless of their probabilities.
  • Min-p sampling: A newer method that filters tokens below a minimum probability relative to the top token.
  • Greedy decoding: Always picks the most likely token (equivalent to temperature 0 or top-p approaching 0).
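For comparison, minimal sketches of all three alternatives over the same kind of toy distribution. Helper names and probabilities are illustrative, not any library's API:

```python
def top_k_filter(probs, k=3):
    """Top-k: keep the k most likely tokens, whatever their total mass."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    z = sum(q for _, q in ranked)
    return {t: q / z for t, q in ranked}

def min_p_filter(probs, min_p=0.1):
    """Min-p: keep tokens whose probability is at least min_p times the top token's."""
    top = max(probs.values())
    kept = {t: q for t, q in probs.items() if q >= min_p * top}
    z = sum(kept.values())
    return {t: q / z for t, q in kept.items()}

def greedy(probs):
    """Greedy decoding: always pick the single most likely token."""
    return max(probs, key=probs.get)

probs = {"a": 0.5, "b": 0.2, "c": 0.15, "d": 0.1, "e": 0.05}
print(sorted(top_k_filter(probs, k=2)))         # ['a', 'b']
print(sorted(min_p_filter(probs, min_p=0.25)))  # keeps prob >= 0.125: ['a', 'b', 'c']
print(greedy(probs))                            # 'a'
```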

Why This Matters

Top-p sampling gives you fine-grained control over AI output diversity. Understanding it, alongside temperature, lets you tune AI output for different use cases, from deterministic data extraction to creative brainstorming, using just two parameters.
