Temperature Scaling
A parameter that controls the randomness of AI model outputs: lower temperatures produce more predictable responses, while higher temperatures increase creativity and variety.
Temperature is a parameter that controls how random or deterministic an AI model's outputs are. It is one of the most important and commonly adjusted settings when using language models.
How temperature works
When a language model predicts the next token, it assigns a probability to every possible token in its vocabulary. Temperature scales these probabilities before a token is selected.
- Low temperature (0.0-0.3): Sharpens the probability distribution. The most likely tokens become even more likely, and unlikely tokens become nearly impossible. The model becomes more predictable and deterministic.
- Medium temperature (0.4-0.7): A balanced distribution that allows some variety while staying coherent.
- High temperature (0.8-1.5+): Flattens the probability distribution. Less likely tokens get a greater chance of being selected, introducing more randomness, creativity, and unpredictability.
At temperature 0, the model always selects the most probable token, so the output is completely deterministic. As temperature increases, the model is increasingly willing to take less probable paths.
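The scaling described above can be sketched in a few lines: divide the model's raw token scores (logits) by the temperature before applying softmax. This is a minimal illustration with toy logits, not any particular model's implementation.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by temperature, then normalize into probabilities."""
    if temperature <= 0:
        # Temperature 0 is greedy decoding: all probability on the top token.
        probs = [0.0] * len(logits)
        probs[max(range(len(logits)), key=lambda i: logits[i])] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy next-token scores
low = softmax_with_temperature(logits, 0.2)   # sharpened: top token dominates
high = softmax_with_temperature(logits, 1.5)  # flattened: more even spread
```

Running this shows the effect directly: at temperature 0.2 the top token takes nearly all the probability mass, while at 1.5 the distribution spreads out across all three tokens.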
Choosing the right temperature
- Low temperature for: factual answers, data extraction, code generation, technical writing, structured output, any task where accuracy matters more than creativity.
- Medium temperature for: general conversation, business writing, summarization, balanced tasks requiring both accuracy and natural language.
- High temperature for: creative writing, brainstorming, generating diverse options, fiction, poetry, any task where variety and novelty are desired.
Temperature in practice
Most AI APIs expose temperature as a configurable parameter. Default values are typically around 0.7-1.0. For production applications where consistency matters, lower temperatures are generally preferred. For creative tools and idea generation, higher temperatures produce more interesting results.
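The consistency-versus-variety trade-off is easy to see by repeatedly sampling from the same toy distribution at two temperatures. This sketch uses a seeded random generator so the demo is repeatable; the vocabulary and logits are invented for illustration.

```python
import math
import random

def sample_tokens(logits, vocab, temperature, n, seed=0):
    """Draw n tokens from temperature-scaled probabilities (seeded for the demo)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    rng = random.Random(seed)
    return rng.choices(vocab, weights=weights, k=n)

vocab = ["the", "a", "one"]
logits = [3.0, 1.0, 0.5]
low = sample_tokens(logits, vocab, temperature=0.2, n=20)   # mostly the top token
high = sample_tokens(logits, vocab, temperature=1.5, n=20)  # a more varied mix
```

At temperature 0.2 nearly every draw is the top token, which is why low temperatures suit production tasks that need consistency; at 1.5 the other tokens appear regularly.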
Temperature vs other sampling parameters
Temperature works alongside other parameters like top-p (nucleus sampling) and top-k. Top-p limits selection to the smallest set of tokens whose cumulative probability exceeds a threshold. Top-k limits selection to the k most probable tokens. These parameters can be combined with temperature for fine-grained control over output behaviour.
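Both filters described above can be sketched as operations on an already temperature-scaled probability list: top-k zeroes out everything outside the k most probable tokens, and top-p keeps the smallest high-probability set, then both renormalize. This is an illustrative sketch, not any library's implementation.

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def top_p_filter(probs, p_threshold):
    """Keep the smallest set of tokens whose cumulative probability
    meets p_threshold, then renormalize (nucleus sampling)."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in ranked:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p_threshold:
            break
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

probs = [0.5, 0.3, 0.15, 0.05]  # already temperature-scaled and normalized
narrowed_k = top_k_filter(probs, 2)   # only the top two tokens survive
narrowed_p = top_p_filter(probs, 0.8) # tokens kept until cumulative prob >= 0.8
```

In a typical pipeline, temperature scaling is applied first, then top-k or top-p prunes the tail of the distribution before a token is drawn.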
A common mistake
Setting temperature to 0 does not guarantee identical outputs for the same prompt across all API calls due to floating-point arithmetic and hardware variations. Near-deterministic behaviour requires temperature 0 plus careful control of other randomness sources.
Why This Matters
Temperature is the single most impactful setting for controlling AI output quality. Understanding how to adjust it for different tasks lets you get more accurate results for analytical work and more creative results for brainstorming, all with the same model.
Continue learning in Essentials
This topic is covered in our lesson: Configuring AI for Better Results