
Test-Time Compute

Last reviewed: April 2026

Additional computation spent during inference — such as generating and evaluating multiple responses — to improve the quality of an AI model's output.

Test-time compute refers to additional computational resources spent during inference (when the model is generating a response) to improve output quality. Instead of generating a single answer and returning it, the model spends more time thinking, exploring alternatives, and verifying its reasoning.

The core idea

Traditionally, AI model quality was determined entirely during training. Once trained, the model's capabilities were fixed — it spent the same compute on every query regardless of difficulty. Test-time compute breaks this assumption by allowing the model to spend more effort on harder problems.

How test-time compute works

Several techniques fall under the test-time compute umbrella:

  • Best-of-N sampling: Generate N different responses and select the best one using a verifier or scoring function. More responses (more compute) mean a better chance of finding an excellent answer.
  • Chain-of-thought reasoning: The model generates intermediate reasoning steps before producing a final answer. More reasoning tokens mean more compute and often better results.
  • Tree search: The model explores multiple reasoning paths (like a chess engine exploring different move sequences) and selects the most promising one.
  • Self-verification: The model generates an answer, then checks its own work, potentially revising multiple times.
  • Majority voting: Generate multiple answers and take the most common one. Simple but effective for factual questions.
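To make the first and last of these techniques concrete, here is a minimal, runnable sketch of best-of-N sampling and majority voting. The `generate` and `score` functions are stand-ins introduced for illustration: in a real system, `generate` would sample from a model and `score` would be a learned verifier or a programmatic checker.

```python
import random
from collections import Counter

random.seed(0)


def generate(prompt: str) -> str:
    """Stand-in for one sampled model response.

    Draws from a fixed answer pool so the sketch runs without a model;
    most samples are correct ("4"), some are not.
    """
    return random.choice(["4", "4", "4", "5", "3"])


def score(prompt: str, answer: str) -> float:
    """Stand-in verifier: 1.0 for a correct answer, 0.0 otherwise."""
    return 1.0 if answer == "4" else 0.0


def best_of_n(prompt: str, n: int) -> str:
    """Best-of-N: sample n candidates, return the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))


def majority_vote(prompt: str, n: int) -> str:
    """Majority voting: sample n candidates, return the most common one."""
    candidates = [generate(prompt) for _ in range(n)]
    return Counter(candidates).most_common(1)[0][0]


print(best_of_n("What is 2 + 2?", 8))
print(majority_vote("What is 2 + 2?", 8))
```

Both strategies spend n times the compute of a single sample; best-of-N needs a scoring function, while majority voting needs only that correct answers agree more often than incorrect ones do.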

Why test-time compute matters now

OpenAI's o1 and o3 models demonstrated that significant performance gains are possible by spending more compute at inference time. These models "think" for longer on difficult problems, producing reasoning traces that can be dozens of times longer than the final answer. The result is substantially better performance on maths, coding, and reasoning benchmarks.

The scaling implications

Test-time compute creates a new scaling axis. Previously, the main lever for better performance was training-time scaling — a larger model trained on more data — which is expensive and takes months. Test-time compute lets you improve performance on demand by spending more at inference time. You can even adapt the compute budget to the difficulty: easy questions get quick answers, hard questions get extended reasoning.
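Adapting the budget to difficulty can be sketched in a few lines. Everything here is an assumption for illustration: `difficulty` uses prompt length as a toy proxy, whereas a real system might use a lightweight classifier or the model's own uncertainty, and the token numbers are arbitrary.

```python
def difficulty(prompt: str) -> float:
    """Toy difficulty estimate in [0, 1]: longer prompt = harder.

    A real router would use a learned signal, not length.
    """
    return min(len(prompt) / 200, 1.0)


def compute_budget(prompt: str, base_tokens: int = 256,
                   max_tokens: int = 8192) -> int:
    """Scale the reasoning-token budget linearly with estimated difficulty."""
    d = difficulty(prompt)
    return int(base_tokens + d * (max_tokens - base_tokens))


print(compute_budget("What is 2 + 2?"))        # near the base budget
print(compute_budget("Prove that ... " * 30))  # capped at the max budget
```

The point is the shape of the policy, not the numbers: cheap queries stay cheap, and the expensive extended-reasoning path is reserved for inputs that look hard.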

Trade-offs

  • Cost: More compute per query means higher per-request costs.
  • Latency: Thinking longer means waiting longer for a response.
  • Diminishing returns: Beyond a certain point, additional compute yields minimal improvement.
  • Not universally beneficial: Some tasks (simple classification, extraction) do not benefit from additional reasoning.
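The diminishing-returns point can be made quantitative with a toy model: if each independent sample solves the problem with probability p and a perfect verifier accepts any correct sample, the chance of success with N samples is 1 − (1 − p)^N. This is an idealized assumption (real samples are correlated and verifiers are imperfect), but it shows why each doubling of compute buys less.

```python
# Success probability of best-of-N under independent samples and a
# perfect verifier: 1 - (1 - p)**n. Note how the gain per doubling shrinks.
p = 0.3  # assumed per-sample success rate
for n in (1, 2, 4, 8, 16, 32):
    print(n, round(1 - (1 - p) ** n, 3))
# 1 -> 0.3, 2 -> 0.51, 4 -> 0.76, 8 -> 0.942, 16 -> 0.997, 32 -> 1.0
```

Going from 1 to 4 samples adds about 46 percentage points; going from 16 to 32 adds under one — the extra compute is nearly wasted.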

Why This Matters

Test-time compute represents a fundamental shift in how AI performance is scaled. Understanding it helps you evaluate new "reasoning" models, anticipate the cost and latency trade-offs of advanced AI features, and decide when paying for additional compute per query is justified by better results.
