
Token

Last reviewed: April 2026

The smallest unit of text an AI model processes. Roughly 3-4 characters or three-quarters of a word. AI pricing is typically measured in tokens.

A token is the smallest unit of text that an AI model processes. It is not exactly a word or a character — it is a chunk of text that the model's tokeniser has learned is a useful unit. Most English words are one token. Longer or less common words are split into multiple tokens. Punctuation and spaces are also tokens.

Approximate conversions

  • 1 token ≈ 4 characters in English
  • 1 token ≈ 0.75 words
  • 100 tokens ≈ 75 words
  • 1,000 tokens ≈ 750 words (roughly one page of text)
  • 100,000 tokens ≈ 75,000 words (roughly a novel)

These are approximations. The exact token count varies by model and language.
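The conversions above can be captured as a rule-of-thumb estimator. This is a rough character-count heuristic, not a real tokeniser, and will be off for code, numbers, and non-English text:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return max(1, len(text) // 4)

# A 750-word page at ~5.3 characters per word (including spaces)
# lands near the 1,000-token figure above:
page = ("lorem " * 750).strip()
print(estimate_tokens(page))
```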

Why tokens matter

Tokens matter for three practical reasons:

1. Context window limits: Every AI model has a maximum number of tokens it can process at once, called its context window. If your prompt plus the AI's response exceeds the context window, information gets dropped. A 1,000,000-token context window can hold roughly 750,000 words, but that includes both your input and the AI's output.
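A minimal sketch of a context-window check, using the rough conversions above. The window size and output budget are illustrative defaults, not any particular model's limits:

```python
def fits_in_context(input_tokens: int, max_output_tokens: int,
                    context_window: int = 1_000_000) -> bool:
    """True if the prompt plus the reserved output budget fits in the window.

    The window must cover both input and output, so reserve output space
    up front rather than filling the whole window with the prompt.
    """
    return input_tokens + max_output_tokens <= context_window

# A 600,000-token input plus a 4,096-token reply fits in a 1M-token window:
print(fits_in_context(600_000, 4_096))  # True
```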

2. Pricing: AI APIs charge per token for both input (your prompt) and output (the response). Typical pricing is expressed per million tokens. For example:

  • A simple prompt with a short response might use 500 tokens total
  • Analysing a 10-page document might use 5,000 input tokens plus 1,000 output tokens
  • Processing a full codebase might use 100,000+ tokens

Understanding token pricing helps you estimate costs and optimise your AI spending.

3. Speed: More tokens mean more processing time. Longer prompts take longer to process, and longer responses take longer to generate. Each output token must be generated sequentially, so a 2,000-word response takes roughly twice as long as a 1,000-word response.
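Because output tokens are generated one at a time, generation time scales roughly linearly with output length. A back-of-envelope sketch, where the 50 tokens/second rate is an assumed figure for illustration, not a measured benchmark:

```python
def estimated_generation_seconds(output_tokens: int,
                                 tokens_per_second: float = 50.0) -> float:
    """Sequential generation means time grows linearly with output tokens.

    tokens_per_second is an illustrative assumption; real throughput
    varies by model, hardware, and load.
    """
    return output_tokens / tokens_per_second

# Doubling the output roughly doubles the wait:
print(estimated_generation_seconds(1_000))  # 20.0 seconds
print(estimated_generation_seconds(2_000))  # 40.0 seconds
```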

How tokenisation works

AI models do not process raw text. Before your prompt reaches the model, a tokeniser breaks it into tokens using a learned vocabulary. The tokeniser for GPT-4, for example, has about 100,000 tokens in its vocabulary.

Common words are typically single tokens:

  • "the" → 1 token
  • "hello" → 1 token
  • "computer" → 1 token

Less common or longer words are split:

  • "tokenisation" → multiple tokens (for example "token" + "isation")
  • "pneumonoultramicroscopicsilicovolcanoconiosis" → many tokens

Numbers, code, and non-English text tend to use more tokens per concept:

  • "2024" → 1-2 tokens
  • A line of Python code → 10-30 tokens, depending on complexity
  • Japanese or Chinese text uses more tokens per word than English
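The splitting behaviour can be illustrated with a toy greedy longest-match tokeniser over a hand-picked vocabulary. Real tokenisers use a learned byte-pair-encoding vocabulary of ~100,000 entries, not a tiny hard-coded set like this:

```python
def greedy_tokenise(text: str, vocab: set) -> list:
    """Toy longest-match tokeniser: a stand-in for a learned BPE vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest candidate first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])          # unknown text: fall back to one character
            i += 1
    return tokens

# A tiny illustrative vocabulary (not a real model's):
vocab = {"token", "isation", "the", "hello", " "}
print(greedy_tokenise("tokenisation", vocab))  # ['token', 'isation']
print(greedy_tokenise("the", vocab))           # ['the']
```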

Optimising token usage

For cost and performance optimisation:

  • Write concise prompts — do not include unnecessary context
  • Use system prompts for persistent instructions rather than repeating them in every message
  • Choose output length appropriate to the task — do not request 2,000 words when 200 will do
  • Use cheaper models for simple tasks — a quick classification does not need the most expensive model
  • Cache results for repeated queries
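The last point, caching repeated queries, can be as simple as memoising the function that would otherwise call the API. Here `classify` is a hypothetical placeholder for a paid model call, used only to show the caching pattern:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def classify(text: str) -> str:
    """Hypothetical stand-in for an expensive, token-billed API call."""
    # imagine a paid model call here
    return "positive" if "good" in text else "negative"

classify("a good day")   # pays for tokens once
classify("a good day")   # free: served from the in-process cache
print(classify.cache_info().hits)  # 1
```

An in-process `lru_cache` only helps within one running program; for repeated queries across requests or machines, a shared cache keyed on the exact prompt serves the same purpose.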

Input vs output token pricing

Most providers charge different rates for input tokens and output tokens, with output tokens typically costing 3-5x more than input tokens. This is largely because output tokens must be generated one at a time, while input tokens can be processed in parallel.
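A per-request cost estimate with separate input and output rates might look like the sketch below. The $3 and $15 per-million-token rates are made-up illustrative figures (chosen to show a 5x output premium), not any provider's real prices:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = 3.00, output_rate: float = 15.00) -> float:
    """Cost in dollars; rates are per million tokens (illustrative, not real prices)."""
    return (input_tokens / 1_000_000 * input_rate
            + output_tokens / 1_000_000 * output_rate)

# The 10-page document example above: 5,000 input + 1,000 output tokens.
print(round(request_cost(5_000, 1_000), 4))  # 0.03
```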


Why This Matters

Tokens are the currency of AI. Every interaction costs tokens, and every context window is measured in them. Understanding tokens helps you estimate AI costs, optimise prompt length, stay within context limits, and make informed decisions about which model to use for which task. When budgeting for AI tools across your organisation, token understanding is the difference between controlled spending and runaway costs.


Learn More

Continue learning in Foundations

This topic is covered in our lesson: How Large Language Models Actually Work