
Token

Last reviewed: April 2026

The smallest unit of text an AI model processes. Roughly 3-4 characters or three-quarters of a word. AI pricing is typically measured in tokens.

A token is the smallest unit of text that an AI model processes. It is not exactly a word or a character — it is a chunk of text that the model's tokeniser has learned is a useful unit. Most English words are one token. Longer or less common words are split into multiple tokens. Punctuation and spaces are also tokens.

Approximate conversions

  • 1 token ≈ 4 characters in English
  • 1 token ≈ 0.75 words
  • 100 tokens ≈ 75 words
  • 1,000 tokens ≈ 750 words (roughly one page of text)
  • 100,000 tokens ≈ 75,000 words (roughly a novel)

These are approximations. The exact token count varies by model and language.
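The conversions above can be captured as a rule-of-thumb estimator. This is a rough character-count heuristic, not a real tokeniser, and will be off for code, numbers, and non-English text:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return max(1, len(text) // 4)

# A 750-word page at ~5.3 characters per word (including spaces)
# lands near the 1,000-token figure above:
page = ("lorem " * 750).strip()
print(estimate_tokens(page))
```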

Why tokens matter

Tokens matter for three practical reasons:

1. Context window limits: Every AI model has a maximum number of tokens it can process at once, called its context window. If your prompt plus the AI's response exceeds the context window, information gets dropped. A 1,000,000-token context window can hold roughly 750,000 words, but that includes both your input and the AI's output.
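A minimal sketch of a context-window check, using the rough conversions above. The window size and output budget are illustrative defaults, not any particular model's limits:

```python
def fits_in_context(input_tokens: int, max_output_tokens: int,
                    context_window: int = 1_000_000) -> bool:
    """True if the prompt plus the reserved output budget fits in the window.

    The window must cover both input and output, so reserve output space
    up front rather than filling the whole window with the prompt.
    """
    return input_tokens + max_output_tokens <= context_window

# A 600,000-token input plus a 4,096-token reply fits in a 1M-token window:
print(fits_in_context(600_000, 4_096))  # True
```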

2. Pricing: AI APIs charge per token for both input (your prompt) and output (the response). Typical pricing is expressed per million tokens. For example:

  • A simple prompt with a short response might use 500 tokens total
  • Analysing a 10-page document might use 5,000 input tokens plus 1,000 output tokens
  • Processing a full codebase might use 100,000+ tokens

Understanding token pricing helps you estimate costs and optimise your AI spending.

3. Speed: More tokens mean more processing time. Longer prompts take longer to process, and longer responses take longer to generate. Each output token must be generated sequentially, so a 2,000-word response takes roughly twice as long as a 1,000-word response.
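Because output tokens are generated one at a time, generation time scales roughly linearly with output length. A back-of-envelope sketch, where the 50 tokens/second rate is an assumed figure for illustration, not a measured benchmark:

```python
def estimated_generation_seconds(output_tokens: int,
                                 tokens_per_second: float = 50.0) -> float:
    """Sequential generation means time grows linearly with output tokens.

    tokens_per_second is an illustrative assumption; real throughput
    varies by model, hardware, and load.
    """
    return output_tokens / tokens_per_second

# Doubling the output roughly doubles the wait:
print(estimated_generation_seconds(1_000))  # 20.0 seconds
print(estimated_generation_seconds(2_000))  # 40.0 seconds
```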

How tokenisation works

AI models do not process raw text. Before your prompt reaches the model, a tokeniser breaks it into tokens using a learned vocabulary. The tokeniser for GPT-4, for example, has about 100,000 tokens in its vocabulary.

Common words are typically single tokens:

  • "the" → 1 token
  • "hello" → 1 token
  • "computer" → 1 token

Less common or longer words are split:

  • "tokenisation" → multiple tokens (for example "token" + "isation")
  • "pneumonoultramicroscopicsilicovolcanoconiosis" → many tokens

Numbers, code, and non-English text tend to use more tokens per concept:

  • "2024" → 1-2 tokens
  • A line of Python code → 10-30 tokens, depending on complexity
  • Japanese or Chinese text uses more tokens per word than English
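The splitting behaviour can be illustrated with a toy greedy longest-match tokeniser over a hand-picked vocabulary. Real tokenisers use a learned byte-pair-encoding vocabulary of ~100,000 entries, not a tiny hard-coded set like this:

```python
def greedy_tokenise(text: str, vocab: set) -> list:
    """Toy longest-match tokeniser: a stand-in for a learned BPE vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest candidate first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])          # unknown text: fall back to one character
            i += 1
    return tokens

# A tiny illustrative vocabulary (not a real model's):
vocab = {"token", "isation", "the", "hello", " "}
print(greedy_tokenise("tokenisation", vocab))  # ['token', 'isation']
print(greedy_tokenise("the", vocab))           # ['the']
```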

Optimising token usage

For cost and performance optimisation:

  • Write concise prompts — do not include unnecessary context
  • Use system prompts for persistent instructions rather than repeating them in every message
  • Choose output length appropriate to the task — do not request 2,000 words when 200 will do
  • Use cheaper models for simple tasks — a quick classification does not need the most expensive model
  • Cache results for repeated queries
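The last point, caching repeated queries, can be as simple as memoising the function that would otherwise call the API. Here `classify` is a hypothetical placeholder for a paid model call, used only to show the caching pattern:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def classify(text: str) -> str:
    """Hypothetical stand-in for an expensive, token-billed API call."""
    # imagine a paid model call here
    return "positive" if "good" in text else "negative"

classify("a good day")   # pays for tokens once
classify("a good day")   # free: served from the in-process cache
print(classify.cache_info().hits)  # 1
```

An in-process `lru_cache` only helps within one running program; for repeated queries across requests or machines, a shared cache keyed on the exact prompt serves the same purpose.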

Input vs output token pricing

Most providers charge different rates for input tokens and output tokens, with output tokens typically costing 3-5x more than input tokens. This is largely because output tokens must be generated one at a time, while input tokens can be processed in parallel.
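A per-request cost estimate with separate input and output rates might look like the sketch below. The $3 and $15 per-million-token rates are made-up illustrative figures (chosen to show a 5x output premium), not any provider's real prices:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = 3.00, output_rate: float = 15.00) -> float:
    """Cost in dollars; rates are per million tokens (illustrative, not real prices)."""
    return (input_tokens / 1_000_000 * input_rate
            + output_tokens / 1_000_000 * output_rate)

# The 10-page document example above: 5,000 input + 1,000 output tokens.
print(round(request_cost(5_000, 1_000), 4))  # 0.03
```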


Why This Matters

Tokens are the currency of AI. Every interaction costs tokens, and every context window is measured in them. Understanding tokens helps you estimate AI costs, optimise prompt length, stay within context limits, and make informed decisions about which model to use for which task. When budgeting for AI tools across your organisation, token understanding is the difference between controlled spending and runaway costs.


Learn More

Continue learning in Foundations

This topic is covered in our lesson: How Large Language Models Actually Work