
Context Window

Last reviewed: April 2026


The context window is the maximum amount of text an AI model can process in a single interaction. It includes everything: your prompt, any documents or conversation history you provide, and the AI's response. Think of it as the AI's working memory — everything that fits inside the window is visible to the model; everything outside it effectively does not exist.

How context windows are measured

Context windows are measured in tokens, not words. A token is roughly 3-4 characters or about three-quarters of a word. When a frontier model has a "1,000,000 token context window," that means it can process approximately 750,000 words at once — roughly the length of ten full novels.
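The rules of thumb above (roughly 4 characters per token, about three-quarters of a word per token) can be sketched as a quick estimator. This is only a heuristic; real tokenizers vary by model and language, so for precise counts you would use the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return max(1, len(text) // 4)


def estimate_words_from_tokens(tokens: int) -> int:
    """Roughly three-quarters of a word per token."""
    return int(tokens * 0.75)


# A 1,000,000-token window corresponds to ~750,000 words under this heuristic.
```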

Context window sizes (as of 2026)

Context windows have grown dramatically:

  • GPT-3 (2020): 4,096 tokens (~3,000 words)
  • GPT-4 (2023): 128,000 tokens (~96,000 words)
  • Claude (2024-2026): Up to 1,000,000 tokens (~750,000 words)
  • Gemini (2024-2026): Up to 2,000,000 tokens (~1,500,000 words)

Larger context windows mean you can feed the AI entire codebases, full legal contracts, complete research papers, or lengthy conversation histories.
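One practical use of the size table above is checking which models a given document could fit into before sending it. The sketch below uses the window sizes listed in this article and the ~4 chars/token heuristic; both the sizes and the heuristic are approximations, not live API limits.

```python
# Window sizes as listed above (tokens); real limits vary by model version.
WINDOW_SIZES = {
    "GPT-3": 4_096,
    "GPT-4": 128_000,
    "Claude": 1_000_000,
    "Gemini": 2_000_000,
}


def models_that_fit(document_chars: int) -> list[str]:
    """Return the models whose context window can hold the document."""
    est_tokens = document_chars // 4
    return [name for name, size in WINDOW_SIZES.items() if est_tokens <= size]
```

A 1,000,000-character document (~250,000 tokens) fits only the million-token-class windows, which is why long documents rule out smaller models entirely.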

Why context windows matter

The context window determines what your AI interaction can reference:

  • Short context windows force you to be selective about what information you include. You must summarise, excerpt, or split tasks across multiple conversations.
  • Long context windows let you include entire documents, full conversation histories, and rich background context. The AI can cross-reference information across the entire input.

Practical implications

Understanding context windows changes how you use AI:

  • Document analysis: A 1M token window lets you upload entire books and codebases and ask questions about any part of them. A 4K token window requires you to copy-paste specific sections.
  • Conversation continuity: Longer windows mean the AI remembers more of your conversation. In short-window models, early messages fall off as the conversation grows.
  • Code review: With large context windows, you can provide an entire codebase for review. With small windows, you review one file at a time.
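When a document does not fit a small window, the usual workaround described above is to split it and process the pieces across multiple calls. A minimal sketch of naive fixed-size chunking (splitting on paragraph or section boundaries would be better in practice):

```python
def chunk_text(text: str, max_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Split text into chunks that each fit within max_tokens.

    Naive fixed-width split using the ~4 chars/token heuristic; a production
    splitter would break on paragraph or sentence boundaries instead.
    """
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Each chunk would then be sent as a separate prompt, with the trade-off that the model cannot cross-reference material in other chunks.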

Context window vs knowledge

A critical distinction: the context window is the AI's working memory, not its knowledge. Information in the training data is embedded in the model's weights — the AI "knows" it permanently. Information in the context window is temporary — the AI can reference it during this conversation but will not remember it in the next one.

This means:

  • You need to re-provide context in each new conversation.
  • The context window is where RAG documents, system prompts, and conversation history live.
  • Exceeding the context window means information is silently dropped, which can cause unexpected behaviour.
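The "early messages fall off" behaviour can be made concrete with a history-trimming sketch: keep the system prompt, then admit messages newest-first until the token budget runs out. This is a simplified illustration using the chars/token heuristic, not any particular provider's implementation.

```python
def trim_history(messages: list[str], window_tokens: int, system_prompt: str) -> list[str]:
    """Drop the oldest messages so system prompt + history fits the window.

    The system prompt is always kept; history is trimmed from the front,
    mirroring how early messages 'fall off' as a conversation grows.
    """
    budget = window_tokens - len(system_prompt) // 4
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk newest to oldest
        cost = len(msg) // 4
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```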

Cost implications

Longer inputs cost more to process — you pay per token for both input and output. A prompt that includes a 50-page document costs significantly more than a short question. This creates a practical trade-off: include enough context for a good response, but avoid padding your prompts with unnecessary information.
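The per-token pricing trade-off is easy to quantify. The sketch below takes prices as parameters (the figures in the test are illustrative, not real rates; check your provider's pricing page).

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Estimate a request's cost in dollars, with prices per million tokens."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000
```

At hypothetical rates of $3/$15 per million input/output tokens, a 50,000-token document prompt with a 1,000-token response costs about $0.17, versus a fraction of a cent for a short question, which is the trade-off the paragraph above describes.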

The attention challenge

Even within the context window, AI models do not pay equal attention to all content. Research shows that information at the beginning and end of the context tends to be weighted more heavily than information in the middle — a phenomenon called "lost in the middle." Structuring your input with the most important information first and last can improve response quality.
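One simple way to apply the first-and-last advice is to state the task at the top of the prompt, put the bulk reference material in the middle, and restate the task at the end. A minimal sketch (the layout is one reasonable convention, not a prescribed format):

```python
def build_prompt(task: str, documents: list[str]) -> str:
    """Arrange a prompt to mitigate 'lost in the middle':

    the task goes first, bulk reference material in the middle,
    and a restatement of the task at the end.
    """
    body = "\n\n".join(f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(documents))
    return f"Task: {task}\n\n{body}\n\nReminder of the task: {task}"
```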


Why This Matters

Context window size directly affects what is possible with AI in your workflows. A short context window means you cannot ask AI to review a full contract, analyse a complete dataset, or maintain context across a long conversation. A large context window enables document analysis, codebase review, and complex multi-step tasks. Understanding this constraint helps you choose the right model, structure your prompts effectively, and set realistic expectations for AI-assisted tasks.
