Last updated: April 5, 2026 · Core Concepts · by Daniel Ashford

What is a Context Window?

QUICK ANSWER

The maximum amount of text an LLM can process in a single request.

Definition

The context window is the maximum number of tokens an LLM can process in a single interaction. This includes both the input (your prompt, system instructions, and any documents) and the output (the model response). Anything beyond the context window is invisible to the model.
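Because input and output share one budget, checking whether a request fits is simple arithmetic. Here is a minimal sketch using a rough 4-characters-per-token heuristic (an assumption for illustration; a model's real tokenizer, such as OpenAI's tiktoken, gives exact counts):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text.
    Use the model's actual tokenizer for exact counts."""
    return max(1, len(text) // 4)

def fits_in_window(prompt: str, max_output_tokens: int, window: int = 200_000) -> bool:
    """The prompt AND the reserved output budget must both fit in the window."""
    return estimate_tokens(prompt) + max_output_tokens <= window

# Reserve room for the response, not just the prompt.
print(fits_in_window("Summarize this paragraph in one sentence.", max_output_tokens=1024))
```

The key point the sketch encodes: a "200K window" is not 200K tokens of input; whatever you reserve for the response comes out of the same budget.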

How It Works

Context windows have grown dramatically: GPT-3 shipped with about 2K tokens, GPT-4 launched with 8K (expanded to 128K with GPT-4 Turbo), and Google's Gemini 1.5 Pro supports up to 2 million tokens, enough to process an entire book. Larger context windows enable long-document analysis, multi-turn conversations with full history, and whole-repository code understanding. However, larger contexts cost more, and models may recall information buried deep in a long context less reliably than information near the end.
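One practical consequence for multi-turn conversations: once the accumulated history outgrows the window, a chat client has to evict or summarize old turns. A hedged sketch of the simplest eviction policy, dropping the oldest turns first (the token counts here are illustrative placeholders, not output from a real tokenizer):

```python
def trim_history(turns: list[tuple[str, int]], budget: int) -> list[tuple[str, int]]:
    """Keep the most recent (message, token_count) turns that fit in `budget`,
    dropping the oldest first. Real clients often summarize instead of dropping."""
    kept, used = [], 0
    for message, tokens in reversed(turns):  # walk newest -> oldest
        if used + tokens > budget:
            break
        kept.append((message, tokens))
        used += tokens
    return list(reversed(kept))  # restore chronological order

history = [("turn 1", 500), ("turn 2", 800), ("turn 3", 700)]
print(trim_history(history, budget=1600))  # "turn 1" no longer fits and is dropped
```

Anything trimmed this way is invisible to the model on the next request, which is exactly the failure mode users notice when a long chat "forgets" its early turns.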

Example

With a 200K context window (Claude Opus 4), you can paste an entire 300-page document and ask questions about it. With a 4K context window, you could only fit about 6 pages.
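The page estimates above follow from rule-of-thumb arithmetic. A small sketch, assuming roughly 500 words per page and 0.75 words per token (both approximations that vary with formatting and language):

```python
WORDS_PER_PAGE = 500    # assumption: a dense, single-spaced page
WORDS_PER_TOKEN = 0.75  # common rule of thumb for English text

def pages_per_window(window_tokens: int) -> float:
    """Estimate how many pages of prose fit in a context window."""
    return window_tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

print(round(pages_per_window(200_000)))  # ~300 pages
print(round(pages_per_window(4_000)))    # ~6 pages
```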

Related Terms

Tokens
The basic units of text that LLMs process — roughly 3/4 of a word.
RAG (Retrieval-Augmented Generation)
A technique that gives LLMs access to external documents to improve accuracy and reduce hallucination.
Hallucination
When an LLM generates plausible-sounding but factually incorrect information.

See How Models Compare

Understanding the context window is important when choosing the right AI model. See how 12 models compare on our leaderboard.
