Context Window

A context window is the maximum number of input and output tokens an LLM can process in a single request. It holds the user prompt, system prompt, prior conversation, RAG-retrieved documents, and the generated response — all at once.
Why It Matters

The context window is the LLM's "short-term memory." It determines how many web pages an AI search engine can consider when answering a query and how long a document it can summarize. In 2023 the norm was 4K–8K tokens; in 2026, 1M+ tokens is standard — fundamentally changing the breadth and depth of sources LLMs draw from. For GEO, this means AI search now compares many competing pages at once and decides which one to cite, making document structure and section quality the deciding factor.

Context Windows by Model (2026)

Model             Context Window
Claude Opus 4.6   1M tokens
Gemini 3          1M–2M tokens
GPT-5             400K tokens
Llama 4           128K–1M tokens

1M tokens is roughly 750K English words — about a 400–500 page book.

Tokens, Not Words

Context windows are measured in tokens, not words. English averages ~1.3 tokens per word, but languages like Korean or Japanese use ~1.5–2 tokens per character, meaning non-English content consumes significantly more of the budget for the same page length.
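The budget math above can be sketched with the ~1.3 tokens-per-word heuristic. This is a rough estimate only; real token counts depend on the model's tokenizer, and the function names here are illustrative, not from any library:

```python
def estimate_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    """Rough token estimate for English text (~1.3 tokens/word heuristic)."""
    return round(len(text.split()) * tokens_per_word)

def words_that_fit(context_window_tokens: int, tokens_per_word: float = 1.3) -> int:
    """Approximate number of English words that fit in a given context window."""
    return int(context_window_tokens / tokens_per_word)

# A 1M-token window holds roughly 769K English words under this heuristic,
# in line with the "roughly 750K words" figure above.
print(words_that_fit(1_000_000))
```

For exact counts against a specific model, use that model's own tokenizer rather than a per-word heuristic; the heuristic is only useful for quick budget planning.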

GEO Implications

Whole documents get processed: LLMs used to see only top snippets; now they read entire pages and pick the best section to cite. Document-wide structural clarity matters.

Direct competitor comparison: Large context windows let models compare many competing pages for the same query at once. Winning isn't about being "good" — it's about being structurally easier to cite than the alternatives.

Front-loading matters more: LLMs weight earlier tokens more heavily. Put the core definition and answer at the very top of the document.

"Lost in the middle": Even large-context models degrade on information buried mid-document. Critical content belongs near the start or end, not the middle.