Context Engineering
Context engineering is the practice of deliberately designing what information, in what order, and in what format an LLM sees when it generates a response. It subsumes prompt engineering — which polishes a single prompt — and extends to everything that enters the context window: system prompts, retrieved documents, conversation history, user metadata, tool schemas, and more. Simon Willison, Tobi Lütke, and Andrej Karpathy started using the term publicly in 2025, and by 2026 it's become standard vocabulary in LLM product engineering.
Why It Matters
Most LLM product failures in production come from "we gave the model the wrong context," not "the model is bad." Even with 1M-token context windows, dumping information in randomly hurts performance — the well-documented "Lost in the Middle" effect. Context engineering treats the composite input (RAG, memory, tools, history) as a design variable, and the same model can perform 2–10x better with better context construction.
What Makes Up Context
System prompt: Fixed instructions — role, constraints, tone, goals.
User prompt: The user's input for this turn.
Conversation history: Prior turns.
RAG results: Relevant documents and chunks from a vector DB.
Tool definitions: Names, descriptions, and schemas of callable functions.
Tool call results: Data returned from earlier tool invocations.
User metadata: Language, timezone, subscription plan, behavior history.
Constitution / guardrails: Safety rules, forbidden topics, output filters.
All of these merge into a single context window that goes to the LLM.
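The merge step can be sketched as a simple assembly function. This is a minimal illustration, not any specific framework's API: the OpenAI-style message dicts, the `build_context` name, and the choice to fold metadata into the system prompt are all assumptions for the sketch.

```python
def build_context(system_prompt, history, rag_chunks, user_prompt, metadata=None):
    """Assemble the context components into one ordered message list.
    The role-tagged dict format is an illustrative convention."""
    messages = [{"role": "system", "content": system_prompt}]
    if metadata:
        # User metadata rides along inside the system prompt.
        meta = ", ".join(f"{k}={v}" for k, v in metadata.items())
        messages[0]["content"] += f"\n\nUser metadata: {meta}"
    messages.extend(history)  # prior turns, already role-tagged
    if rag_chunks:
        # Retrieved documents are injected as labeled sources.
        sources = "\n".join(
            f"<source id={i}>{c}</source>" for i, c in enumerate(rag_chunks)
        )
        messages.append({"role": "user", "content": f"Reference material:\n{sources}"})
    messages.append({"role": "user", "content": user_prompt})
    return messages

ctx = build_context(
    system_prompt="You are a concise support agent.",
    history=[{"role": "user", "content": "Hi"},
             {"role": "assistant", "content": "Hello!"}],
    rag_chunks=["Refunds take 5-7 business days."],
    user_prompt="How long do refunds take?",
    metadata={"plan": "pro", "lang": "en"},
)
```

Everything downstream — ordering, tagging, compression — is a decision about what this function puts where.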
Context Engineering vs Prompt Engineering
| Aspect | Prompt Engineering | Context Engineering |
|---|---|---|
| Unit | A single prompt sentence | The entire context window |
| Concern | "How do I ask?" | "What should I show?" |
| Level | Tactical (sentence-level) | Systemic (pipeline-level) |
| Example | Add "think step by step" | Decide RAG chunk count, order, summarization |
Prompt engineering is the craft of writing good sentences; context engineering is the craft of designing the entire input structure those sentences live in.
Core Principles
Include only what's needed: Longer context means more "lost in the middle" and higher cost. Ruthlessly cut irrelevant info.
Order deliberately: LLMs weight the start and end more. Put the most important instructions and data at the edges.
Structured tagging: Wrap external documents in <source>…</source>, examples in <example>…</example>, so the model knows the role of each part.
Dynamic selection: Different request types deserve different tool lists, RAG results, and system prompts. One-size-fits-all wastes tokens.
Summarize and compress: Summarize long histories to save tokens. Features like Claude's Artifacts — which move long content out of the conversational flow into a separate pane — are a canonical example of offloading context.
Manage agent loops: For multi-step reasoning, clean and reconstruct context between steps.
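The ordering and tagging principles can be sketched together. The tag names follow the `<source>`/`<example>` convention above; the "repeat the key instruction at both edges" heuristic is one common tactic for the primacy/recency effect, not a fixed standard, and `compose_prompt` is a hypothetical helper:

```python
def compose_prompt(instruction, documents, examples, question):
    """Place the key instruction at both edges of the context,
    where attention is strongest, and wrap each external piece
    in tags so the model can tell sources from examples."""
    parts = [instruction]  # most important content first...
    parts += [f"<source>{d}</source>" for d in documents]
    parts += [f"<example>{e}</example>" for e in examples]
    parts.append(question)
    parts.append(f"Remember: {instruction}")  # ...and repeated last
    return "\n\n".join(parts)

prompt = compose_prompt(
    instruction="Answer only from the sources, and cite the source you used.",
    documents=["Context engineering designs the entire model input."],
    examples=["Q: What is RAG? A: Retrieval-augmented generation (per source)."],
    question="What does context engineering design?",
)
```

Dynamic selection then operates one level up: choosing *which* documents, examples, and tools get passed in for this request type at all.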
Practical Challenges
Token budget: Context windows aren't free. Filling 1M tokens explodes cost and latency.
Relevance ranking: Decide how many RAG chunks to retrieve and how aggressively to rerank them.
Memory strategy: Long-term memory in a vector DB, short-term memory via summarization.
Debugging: When output quality drops, find which part of the context is at fault. Logging and reproducibility are essential.
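The token-budget and memory-strategy points combine into a common pattern: keep recent turns verbatim, compress everything older into a summary. A crude sketch — the 4-characters-per-token estimate and the placeholder `summarize` stub are simplifying assumptions; real systems use the model's actual tokenizer and an LLM-written summary:

```python
def estimate_tokens(text):
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def fit_history(turns, budget_tokens,
                summarize=lambda ts: f"[summary of {len(ts)} turns]"):
    """Keep the most recent turns verbatim within the token budget;
    collapse everything older into a single summary line."""
    kept, used = [], 0
    for turn in reversed(turns):  # walk newest-first
        cost = estimate_tokens(turn)
        if used + cost > budget_tokens:
            break
        kept.append(turn)
        used += cost
    dropped = turns[: len(turns) - len(kept)]
    prefix = [summarize(dropped)] if dropped else []
    return prefix + list(reversed(kept))

history = ["turn one " * 50, "turn two " * 50, "short recent turn"]
trimmed = fit_history(history, budget_tokens=60)
# The two long early turns exceed the budget and get summarized;
# only the short recent turn survives verbatim.
```

Logging which turns were kept, dropped, or summarized on each request is what makes the debugging point above tractable.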
GEO Implications
AI search engines are themselves context engineering pipelines. Content structured to "fit the context well" gets cited more. Specifically: ① each section should be independently summarizable, ② the first sentence should carry the core answer, ③ metadata and sources should be explicit. That's "context-engineering-friendly writing" for bloggers.