Context Engineering
Context engineering is the practice of deliberately designing what information, in what order, and in what format an LLM sees when it generates a response. It subsumes prompt engineering — which polishes a single prompt — and extends to everything that enters the context window: system prompts, retrieved documents, conversation history, user metadata, tool schemas, and more. Simon Willison, Tobi Lütke, and Andrej Karpathy started using the term publicly in 2025, and by 2026 it's become standard vocabulary in LLM product engineering.
Why It Matters
Most LLM product failures in production come from "we gave the model the wrong context," not "the model is bad." Even with 1M-token context windows, dumping information in randomly hurts performance — the well-documented "Lost in the Middle" effect. Context engineering treats the composite input (RAG, memory, tools, history) as a design variable, and the same model can perform 2–10x better with better context construction.
What Makes Up Context
System prompt: Fixed instructions — role, constraints, tone, goals.
User prompt: The user's input for this turn.
Conversation history: Prior turns.
RAG results: Relevant documents and chunks from a vector DB.
Tool definitions: Names, descriptions, and schemas of callable functions.
Tool call results: Data returned from earlier tool invocations.
User metadata: Language, timezone, subscription plan, behavior history.
Constitution / guardrails: Safety rules, forbidden topics, output filters.
All of these merge into a single context window that goes to the LLM.
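The merge step can be sketched as a simple assembly function. This is a minimal illustration, not any specific framework's API: the OpenAI-style message dicts, the `build_context` name, and the choice to fold metadata into the system prompt are all assumptions for the sketch.

```python
def build_context(system_prompt, history, rag_chunks, user_prompt, metadata=None):
    """Assemble the context components into one ordered message list.
    The role-tagged dict format is an illustrative convention."""
    messages = [{"role": "system", "content": system_prompt}]
    if metadata:
        # User metadata rides along inside the system prompt.
        meta = ", ".join(f"{k}={v}" for k, v in metadata.items())
        messages[0]["content"] += f"\n\nUser metadata: {meta}"
    messages.extend(history)  # prior turns, already role-tagged
    if rag_chunks:
        # Retrieved documents are injected as labeled sources.
        sources = "\n".join(
            f"<source id={i}>{c}</source>" for i, c in enumerate(rag_chunks)
        )
        messages.append({"role": "user", "content": f"Reference material:\n{sources}"})
    messages.append({"role": "user", "content": user_prompt})
    return messages

ctx = build_context(
    system_prompt="You are a concise support agent.",
    history=[{"role": "user", "content": "Hi"},
             {"role": "assistant", "content": "Hello!"}],
    rag_chunks=["Refunds take 5-7 business days."],
    user_prompt="How long do refunds take?",
    metadata={"plan": "pro", "lang": "en"},
)
```

Everything downstream — ordering, tagging, compression — is a decision about what this function puts where.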
Context Engineering vs Prompt Engineering
| Aspect | Prompt Engineering | Context Engineering |
|---|---|---|
| Unit | A single prompt sentence | The entire context window |
| Concern | "How do I ask?" | "What should I show?" |
| Level | Tactical (sentence-level) | Systemic (pipeline-level) |
| Example | Add "think step by step" | Decide RAG chunk count, order, summarization |
Prompt engineering is the craft of writing good sentences; context engineering is the craft of designing the entire input structure those sentences live in.
Core Principles
Include only what's needed: Longer context means more "lost in the middle" and higher cost. Ruthlessly cut irrelevant info.
Order deliberately: LLMs weight the start and end more. Put the most important instructions and data at the edges.
Structured tagging: Wrap external documents in <source>…</source>, examples in <example>…</example>, so the model knows the role of each part.
Dynamic selection: Different request types deserve different tool lists, RAG results, and system prompts. One-size-fits-all wastes tokens.
Summarize and compress: Summarize long histories to save tokens. Features like Claude's Artifacts — which move long content out of the conversational flow into a separate pane — are a canonical example of offloading context.
Manage agent loops: For multi-step reasoning, clean and reconstruct context between steps.
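The ordering and tagging principles can be sketched together. The tag names follow the `<source>`/`<example>` convention above; the "repeat the key instruction at both edges" heuristic is one common tactic for the primacy/recency effect, not a fixed standard, and `compose_prompt` is a hypothetical helper:

```python
def compose_prompt(instruction, documents, examples, question):
    """Place the key instruction at both edges of the context,
    where attention is strongest, and wrap each external piece
    in tags so the model can tell sources from examples."""
    parts = [instruction]  # most important content first...
    parts += [f"<source>{d}</source>" for d in documents]
    parts += [f"<example>{e}</example>" for e in examples]
    parts.append(question)
    parts.append(f"Remember: {instruction}")  # ...and repeated last
    return "\n\n".join(parts)

prompt = compose_prompt(
    instruction="Answer only from the sources, and cite the source you used.",
    documents=["Context engineering designs the entire model input."],
    examples=["Q: What is RAG? A: Retrieval-augmented generation (per source)."],
    question="What does context engineering design?",
)
```

Dynamic selection then operates one level up: choosing *which* documents, examples, and tools get passed in for this request type at all.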
Practical Challenges
Token budget: Context windows aren't free. Filling 1M tokens explodes cost and latency.
Relevance ranking: Decide how many RAG chunks to retrieve and how aggressively to rerank them.
Memory strategy: Long-term memory in a vector DB, short-term memory via summarization.
Debugging: When output quality drops, find which part of the context is at fault. Logging and reproducibility are essential.
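The token-budget and memory-strategy points combine into a common pattern: keep recent turns verbatim, compress everything older into a summary. A crude sketch — the 4-characters-per-token estimate and the placeholder `summarize` stub are simplifying assumptions; real systems use the model's actual tokenizer and an LLM-written summary:

```python
def estimate_tokens(text):
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def fit_history(turns, budget_tokens,
                summarize=lambda ts: f"[summary of {len(ts)} turns]"):
    """Keep the most recent turns verbatim within the token budget;
    collapse everything older into a single summary line."""
    kept, used = [], 0
    for turn in reversed(turns):  # walk newest-first
        cost = estimate_tokens(turn)
        if used + cost > budget_tokens:
            break
        kept.append(turn)
        used += cost
    dropped = turns[: len(turns) - len(kept)]
    prefix = [summarize(dropped)] if dropped else []
    return prefix + list(reversed(kept))

history = ["turn one " * 50, "turn two " * 50, "short recent turn"]
trimmed = fit_history(history, budget_tokens=60)
# The two long early turns exceed the budget and get summarized;
# only the short recent turn survives verbatim.
```

Logging which turns were kept, dropped, or summarized on each request is what makes the debugging point above tractable.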
GEO Implications
AI search engines are themselves context engineering pipelines. Content structured to "fit the context well" gets cited more. Specifically: ① each section should be independently summarizable, ② the first sentence should carry the core answer, ③ metadata and sources should be explicit. That's "context-engineering-friendly writing" for bloggers.