Query Rewriting

Query rewriting is the practice of transforming a user's raw question into a form better suited for retrieval before running it against a search engine, a RAG system, or an AI search product. It covers a range of transformations: disambiguating vague questions, resolving pronouns, expanding with synonyms, and decomposing multi-part questions into sub-questions.

Why It Matters

The questions users actually type aren't optimized for retrieval. They depend on context ("how do I do that thing"), drop words ("inblog how much"), or bundle several intents ("GEO vs SEO differences and what to do"). Running those directly against a vector DB pulls in noise, because an underspecified query matches many loosely related chunks. Query rewriting can substantially improve retrieval accuracy and citation quality, and by 2026 it's a standard preprocessing step in production RAG pipelines.

Common Techniques

Query expansion: Add synonyms and related terms. "Blog platform recommendation" becomes "blog platform recommendation CMS WordPress Medium inblog." Raises recall in semantic search.
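A minimal sketch of query expansion, assuming a small hand-written synonym table. SYNONYMS and expand_query are hypothetical names for illustration; production systems more often get related terms from an LLM, a thesaurus, or query logs.

```python
# Hypothetical synonym table; the entries mirror the example in the text.
SYNONYMS = {
    "blog platform": ["CMS", "WordPress", "Medium", "inblog"],
}

def expand_query(query: str, synonyms: dict[str, list[str]] = SYNONYMS) -> str:
    """Append related terms for every known phrase found in the query."""
    lowered = query.lower()
    extra: list[str] = []
    for phrase, related in synonyms.items():
        if phrase in lowered:
            for term in related:
                # Skip terms already in the query and avoid duplicates.
                if term.lower() not in lowered and term not in extra:
                    extra.append(term)
    return " ".join([query, *extra]) if extra else query

print(expand_query("Blog platform recommendation"))
# Blog platform recommendation CMS WordPress Medium inblog
```

Queries that contain no known phrase are returned unchanged, so the function is safe to run on every request.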

Query decomposition: Split a multi-intent question into sub-questions. "What's the difference between GEO and SEO and how do I respond?" becomes four queries: "What is GEO?", "What is SEO?", "GEO vs SEO differences?", "GEO response strategy?" Closely related to query fan-out.
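Decomposition is usually delegated to a model. The sketch below assumes the model is available as a plain callable that takes a prompt and returns text; DECOMPOSE_PROMPT, decompose, and fake_llm are hypothetical names, and fake_llm simply returns the example from the text in place of a real model call.

```python
from typing import Callable

# Hypothetical prompt wording; tune it for the model you actually use.
DECOMPOSE_PROMPT = (
    "Split the question below into standalone sub-questions, "
    "one per line, with no numbering.\n\nQuestion: {question}"
)

def decompose(question: str, llm: Callable[[str], str]) -> list[str]:
    """Ask the model for sub-questions and keep one per non-empty line."""
    raw = llm(DECOMPOSE_PROMPT.format(question=question))
    subs = [line.strip(" -\t") for line in raw.splitlines()]
    subs = [s for s in subs if s]
    # Fall back to the original question if nothing usable came back.
    return subs or [question]

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call.
    return (
        "What is GEO?\nWhat is SEO?\n"
        "GEO vs SEO differences?\nGEO response strategy?"
    )

sub_questions = decompose(
    "What's the difference between GEO and SEO and how do I respond?",
    fake_llm,
)
print(sub_questions)
```

Each sub-question is then retrieved separately, which is why the fallback returns a one-element list rather than an empty one.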

Coreference resolution: Use prior conversation to replace pronouns with explicit nouns. "How much is that?" becomes "How much is the inblog Business plan?"
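One way to do this is to hand the model the conversation so far and ask for a standalone question. The sketch below assumes the same kind of prompt-in, text-out callable; resolve_references and fake_llm are hypothetical names, and the conversation turns are invented for illustration.

```python
from typing import Callable

def resolve_references(
    history: list[tuple[str, str]],
    question: str,
    llm: Callable[[str], str],
) -> str:
    """Rewrite `question` as a standalone query using the prior turns."""
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    prompt = (
        "Rewrite the final question so it can be understood without the "
        "conversation. Replace pronouns with the nouns they refer to.\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"Final question: {question}\n"
        "Standalone question:"
    )
    rewritten = llm(prompt).strip()
    # Keep the original wording if the model returns an empty string.
    return rewritten or question

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call.
    return "How much is the inblog Business plan?"

# Invented conversation, for illustration only.
history = [
    ("user", "Tell me about the inblog Business plan."),
    ("assistant", "The Business plan is one of inblog's paid plans."),
]
standalone = resolve_references(history, "How much is that?", fake_llm)
print(standalone)
```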

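HyDE (Hypothetical Document Embeddings): The model generates a hypothetical answer to the question first, then embeds that answer for retrieval. A generated answer is structurally closer to real documents than a question is, so its embedding tends to land nearer the relevant passages, which can boost retrieval precision. The hypothetical answer is used only as a search key; it doesn't need to be factually correct.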
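The idea can be sketched without any external service. The example below is a toy under stated assumptions: embed is a bag-of-words counter standing in for a real embedding model, fake_generate stands in for the LLM, and the two documents are invented. Only the control flow (generate, embed the generated text, rank by similarity) reflects HyDE.

```python
import math
import re
from collections import Counter
from typing import Callable

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system calls an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def hyde_search(
    question: str,
    docs: list[str],
    generate: Callable[[str], str],
    k: int = 1,
) -> list[str]:
    """Rank docs against the embedding of a generated answer, not the question."""
    hypothetical = generate(question)
    query_vec = embed(hypothetical)
    ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]

def fake_generate(question: str) -> str:
    # Stand-in for an LLM writing a plausible answer.
    return (
        "Query rewriting is the practice of transforming a raw question "
        "into a form suited for retrieval."
    )

# Invented corpus, for illustration only.
docs = [
    "Query rewriting transforms a user's raw question into a form better suited for retrieval.",
    "Reranking refines results after vector search returns candidate chunks.",
]
top = hyde_search("What is query rewriting?", docs, fake_generate)
print(top[0])
```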

Query reformulation: Rewrite vague questions into clearer ones. "It's not working" becomes "Why isn't my blog post appearing in search after publishing?"

Cross-language translation: Even if the user asks in Korean, the system also runs the translated English version to surface English documents.

The Pipeline

  1. User query input: Receive the raw natural-language question
  2. LLM rewriting: A dedicated prompt analyzes the query and generates rewritten form(s)
  3. Embedding: Each rewritten query is embedded
  4. Vector search: Retrieve relevant chunks from the vector DB
  5. Reranking: Refine results with a reranker
  6. Generation: Feed top chunks into the LLM for the final answer
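The six steps above can be sketched as a single function with each stage passed in as a callable. This is an illustrative skeleton, not a reference implementation: run_pipeline, its parameter names, and the stub stages in the demo are all assumptions, and a real deployment would plug in an LLM, an embedding model, a vector DB client, and a reranker.

```python
from typing import Callable, Sequence

Vector = Sequence[float]

def run_pipeline(
    query: str,                                        # step 1: raw user query
    rewrite: Callable[[str], list[str]],
    embed: Callable[[str], Vector],
    search: Callable[[Vector, int], list[str]],
    rerank: Callable[[str, list[str]], list[str]],
    generate: Callable[[str, list[str]], str],
    per_query_k: int = 5,
    final_k: int = 3,
) -> str:
    rewritten = rewrite(query) or [query]              # step 2: LLM rewriting
    candidates: list[str] = []
    for q in rewritten:
        for chunk in search(embed(q), per_query_k):    # steps 3-4: embed, search
            if chunk not in candidates:                # dedupe across queries
                candidates.append(chunk)
    top = rerank(query, candidates)[:final_k]          # step 5: reranking
    return generate(query, top)                        # step 6: generation

# Stub stages so the skeleton runs end to end; all values are made up.
answer = run_pipeline(
    "GEO vs SEO differences and what to do",
    rewrite=lambda q: ["What is GEO?", "What is SEO?"],
    embed=lambda text: [float(len(text))],
    search=lambda vec, k: ["chunk-a", "chunk-b"][:k],
    rerank=lambda q, chunks: sorted(chunks),
    generate=lambda q, chunks: f"{len(chunks)} chunks: {', '.join(chunks)}",
)
print(answer)  # 2 chunks: chunk-a, chunk-b
```

Reranking against the original query rather than the rewritten ones is a design choice in this sketch; it keeps the final ordering tied to what the user actually asked.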

GEO Implications

The query a user types and the rewritten query that actually hits the vector DB are different. GEO strategy has to design content that matches the rewritten queries too.

Question-shaped headings: Using headings like "What is X?", "How to do Y?", "Difference between X and Y" directly matches decomposed sub-questions.

Synonyms and bilingual terms: Providing both proper nouns and generic terms, English alongside local-language names, and both spelled-out forms and abbreviations helps your content match expanded queries.

Explicit answer sentences: Starting each section with a declarative "X is…" matches the hypothetical answers HyDE generates.

Comparison content: "A vs B" structured posts naturally match multiple sub-questions at once when comparative queries get decomposed.
