Hybrid Search

Hybrid search is a retrieval technique that runs a dense vector search (semantic) and a sparse keyword search (BM25) in parallel, then fuses the results into a single ranked list. It captures both "meaning similarity" and "exact token match" in one query.

Why It Matters

Dense vector search is great at semantic matches ("affordable laptops" ≈ "budget notebooks") but often struggles with rare tokens like product codes, SKUs, and proper nouns. Keyword search nails exact tokens but misses paraphrases. Hybrid search covers both cases: production RAG systems at Anthropic, OpenAI, and Elastic reportedly see hybrid outperform either method alone, with recall improvements typically cited in the 10–30% range on real-world retrieval benchmarks.

How It Works

1. Dual retrieval: The same query runs through both indexes — a vector index (dense embeddings) and an inverted index (BM25 or TF-IDF).

2. Score normalization: Dense and sparse scores live on different scales. They're normalized — min-max, z-score, or rank-based.

3. Fusion: Scores are combined into a single ranking. The most popular methods:

  • Reciprocal Rank Fusion (RRF): score = Σ 1/(k + rank_i) — rank-based, no tuning needed, extremely robust.
  • Weighted sum: α * dense + (1-α) * sparse — requires tuning α per domain.
  • Learned fusion: A small model predicts the optimal weight per query.

4. Optional reranking: A cross-encoder reranks the top-k fused candidates for final precision.
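The fusion step can be sketched in a few lines of Python. This is a minimal illustration rather than any particular engine's implementation: the document IDs and scores are made up, the function names (rrf_fuse, min_max, weighted_fuse) are hypothetical, and k=60 is a commonly used default for RRF, not a value taken from the text above.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a document missing from a list contributes nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def min_max(scores):
    """Rescale a {doc_id: score} map to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def weighted_fuse(dense, sparse, alpha=0.5):
    """alpha * dense + (1 - alpha) * sparse over min-max normalized scores."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    docs = set(dense_n) | set(sparse_n)
    fused = {
        d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0)
        for d in docs
    }
    return sorted(fused, key=lambda d: (-fused[d], d))


# Illustrative scores on different scales (cosine similarity vs. BM25).
dense_scores = {"doc_a": 0.91, "doc_b": 0.88, "doc_c": 0.52}
sparse_scores = {"doc_c": 14.2, "doc_a": 9.7, "doc_d": 3.1}

dense_ranking = sorted(dense_scores, key=dense_scores.get, reverse=True)
sparse_ranking = sorted(sparse_scores, key=sparse_scores.get, reverse=True)

print(rrf_fuse([dense_ranking, sparse_ranking]))
print(weighted_fuse(dense_scores, sparse_scores, alpha=0.5))
```

Note that RRF only looks at ranks, so it skips score normalization entirely, while the weighted sum depends on it. In this toy data both methods put doc_a first because it ranks well in both lists.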

When to Use It

Domain-specific vocabulary: Medical codes, legal citations, part numbers.

Mixed query types: When users search both with natural language and exact strings.

Long-tail recall matters: Rare queries where BM25 still shines.

You're getting no relevant results from vectors alone: This is often an exact-match failure, which the keyword side of hybrid search fixes.

Trade-offs

Latency: Two indexes means two queries. Mitigated by parallel execution.
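As a rough sketch of the parallel-execution point, the two lookups can be issued concurrently so that total latency is close to the slower of the two rather than their sum. The dense_search and sparse_search functions below are hypothetical stand-ins that only sleep to simulate index latency; a real system would call its vector store and keyword index here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def dense_search(query, top_k=10):
    """Hypothetical stand-in for a vector index lookup."""
    time.sleep(0.2)  # simulated index/network latency
    return ["doc_a", "doc_b", "doc_c"][:top_k]


def sparse_search(query, top_k=10):
    """Hypothetical stand-in for a BM25 lookup."""
    time.sleep(0.2)  # simulated index/network latency
    return ["doc_c", "doc_a", "doc_d"][:top_k]


def hybrid_retrieve(query, top_k=10):
    """Issue both lookups concurrently and return both ranked lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        dense_future = pool.submit(dense_search, query, top_k)
        sparse_future = pool.submit(sparse_search, query, top_k)
        return dense_future.result(), sparse_future.result()


start = time.perf_counter()
dense_hits, sparse_hits = hybrid_retrieve("budget notebooks")
elapsed = time.perf_counter() - start
print(dense_hits, sparse_hits, f"{elapsed:.2f}s")
```

Threads suit this case because both calls spend their time waiting on I/O. The two result lists would then go to the fusion step.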

Index storage: You need to maintain both a vector index and an inverted index.

Tuning complexity: Weighted fusion requires labeled data to tune. RRF sidesteps this.
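To make the tuning cost concrete, here is a minimal sketch of a grid search over α against labeled queries, scored by recall@k. Everything in it is illustrative: the two labeled queries are invented, the scores are assumed to be already normalized to [0, 1], and the helper names (fuse, recall_at_k, tune_alpha) are hypothetical.

```python
def fuse(dense, sparse, alpha):
    """Rank docs by alpha * dense + (1 - alpha) * sparse (scores already in [0, 1])."""
    docs = set(dense) | set(sparse)
    fused = {
        d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0)
        for d in docs
    }
    return sorted(fused, key=lambda d: (-fused[d], d))


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant docs that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def tune_alpha(labeled, k=1, steps=11):
    """Grid-search alpha over [0, 1]; return (best_alpha, mean recall@k)."""
    best = (0.0, -1.0)
    for i in range(steps):
        alpha = i / (steps - 1)
        mean_recall = sum(
            recall_at_k(fuse(q["dense"], q["sparse"], alpha), q["relevant"], k)
            for q in labeled
        ) / len(labeled)
        # Keep the first alpha that strictly improves on the best so far.
        if mean_recall > best[1]:
            best = (alpha, mean_recall)
    return best


labeled = [
    # Exact-match style query: the sparse side ranks the relevant doc first.
    {"dense": {"d1": 1.0, "d2": 0.6}, "sparse": {"d2": 1.0, "d1": 0.0}, "relevant": {"d2"}},
    # Paraphrase style query: the dense side ranks the relevant doc first.
    {"dense": {"d3": 1.0, "d4": 0.2}, "sparse": {"d4": 1.0, "d3": 0.7}, "relevant": {"d3"}},
]

best_alpha, best_recall = tune_alpha(labeled)
print(best_alpha, best_recall)
```

In this toy set neither α = 0 nor α = 1 answers both queries, and only an intermediate weight does. That is the kind of signal labeled data provides and that RRF does not need.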

Not always a win: On domains where embeddings are very strong (pure paraphrase tasks), dense alone can match hybrid.

Hybrid Search vs Pure Vector Search

Aspect                 Pure Vector   Hybrid
Semantic matches       Strong        Strong
Exact token matches    Weak          Strong
Rare tokens, SKUs      Weak          Strong
Infrastructure         Simple        Two indexes
Typical recall lift    Baseline      +10–30%

Modern vector databases and search engines (Pinecone, Weaviate, Qdrant, Elasticsearch) offer hybrid search as a first-class feature, so the added operational cost is usually low.
