Hybrid Search
Hybrid search is a retrieval technique that runs a dense vector search (semantic) and a sparse keyword search (BM25) in parallel, then fuses the results into a single ranked list. It captures both "meaning similarity" and "exact token match" in one query.
Why It Matters
Dense vector search is great at semantic matches ("affordable laptops" ≈ "budget notebooks") but often fails on rare tokens like product codes, SKUs, and proper nouns. Keyword search nails exact tokens but misses paraphrases. Hybrid search covers both cases: production RAG systems at Anthropic, OpenAI, and Elastic all report hybrid consistently outperforming either method alone, typically with a 10–30% recall improvement on real-world retrieval benchmarks.
How It Works
1. Dual retrieval: The same query runs through both indexes — a vector index (dense embeddings) and an inverted index (BM25 or TF-IDF).
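2. Score normalization: Dense and sparse scores live on different scales (for example, cosine similarity is bounded while BM25 scores are not), so they are normalized before any score-based fusion, using min-max, z-score, or rank-based methods.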
3. Fusion: Scores are combined into a single ranking. The most popular methods:
- Reciprocal Rank Fusion (RRF): score = Σ 1/(k + rank_i). Rank-based, no tuning needed, extremely robust.
- Weighted sum: α * dense + (1-α) * sparse. Requires tuning α per domain.
- Learned fusion: A small model predicts the optimal weight per query.
4. Optional reranking: A cross-encoder reranks the top-k fused candidates for final precision.
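The fusion step can be sketched in a few lines. The example below is a minimal illustration of Reciprocal Rank Fusion over two ranked lists of document IDs, not a production implementation. The function name `rrf_fuse`, the toy document IDs, and the constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (assumed default; 60 is a common choice).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical results from the two retrievers, best match first.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

fused = rrf_fuse([dense_hits, sparse_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear in both lists (`doc_a`, `doc_c`) rise above documents found by only one retriever, which is the behavior that makes RRF robust without any score normalization.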
When to Use It
Domain-specific vocabulary: Medical codes, legal citations, part numbers.
Mixed query types: When users search both with natural language and exact strings.
Long-tail recall matters: Rare queries where BM25 still shines.
You're getting zero results from vectors alone: Often an exact-match failure — hybrid fixes it.
Trade-offs
Latency: Two indexes means two queries. Mitigated by parallel execution.
Index storage: You need to maintain both a vector index and an inverted index.
Tuning complexity: Weighted fusion requires labeled data to tune. RRF sidesteps this.
Not always a win: On domains where embeddings are very strong (pure paraphrase tasks), dense alone can match hybrid.
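To make the tuning trade-off concrete, here is a rough sketch of weighted-sum fusion with min-max normalization. The helper names `min_max` and `weighted_fuse`, the toy scores, and the choice to score a document missing from one retriever as 0.0 are all assumptions for illustration; real systems may handle missing scores differently.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a top hit.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fuse(dense, sparse, alpha=0.5):
    """Combine normalized scores as alpha * dense + (1 - alpha) * sparse."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    fused = {
        # Assumption: a doc absent from one retriever scores 0.0 there.
        doc_id: alpha * dense_n.get(doc_id, 0.0)
        + (1 - alpha) * sparse_n.get(doc_id, 0.0)
        for doc_id in set(dense_n) | set(sparse_n)
    }
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical raw scores: cosine-like for dense, BM25-like for sparse.
dense_scores = {"doc_a": 0.82, "doc_b": 0.79, "doc_c": 0.40}
sparse_scores = {"doc_c": 12.0, "doc_a": 3.0, "doc_d": 7.5}

dense_heavy = weighted_fuse(dense_scores, sparse_scores, alpha=0.9)
sparse_heavy = weighted_fuse(dense_scores, sparse_scores, alpha=0.1)
print([doc_id for doc_id, _ in dense_heavy])
print([doc_id for doc_id, _ in sparse_heavy])
```

The same inputs produce different rankings as α moves, which is why weighted fusion usually needs labeled queries to pick α, while RRF avoids the choice entirely.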
Hybrid Search vs Pure Vector Search
| Aspect | Pure Vector | Hybrid |
|---|---|---|
| Semantic matches | Strong | Strong |
| Exact token matches | Weak | Strong |
| Rare tokens, SKUs | Weak | Strong |
| Infrastructure | Simple | Two indexes |
| Typical recall lift | Baseline | +10–30% |
Modern vector databases and search engines (Pinecone, Weaviate, Qdrant, Elasticsearch) offer hybrid search as a first-class feature, which keeps the operational cost of running both indexes low.