Semantic Chunking

Semantic chunking is a document-splitting technique that cuts text at meaning boundaries rather than fixed character or token counts. It uses embeddings to detect when adjacent sentences shift topic, then places the cut there — so each resulting chunk is internally coherent and retrievable as a single idea.

Why It Matters

Naive chunking splits at every N tokens or at paragraph breaks, oblivious to meaning. This routinely chops a single argument in half, placing the premise in one chunk and the conclusion in another — so the retriever returns fragments that don't make sense on their own. Semantic chunking fixes this by respecting topic shifts. LlamaIndex and LangChain benchmark reports from 2024–2025 report semantic chunking improving RAG answer quality by roughly 8–20% on open-domain QA versus fixed-size splits, with the biggest gains on long technical documents.

How It Works

1. Split into sentences: Use a sentence tokenizer to get atomic units.

2. Embed each sentence: A small embedding model produces one vector per sentence.

3. Compute adjacent similarities: For each sentence pair, measure cosine similarity between embeddings.

4. Find the break points: When similarity drops below a threshold (or sits in the bottom percentile), mark it as a topic shift.

5. Group sentences between breaks into chunks: Each chunk is topically coherent.

6. Optional size bounds: Merge tiny chunks or split huge ones so retrieval stays practical.
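
The six steps above can be sketched end to end. This is a minimal, stdlib-only illustration: the `embed` function here is a toy bag-of-words stand-in for a real embedding model, and the percentile-based cutoff is one common way to pick break points, not the only one.

```python
import math
import re

def embed(sentence):
    # Toy bag-of-words "embedding" — a real system would call an embedding model.
    vec = {}
    for word in re.findall(r"[a-z']+", sentence.lower()):
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(text, percentile=25):
    # 1. Split into sentences.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) < 2:
        return [text]
    # 2. Embed each sentence.
    vecs = [embed(s) for s in sentences]
    # 3. Compute similarity between each adjacent pair.
    sims = [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
    # 4. Break where similarity falls in the bottom percentile.
    cutoff = sorted(sims)[max(0, int(len(sims) * percentile / 100) - 1)]
    breaks = {i for i, s in enumerate(sims) if s <= cutoff}
    # 5. Group sentences between breaks into chunks.
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if (i - 1) in breaks:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```

With a real embedding model, only `embed` changes; the break-point logic stays the same.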

Semantic vs Fixed-Size vs Recursive Chunking

Strategy     How it splits                     Coherence   Cost             When to use
Fixed-size   Every N tokens                    Low         Free             Prototyping, logs
Recursive    Paragraph → sentence → word       Medium      Free             General-purpose default
Semantic     Embedding similarity boundaries   High        Embedding cost   Technical docs, long articles
Agentic      LLM decides per document          Highest     Very high        High-stakes, low-volume

Semantic chunking sits between the cheap-and-dumb and expensive-and-smart ends — a good default once you outgrow recursive splitting.

Tuning Knobs

Similarity threshold: A break is placed where adjacent similarity drops below the threshold. High threshold → more breaks, so more chunks with tighter topic coherence but worse context continuity. Low threshold → fewer, longer chunks. Start around the 15–25th percentile of adjacent similarities.

Embedding model: A cheap small-embedding model is usually enough — you're measuring relative shifts, not absolute meaning.

Minimum chunk size: Very short chunks (one sentence) retrieve poorly because they lack context. Enforce a floor.

Maximum chunk size: Bound chunks so none exceeds the downstream context window.

Overlap: A small sentence overlap (1–2 sentences) between adjacent chunks rescues edge cases where the boundary is ambiguous.
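
The size-bound knobs can be applied as a post-processing pass over whatever the boundary detector produced. This is a sketch under assumed defaults (`min_chars`, `max_chars` are illustrative, not standard parameters), merging undersized chunks backward and splitting oversized ones at the nearest sentence boundary.

```python
def apply_size_bounds(chunks, min_chars=80, max_chars=400):
    """Enforce a floor and ceiling on chunk size after semantic splitting."""
    # Pass 1: fold tiny chunks into their predecessor.
    merged = []
    for chunk in chunks:
        if merged and len(chunk) < min_chars:
            merged[-1] += " " + chunk
        else:
            merged.append(chunk)
    # Pass 2: split oversized chunks, preferring a sentence boundary.
    bounded = []
    for chunk in merged:
        while len(chunk) > max_chars:
            cut = chunk.rfind(". ", 0, max_chars)
            cut = cut + 1 if cut != -1 else max_chars
            bounded.append(chunk[:cut].strip())
            chunk = chunk[cut:].strip()
        if chunk:
            bounded.append(chunk)
    return bounded
```

Overlap can be layered on afterwards in the same style, by prepending the last sentence or two of each chunk to its successor.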

When It Doesn't Help

Short documents: If the whole doc fits in one chunk, splitting at all is overhead.

Highly repetitive text: Logs, product listings, and tables have low natural topic drift — semantic chunking degenerates to fixed-size.

Structured content: Tables, code, and JSON should be split by structure, not meaning.

When retrieval isn't the bottleneck: If hallucination comes from prompt design or reranking, fixing chunking won't help.
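
These exclusions suggest a cheap pre-check before paying for embeddings. The function below is a hypothetical dispatcher (the name, thresholds, and heuristics are all assumptions for illustration): skip splitting for short documents and route obviously structured content to a structure-aware splitter.

```python
def choose_strategy(doc_text, context_limit=2000):
    """Hypothetical pre-check: decide whether semantic chunking is worth running."""
    text = doc_text.strip()
    if len(text) <= context_limit:
        return "no-split"    # whole doc fits in one chunk; splitting is overhead
    if text.startswith(("{", "[", "<")) or "\t" in text:
        return "structural"  # crude signal for JSON/XML/tables: split by structure
    return "semantic"
```

Real systems would use better structure detection (file type, MIME type), but the routing idea is the same.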
