// how retrieval actually works
Retrieval-augmented generation (RAG) breaks your docs into chunks, turns each chunk into a vector (an embedding), and at query time pulls the handful of chunks most similar to the question — then hands only those to the model.
So the model can only be as good as the chunks you retrieved. If the right text never makes it into a retrievable chunk, no amount of prompting saves you. The chunk boundary is the whole ballgame.
Note: pure vector similarity is the baseline. Strong production pipelines also add keyword search (BM25) and a reranking step; Anthropic's Contextual Retrieval write-up, listed in the sources, covers both.
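To make the baseline concrete, here is a minimal sketch of top-k retrieval by vector similarity. The `embed` function is a toy bag-of-words stand-in, not a real embedding model, and the chunk texts and function names are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks, for illustration only.
chunks = [
    "refunds are issued within 14 days of purchase",
    "shipping takes 3 to 5 business days",
    "support is available by email on weekdays",
]
top = retrieve("how many days until refunds are issued", chunks, k=1)
```

Only the chunks returned by `retrieve` would reach the model, which is why everything downstream depends on what each chunk contains.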
Three ways chunking goes wrong
1 · Chunks too big → a blurry average
Many embedding models build a chunk's vector by pooling its token vectors (often a simple average) into one fixed-length vector. Cram five topics into one chunk and its embedding becomes a muddy average that matches no specific question sharply.
As Weaviate puts it: oversized chunks "mix multiple ideas together… a noisy, 'averaged' embedding that doesn't clearly represent any single topic."
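A toy calculation shows the blur. The sketch below assumes each topic is a one-hot vector and that a chunk's embedding is the plain mean of its topic vectors. Real models are far more complex, so treat the numbers as an illustration of the effect, not a measurement.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors, component by component."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Five hypothetical topics, each a one-hot direction in a 5-dimensional space.
topics = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
query = topics[0]  # a question about topic 0 only

focused_chunk = mean_pool([topics[0], topics[0], topics[0]])  # one topic
mixed_chunk = mean_pool(topics)  # all five topics crammed together

focused_score = cosine(query, focused_chunk)  # 1.0
mixed_score = cosine(query, mixed_chunk)  # 1/sqrt(5), roughly 0.447
```

Under these assumptions the five-topic chunk scores less than half as well against a single-topic question as the focused chunk does.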
2 · Chunks too small → dangling references
Split mid-thought and a chunk can contain a pronoun whose antecedent lives in another chunk. Embedded alone, it's meaningless — and often won't be retrieved at all. This has a name: the anaphoric reference problem.
3 · No overlap → the answer on a boundary
Fixed-size cuts fall in arbitrary places. If the answer straddles the boundary, neither neighbouring chunk holds the whole thing.
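Here is a small sketch of that failure, and of how overlap can cover it. The `fixed_chunks` helper, the sample text, and the sizes are illustrative assumptions, not values from any particular library.

```python
def fixed_chunks(text: str, size: int, overlap: int = 0) -> list[str]:
    """Cut text into fixed-size character windows, optionally overlapping."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


text = "Plans renew monthly. The refund window is 14 days. Contact support to cancel."
answer = "refund window is 14 days"

no_overlap = fixed_chunks(text, size=40)
with_overlap = fixed_chunks(text, size=40, overlap=20)

# The cut at character 40 lands inside the answer, so no single chunk holds it.
found_without = any(answer in c for c in no_overlap)  # False
# With a 20-character overlap, the window starting at 20 contains it whole.
found_with = any(answer in c for c in with_overlap)  # True
```

Overlap costs extra storage and some duplicate hits, which is one reason to tune it against your own data.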
The fix
Do: split on structure, not a character count
Cut on the document's own seams — headings, paragraphs, sentence and function boundaries — so a chunk is a coherent unit, not a random 1000-character window.
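Add a little overlap between neighbouring chunks so an answer that sits on a boundary appears whole in at least one of them. Treat overlap as a sensible default, not a guaranteed win: the optimal chunk size and overlap depend on your data and embedding model, so test them against your own retrieval metrics.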
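Splitting on structure can be sketched as below. This version assumes paragraphs are separated by blank lines and packs whole paragraphs into a chunk until a character budget is reached. The function name and the `max_chars` value are hypothetical, and splitters such as the LangChain one listed in the sources implement a fuller version of the idea.

```python
import re


def split_on_structure(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole paragraphs into chunks without cutting inside a paragraph."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # An oversized paragraph stays whole in this sketch; a fuller
            # version would fall back to sentence boundaries here.
            current = para
    if current:
        chunks.append(current)
    return chunks


# Hypothetical document: each section is a heading line plus one sentence.
doc = (
    "Refunds\nRefunds are issued within 14 days.\n\n"
    "Shipping\nOrders ship in 3 to 5 business days.\n\n"
    "Support\nEmail us on weekdays."
)
chunks = split_on_structure(doc, max_chars=60)
```

With this budget each section ends up in its own chunk, heading included, so every chunk is a coherent unit.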
Do: keep one idea per chunk, with its context attached
The highest-leverage move: before embedding, prepend each chunk with the context it needs to stand alone — its section title, document, and a one-line summary. Anthropic calls this Contextual Retrieval.
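A minimal sketch of the prepending step follows. It assumes the section title, document name, and summary are already available as plain fields. Anthropic's approach generates the context with a model, while this hand-filled template, the `Chunk` class, and the `contextualize` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    document: str  # name of the source document
    section: str   # heading the chunk sits under
    summary: str   # one-line summary of the chunk
    text: str      # the chunk's original text


def contextualize(chunk: Chunk) -> str:
    """Build the string to embed: a context header followed by the chunk text."""
    header = (
        f"Document: {chunk.document}\n"
        f"Section: {chunk.section}\n"
        f"Summary: {chunk.summary}"
    )
    return f"{header}\n\n{chunk.text}"


# A chunk with a dangling pronoun: "They" means nothing on its own.
chunk = Chunk(
    document="Billing FAQ",
    section="API keys",
    summary="How long API keys stay valid.",
    text="They expire after 30 days.",
)
to_embed = contextualize(chunk)
```

You would embed `to_embed` in place of the bare chunk text, so the vector carries "API keys" even though the chunk itself only says "They".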
The one-line takeaway
Before you blame the model, look at your chunks. Split on structure, add a little overlap, and keep one idea per chunk with its context attached. Fix the chunks first — most "bad RAG" is a chunking bug.
Sources: Anthropic — Contextual Retrieval · Pinecone — Chunking strategies · Weaviate — Chunking strategies · LangChain — RecursiveCharacterTextSplitter
