After this, you'll be able to apply a concrete decision rule to any document or dataset and choose between retrieval and full-context loading without guessing.
Before you start
Before diving in, complete Structured Output as Contract so you can measure retrieval precision against a defined output shape rather than eyeballing the results.
The idea
If the information is static and large, retrieve it. If it is dynamic and small, fit it in. The hard cases live in between.
Here is the before and after: You are probably loading the wrong things in the wrong way right now. A 2-page product spec (roughly 1,500 tokens) that changes every sprint needs to be fit in, because your agent needs the full picture to answer questions about it. A 100-page technical specification (roughly 70,000 tokens) that changes quarterly needs to be retrieved, because your agent usually needs one section at a time. A codebase with 800 files? Retrieve. A five-row database table of team members your agent references every session? Fit it in.
The decision is not just about token count. It is about whether the task requires breadth (retrieval handles this well) or depth (full context handles this well). Debugging a bug that spans three files is a depth task. Answering 'what does this company policy say about X' is a breadth task.
The hidden cost of retrieval that most tutorials omit: relevance degradation. When you retrieve, the model works from chunks, not the full document. It cannot see cross-document connections or patterns that span multiple sections. A retrieval system that returns the right chunk 73% of the time means the model is working from incomplete or wrong context 27% of the time. That cost is invisible until you measure it. Retrieval is not always the advanced move. For depth tasks, it makes the output worse.
Try it (15 min)
Watch out for
Paste this into Claude:
I want to make a retrieve-vs-fit decision for my current workflow. Here are three things I regularly load into Claude: [list 1-3 things: documents, datasets, files, or codebases you regularly include in context]. For each one, tell me: (1) estimated token size, (2) how often it changes, (3) whether my typical task needs breadth (find a specific section) or depth (understand the whole thing), and then give me your recommendation: fit it in, retrieve it, or a hybrid. For any 'retrieve' recommendation, tell me what chunk strategy makes sense for that content type.
What good looks like:
What a good response looks like:
Here is the retrieve-vs-fit analysis for your three content sources. 1. Product requirements doc (your Notion PRD, ~2,200 tokens, updated weekly) Decision: FIT IT IN. Your tasks involve reasoning across the whole doc, not finding one section. It changes often enough that stale embeddings would silently degrade quality. At 2,200 tokens it costs roughly $0.007 per call with Claude Sonnet. That is cheap. Fitting it in gives the model the full picture every time. 2. Company runbook (180-page ops handbook, ~120,000 tokens, updated quarterly) Decision: RETRIEVE. At 120K tokens this exceeds a cost-effective fit-in threshold for routine queries. Your typical task is 'what is the escalation path for a Tier 2 incident', which is a breadth task. Chunk strategy: 300-token chunks with 60-token overlap, section heading as metadata. Expected hit rate at top-3: 70-80% for specific procedure queries. 3. Team member table (8 rows, 5 columns, ~400 tokens, changes when headcount changes) Decision: FIT IT IN. 400 tokens is noise. Retrieval overhead (embedding + index query) costs more in latency and complexity than just pasting the table. It changes rarely, so no stale-embedding risk. What you have been doing wrong: you mentioned you built a RAG pipeline over the PRD. That is the depth task on the small doc. Retrieval is actively hurting you there. Switch to fit-in for the PRD, keep retrieval for the runbook.
When this breaks
Claude can do it for you
List the documents and data sources you load into Claude regularly. Ask: 'For each of these, tell me whether I should fit it in context or build a retrieval layer. Give me the token estimate, the decision, and the reason.' It will apply the rule for you and flag the cases where your current approach is probably hurting quality.
You can now
Identify at least one piece of content you have been retrieving that should be fit in context, and one you have been fitting that should be retrieved, and justify each switch using the decision rule.
Key takeaways
Static and large: retrieve. Dynamic and small: fit in. Depth task: fit in. Breadth task: retrieve. When in doubt, measure the retrieval hit rate before committing.