Agentic Levels
© 2026 Fuentes Studio

L5 · Lesson 3

Retrieve vs Fit: The Decision Rule

After this, you'll be able to apply a concrete decision rule to any document or dataset and choose between retrieval and full-context loading without guessing.

Before you start

Complete Structured Output as Contract first, so you can measure retrieval precision against a defined output shape rather than eyeballing the results.

The idea

If the information is static and large, retrieve it. If it is dynamic and small, fit it in. The hard cases live in between.

A few concrete cases show the rule at work, and you may find you are loading some of your own content the wrong way right now. A 2-page product spec (roughly 1,500 tokens) that changes every sprint should be fit in, because your agent needs the full picture to answer questions about it. A 100-page technical specification (roughly 70,000 tokens) that changes quarterly should be retrieved, because your agent usually needs one section at a time. A codebase with 800 files? Retrieve. A five-row database table of team members your agent references every session? Fit it in.

The decision is not just about token count. It is about whether the task requires breadth (retrieval handles this well) or depth (full context handles this well). Debugging a bug that spans three files is a depth task. Answering 'what does this company policy say about X' is a breadth task.
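The rule above can be expressed as a small function. This is a minimal sketch, not a prescribed implementation: the size thresholds (`TINY_TOKENS`, `SMALL_TOKENS`, `LARGE_TOKENS`) are hypothetical values the lesson does not specify, and the tiny-content shortcut reflects the five-row team table, where retrieval overhead outweighs a few hundred tokens.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration. Tune them to your
# model's context window, your budget, and your measured retrieval quality.
TINY_TOKENS = 1_000    # at or below this, retrieval overhead is never worth it
SMALL_TOKENS = 10_000
LARGE_TOKENS = 50_000


@dataclass
class Source:
    name: str
    tokens: int          # estimated size of the content in tokens
    changes_often: bool  # dynamic (e.g. every sprint) vs static (e.g. quarterly)
    depth_task: bool     # True if the typical task needs the whole thing at once


def decide(src: Source) -> str:
    """Return 'fit' or 'retrieve' following the static/dynamic, small/large rule."""
    if src.tokens <= TINY_TOKENS:
        return "fit"
    if src.tokens <= SMALL_TOKENS and src.changes_often:
        return "fit"       # dynamic and small: fit it in
    if src.tokens >= LARGE_TOKENS and not src.changes_often:
        return "retrieve"  # static and large: retrieve it
    # The hard cases in between: depth tasks need the full context,
    # breadth tasks (find one section) suit retrieval.
    return "fit" if src.depth_task else "retrieve"


if __name__ == "__main__":
    sources = [
        Source("product spec", 1_500, changes_often=True, depth_task=True),
        Source("technical spec", 70_000, changes_often=False, depth_task=False),
        Source("team table", 400, changes_often=False, depth_task=False),
    ]
    for s in sources:
        print(f"{s.name}: {decide(s)}")
```

Treat the result as a starting point: the middle cases are exactly where measuring retrieval quality pays off.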

The hidden cost of retrieval that most tutorials omit is relevance degradation. When you retrieve, the model works from chunks, not the full document, so it cannot see connections or patterns that span multiple sections. A retrieval system that returns the right chunk 73% of the time leaves the model working from incomplete or wrong context the other 27% of the time. That cost stays invisible until you measure it. Retrieval is not automatically the advanced move; for depth tasks, it makes the output worse.
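One way to put a number on relevance degradation is a hit rate at k: for a set of test queries, each labeled with the chunk that actually contains the answer, count how often that chunk appears in the top k retrieved results. A minimal sketch, assuming you already have such a labeled set; the function name and the chunk IDs below are hypothetical, and the ranked results would come from whatever retriever you use.

```python
def hit_rate_at_k(results: list[tuple[str, list[str]]], k: int) -> float:
    """Fraction of queries whose expected chunk ID appears in the top-k retrieved IDs.

    Each item is (expected_chunk_id, retrieved_ids_best_first).
    """
    if not results:
        raise ValueError("need at least one labeled query")
    hits = sum(1 for expected, ranked in results if expected in ranked[:k])
    return hits / len(results)


if __name__ == "__main__":
    # Hypothetical evaluation data: (chunk that should answer, what the retriever returned).
    labeled = [
        ("runbook#tier2-escalation", ["runbook#tier2-escalation", "runbook#tier1", "runbook#oncall"]),
        ("runbook#backup-restore", ["runbook#dr-overview", "runbook#backup-restore", "runbook#storage"]),
        ("runbook#pager-rotation", ["runbook#oncall", "runbook#tier1", "runbook#holidays"]),
        ("runbook#incident-review", ["runbook#incident-review", "runbook#postmortem", "runbook#tier1"]),
    ]
    print(f"hit rate@1: {hit_rate_at_k(labeled, 1):.0%}")  # → hit rate@1: 50%
    print(f"hit rate@3: {hit_rate_at_k(labeled, 3):.0%}")  # → hit rate@3: 75%
```

A 75% hit rate at 3 means one query in four reaches the model without the chunk it needs, which is the invisible cost described above made visible.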

Try it (15 min)

Watch out for

  • Defaulting to RAG because it sounds more advanced. If the document fits in context and you need the full picture, retrieval makes things worse.
  • Ignoring retrieval quality when you do retrieve. A 73% hit rate means 27% of responses are working from the wrong content.
  • Treating 'large document' as the only signal for retrieval. Frequency of access and task type matter as much as size.
  • Building a retrieval layer for a dataset that changes daily. Stale embeddings cause silent quality degradation. Re-index at least as often as the data changes, or fit it in.
  • Forgetting the hybrid option. For a 40-page document, fitting in a high-level summary plus retrieving specific sections is often better than either pure strategy.
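The hybrid option from the last point can be sketched as context assembly: the summary always goes in, and only the sections most relevant to the current question are added. A minimal sketch, assuming you already have a high-level summary and the document split into titled sections; the function names are hypothetical, and the keyword-overlap scorer is a deliberately naive stand-in for a real embedding search.

```python
import re


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_sections(query: str, sections: dict[str, str], k: int = 2) -> list[str]:
    """Rank section titles by naive word overlap with the query.

    Stand-in for a real embedding search; swap in your own retriever.
    """
    q = _words(query)
    ranked = sorted(
        sections,
        key=lambda title: len(q & _words(title + " " + sections[title])),
        reverse=True,
    )
    return ranked[:k]


def build_hybrid_context(query: str, summary: str, sections: dict[str, str], k: int = 2) -> str:
    """Always include the summary; add only the k most relevant sections."""
    parts = ["## Document summary", summary]
    for title in top_sections(query, sections, k):
        parts += [f"## Section: {title}", sections[title]]
    parts += ["## Question", query]
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Hypothetical content for illustration.
    summary = "Ops handbook: incident tiers, escalation, backups, on-call."
    sections = {
        "Escalation": "Tier 2 incidents escalate to the on-call lead within 15 minutes.",
        "Backups": "Nightly snapshots are kept for 30 days.",
        "On-call": "Rotation changes every Monday.",
    }
    print(build_hybrid_context("What is the escalation path for a Tier 2 incident?", summary, sections, k=1))
```

The summary gives the model the whole-document picture that pure retrieval strips away, while the retrieved sections keep token use proportional to the question.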

Paste this into Claude:

I want to make a retrieve-vs-fit decision for my current workflow. Here are up to three things I regularly load into Claude: [list 1-3 things: documents, datasets, files, or codebases you regularly include in context]. For each one, tell me: (1) estimated token size, (2) how often it changes, (3) whether my typical task needs breadth (find a specific section) or depth (understand the whole thing). Then give me your recommendation: fit it in, retrieve it, or a hybrid. For any 'retrieve' recommendation, tell me what chunk strategy makes sense for that content type.

What good looks like:

  • You applied the decision rule to at least two real pieces of content from your own workflow
  • You got a specific recommendation (fit, retrieve, or hybrid) with a clear reason for each
  • You understand the relevance-degradation cost of retrieval and can describe one task where it would matter
  • You identified at least one thing you have been fitting in context that should be retrieved
  • You identified at least one thing you have been treating as a retrieval problem that fits in context

What a good response looks like:

Here is the retrieve-vs-fit analysis for your three content sources.

1. Product requirements doc (your Notion PRD, ~2,200 tokens, updated weekly)
Decision: FIT IT IN. Your tasks involve reasoning across the whole doc, not finding one section. It changes often enough that stale embeddings would silently degrade quality. At 2,200 tokens it costs roughly $0.007 per call with Claude Sonnet. That is cheap. Fitting it in gives the model the full picture every time.

2. Company runbook (180-page ops handbook, ~120,000 tokens, updated quarterly)
Decision: RETRIEVE. At 120K tokens this exceeds a cost-effective fit-in threshold for routine queries. Your typical task is 'what is the escalation path for a Tier 2 incident', which is a breadth task. Chunk strategy: 300-token chunks with 60-token overlap, section heading as metadata. Expected hit rate at top-3: 70-80% for specific procedure queries.

3. Team member table (8 rows, 5 columns, ~400 tokens, changes when headcount changes)
Decision: FIT IT IN. 400 tokens is noise. Retrieval overhead (embedding + index query) costs more in latency and complexity than just pasting the table. It changes rarely, and when it does you simply paste the new version; there is no index to rebuild.

What you have been doing wrong: you mentioned you built a RAG pipeline over the PRD. That is a depth task on a small doc, and retrieval is actively hurting you there. Switch to fit-in for the PRD and keep retrieval for the runbook.
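The runbook recommendation in the example above (300-token chunks, 60-token overlap, section heading as metadata) can be sketched as a section-aware sliding window. A minimal sketch, assuming a Markdown-style document where lines starting with `#` are headings, and using whitespace-separated words as a stand-in for tokens; `Chunk` and `chunk_by_section` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # section heading, kept as metadata for filtering and citation
    text: str


def _windows(words: list[str], size: int, step: int) -> list[str]:
    """Slide a window of `size` words forward by `step`; consecutive windows share size - step words."""
    out = []
    for start in range(0, len(words), step):
        out.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window reached the end of the section
            break
    return out


def chunk_by_section(doc: str, size: int = 300, overlap: int = 60) -> list[Chunk]:
    """Chunk a Markdown-style document section by section, never across headings.

    Assumption: whitespace-separated words stand in for tokens; a real
    tokenizer will count differently.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    sections: list[tuple[str, list[str]]] = [("(no heading)", [])]
    for line in doc.splitlines():
        if line.startswith("#"):  # a new heading starts a new section
            sections.append((line.lstrip("#").strip(), []))
        else:
            sections[-1][1].extend(line.split())
    return [
        Chunk(heading, text)
        for heading, words in sections
        for text in _windows(words, size, size - overlap)
    ]


if __name__ == "__main__":
    doc = "# Tier 2 escalation\n" + " ".join(f"word{i}" for i in range(500))
    for chunk in chunk_by_section(doc):
        # Two chunks: 300 words, then 260 words that start 60 words before the first ends.
        print(chunk.heading, len(chunk.text.split()))
```

Keeping chunks inside a single section means the heading metadata is always accurate, at the cost of some short chunks at section ends.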

When this breaks

  • Breaks when the decision is made on token count alone, because access frequency, change rate, and task depth dominate the actual quality outcome; a small but volatile dataset can still suffer under retrieval.
  • Breaks when retrieval is applied to depth tasks (debugging across files, reasoning across sections), because chunk-based retrieval strips the cross-section connections the task requires, and the model produces locally correct but globally wrong answers.
  • Breaks when the dataset changes daily but embeddings are refreshed quarterly because the index silently returns stale chunks the model treats as current truth, with no signal that anything is wrong.
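One inexpensive guard against the silent-staleness failure is to record a hash of each document when you embed it, then compare against the current content before trusting the index. A minimal sketch, assuming you can store a document-ID-to-hash map alongside your index; `content_hash` and `stale_documents` are hypothetical names.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint of a document's current content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_documents(current: dict[str, str], indexed_hashes: dict[str, str]) -> list[str]:
    """IDs whose embeddings no longer match the source: changed, new, or deleted."""
    changed_or_new = {
        doc_id
        for doc_id, text in current.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    }
    deleted = indexed_hashes.keys() - current.keys()
    return sorted(changed_or_new | deleted)


if __name__ == "__main__":
    # Hashes recorded at embedding time (hypothetical index state).
    indexed = {"pricing": content_hash("v1"), "faq": content_hash("same"), "old-policy": content_hash("x")}
    # What the source documents look like today.
    today = {"pricing": "v2", "faq": "same", "new-policy": "draft"}
    print(stale_documents(today, indexed))  # ['new-policy', 'old-policy', 'pricing']
```

Run it before each batch of queries or on a schedule and re-embed whatever it returns; a non-empty list is exactly the signal the stale index otherwise never gives you.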

Claude can do it for you

List the documents and data sources you load into Claude regularly. Ask: 'For each of these, tell me whether I should fit it in context or build a retrieval layer. Give me the token estimate, the decision, and the reason.' It will apply the rule for you and flag the cases where your current approach is probably hurting quality.
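Before asking Claude, you can get a rough token and cost estimate yourself. A minimal sketch under two loudly labeled assumptions: roughly 4 characters per token for English prose (a rule of thumb, not a real tokenizer), and an input price of $3 per million tokens, which you should check against current pricing for the model you use. At that rate the 2,200-token PRD from the example response costs $0.0066 per call, in line with the roughly $0.007 quoted there.

```python
# Assumptions, not facts about any specific model:
# ~4 characters per token is a rough rule of thumb for English prose, and
# $3.00 per million input tokens is an assumed rate; check current pricing.
CHARS_PER_TOKEN = 4
PRICE_PER_MTOK_USD = 3.00


def estimate_tokens(text: str) -> int:
    """Rough token count from character length (not a real tokenizer)."""
    return round(len(text) / CHARS_PER_TOKEN)


def input_cost_usd(tokens: int, calls: int = 1) -> float:
    """Input-token cost of sending `tokens` of context on each of `calls` requests."""
    return tokens * calls * PRICE_PER_MTOK_USD / 1_000_000


if __name__ == "__main__":
    # Token sizes taken from the example analysis earlier in this lesson.
    for name, tokens in [("PRD", 2_200), ("runbook", 120_000), ("team table", 400)]:
        print(
            f"{name}: ${input_cost_usd(tokens):.4f} per call, "
            f"${input_cost_usd(tokens, calls=1_000):.2f} per 1,000 calls"
        )
```

The per-1,000-call figure makes the runbook case clear: under these assumptions $0.36 per call becomes $360 per thousand routine queries, while the team table stays at $1.20.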

You can now

Identify at least one piece of content you have been retrieving that should be fit in context, and one you have been fitting that should be retrieved, and justify each switch using the decision rule.

Key takeaways

Static and large: retrieve. Dynamic and small: fit in. Depth task: fit in. Breadth task: retrieve. When in doubt, measure the retrieval hit rate before committing.

  • Static and large: retrieve. Dynamic and small: fit in. The middle cases need a depth-vs-breadth check
  • Depth tasks (cross-section reasoning, multi-file debugging) belong in full context. Retrieval breaks them
  • Breadth tasks (find one section, look up a procedure) belong in retrieval. Full context wastes tokens
  • Retrieval has a hidden quality cost. A 73% hit rate means 27% of answers work from the wrong chunks
  • The hybrid option (summary in context, details on retrieval) often beats either pure strategy for medium documents

Go deeper

  • Anthropic: Long context window tips
  • LlamaIndex: High-Level Concepts (RAG)
  • 12-Factor Agents: own your context window