Agentic Levels

© 2026 Fuentes Studio

L5 · Free

Scaling Context Across Projects

Managing what the model knows at scale

After this, you'll be able to decide whether to retrieve or fit information in context based on size, change frequency, and task type, and you'll know why structured output becomes a contract at this level.

Before you start

You'll want a working sense of Context Engineering Fundamentals before this lesson extends those skills to data that exceeds a single context window.

The idea

At Level 4 you learned to keep context clean. At Level 5, the problem shifts: what do you do when the information you need does not fit in the window at all? A large codebase, a document library, a product spec longer than a novel. You cannot paste it all in. You need a different architecture.

RAG (Retrieval-Augmented Generation) shifts the question from 'what can I fit in the window' to 'what should I retrieve right now.' Index your docs once, embed the query at runtime, and pull only the relevant chunks. The failure mode is not building the index. It is retrieval quality. Bad chunking returns irrelevant sections, and the model works from those as if they were correct.
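
The index-once, embed-at-query-time flow can be sketched in a few lines. This is a toy illustration, not a production retriever: `embed` here is a stand-in that counts words, where a real system would call a learned embedding model, and the three chunks are made-up policy text.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding (assumption): lowercase word counts, not a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Done once, ahead of time: store each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=2):
    """Done per query: embed the query, rank chunks, keep the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Hypothetical chunks for illustration only.
CHUNKS = [
    "Refund policy: customers may request a refund within 30 days of purchase.",
    "Payment processing: cards are charged when the order ships.",
    "Shipping: orders leave the warehouse within two business days.",
]
INDEX = build_index(CHUNKS)
top = retrieve(INDEX, "How do I request a refund?", k=1)
print(top[0])
```

Only the chunks returned by `retrieve` go into the prompt. If the chunks themselves are badly cut, no amount of ranking fixes what the model sees.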

The second problem Level 5 solves is output format. When your AI output feeds a database, a UI, or another model call, free text fails. Structured output (JSON mode, XML tags, schema constraints) forces a predictable shape. Without it, every consumer of your pipeline writes its own fragile parser. With it, you have a contract.
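
One way to picture the contract is a check that sits between the model and the consumer and fails loudly. The sketch below uses only the standard library; `TICKET_SCHEMA` and its fields are hypothetical, and a real pipeline might use a schema library or API-level enforcement instead.

```python
import json

# Hypothetical contract for a ticket-triage consumer: field name -> required type.
TICKET_SCHEMA = {"title": str, "priority": int, "tags": list}


def validate(raw: str, schema: dict) -> dict:
    """Parse model output and raise ValueError if it violates the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = schema.keys() - data.keys()
    extra = data.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for field, expected in schema.items():
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, expected)
        if wrong_type or (expected is int and isinstance(value, bool)):
            raise ValueError(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return data


good = validate('{"title": "Login fails", "priority": 2, "tags": ["auth"]}', TICKET_SCHEMA)
print(good["priority"])
```

Passing `'{"title": "Login fails", "priority": "high", "tags": []}'` raises `ValueError` naming the `priority` field, so the bad value never reaches the database.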

The decision rule for retrieve vs fit: if the information is static and large, retrieve it. If it is dynamic and small, fit it in context directly. A full codebase belongs in a retrieval layer. A two-page spec belongs in the window. The hidden cost of retrieval is that the model works from chunks and summaries, not the whole document, which can cause it to miss connections that span sections.
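
The rule can be written down as a small function so the choice is explicit and reviewable. This is a rough sketch under assumptions: the `Source` fields, the `window_budget` parameter, and the token counts are all hypothetical, and the caveats for large sources are one reading of the trade-offs described above, not a fixed policy.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    tokens: int                # estimated size of the full source
    changes_often: bool        # True if the content changes between requests
    needs_cross_section: bool  # True if the task reasons across the whole document


def choose_strategy(source: Source, window_budget: int) -> tuple[str, list[str]]:
    """Return ('fit' | 'retrieve', caveats) for one source.

    window_budget is the number of tokens you are willing to spend on this source.
    """
    caveats: list[str] = []
    if source.tokens <= window_budget:
        # Small enough: put the whole thing in the window.
        return "fit", caveats
    # Too large to fit: retrieval is the only option, so record the costs.
    if source.changes_often:
        caveats.append("re-index when the source changes")
    if source.needs_cross_section:
        caveats.append("chunks may miss links that span sections")
    return "retrieve", caveats


spec = Source("two-page spec", tokens=1_500, changes_often=True, needs_cross_section=True)
codebase = Source("full codebase", tokens=2_000_000, changes_often=False, needs_cross_section=True)

print(choose_strategy(spec, window_budget=50_000))
print(choose_strategy(codebase, window_budget=50_000))
```

The caveats list is the useful part: it surfaces the hidden cost of retrieval at decision time instead of after a wrong answer.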

Here is the before and after: a team indexed 800 pages of internal policy docs with 4K-token chunks. Queries about refund policy kept returning payment processing chunks instead because the sections were split mid-topic. Switching to semantic chunking (splitting at heading boundaries) dropped irrelevant retrievals from 40% to 6%. The index build time was identical. The retrieval quality difference was the entire product.
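
The difference between the two chunking strategies is easy to see on a tiny document. The sketch below is illustrative only: it assumes Markdown-style `#` headings, counts characters instead of tokens, and uses made-up policy text, so it does not reproduce the team's actual pipeline or numbers.

```python
import re


def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split every `size` characters, ignoring topic boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def heading_chunks(text: str) -> list[str]:
    """Split at lines that start with '#', keeping each heading with its body."""
    parts = re.split(r"(?m)^(?=#)", text)
    return [p.strip() for p in parts if p.strip()]


# Hypothetical two-section policy document.
DOC = (
    "# Refunds\n"
    "Refunds are issued within 30 days.\n"
    "# Payments\n"
    "Cards are charged at shipping.\n"
)

fixed = fixed_size_chunks(DOC, size=40)
by_heading = heading_chunks(DOC)

for chunk in fixed:
    print(repr(chunk))
for chunk in by_heading:
    print(repr(chunk))
```

With fixed-size splitting, the second chunk starts with the tail of the refund sentence and then runs into the payments section, which is the mid-topic split described above. Splitting at headings yields one chunk per topic.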

Try it (5 min)

Watch out for

  • Defaulting to RAG because it sounds advanced. If the document fits in context and you need cross-section reasoning, retrieval makes the answer worse.
  • Building a retrieval index without measuring hit rate. A 60% hit rate means 40% of your responses work from the wrong chunks, and you will not know.
  • Using JSON mode when you need field-type enforcement. JSON mode guarantees syntax only. `{"priority": "high"}` is valid JSON but breaks an integer column.
  • Treating structured output as one-and-done. Schemas evolve as your downstream consumer evolves. Re-run a 5-output sample test whenever you change either side.
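
Measuring hit rate does not need much machinery. A minimal sketch, assuming you have a small hand-labeled set of queries paired with the chunk id that should come back: the chunk ids, the queries, and `fake_retrieve` are all hypothetical stand-ins for your own retriever and data.

```python
def hit_rate(eval_set, retrieve, k=3):
    """Fraction of queries whose expected chunk id appears in the top-k results."""
    if not eval_set:
        raise ValueError("eval_set is empty")
    hits = sum(1 for query, expected_id in eval_set if expected_id in retrieve(query, k))
    return hits / len(eval_set)


# Hypothetical hand-labeled evaluation set: (query, chunk id that should be retrieved).
EVAL_SET = [
    ("refund window", "refunds-1"),
    ("refund for damaged item", "refunds-2"),
    ("card declined", "payments-1"),
    ("change shipping address", "shipping-1"),
    ("cancel subscription", "billing-3"),
]

# Canned results standing in for a real retriever; two of the five are misses.
CANNED = {
    "refund window": ["refunds-1", "payments-2"],
    "refund for damaged item": ["payments-1", "payments-2"],
    "card declined": ["payments-1"],
    "change shipping address": ["shipping-1", "shipping-2"],
    "cancel subscription": ["payments-3"],
}


def fake_retrieve(query, k):
    """Stand-in retriever (assumption): looks up canned chunk ids."""
    return CANNED[query][:k]


rate = hit_rate(EVAL_SET, fake_retrieve)
print(f"hit rate: {rate:.0%}, misses: {1 - rate:.0%}")
```

Re-run the same evaluation set after every chunking or ranking change so you can tell whether the change helped.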

Paste this into Claude:

I have a workflow where my AI output feeds a downstream system. Here is the prompt I currently use: [paste your prompt]. Here is what consumes the output: [describe: a database write, a UI render, a second prompt, a webhook]. Help me: (1) identify what shape the consumer actually needs (fields, types, constraints), (2) choose between JSON mode, XML tags, or schema validation based on whether I need strict typing or just predictable structure, (3) rewrite the prompt to enforce that shape, and (4) show me a valid output and an invalid output so I can write a test against the contract.

What good looks like:

  • Your rewritten prompt produces output in the same shape across at least 5 test runs
  • The output parses cleanly into the downstream consumer without manual fix-ups
  • You can name the structured-output method you chose and why it fits this use case
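
The "same shape across at least 5 test runs" check can be automated. A small sketch, assuming you have saved the raw outputs of five runs as strings and that each output is a flat JSON object; the sample outputs below are invented for illustration.

```python
import json


def shape(raw: str) -> tuple:
    """Reduce one JSON object to its (field, type) signature, ignoring values."""
    data = json.loads(raw)
    return tuple(sorted((key, type(value).__name__) for key, value in data.items()))


def same_shape(outputs) -> bool:
    """True if every output has an identical field/type signature."""
    return len({shape(o) for o in outputs}) == 1


# Hypothetical saved outputs from five runs of the same prompt.
RUNS = [
    '{"title": "Login fails", "priority": 2}',
    '{"title": "Slow search", "priority": 3}',
    '{"priority": 1, "title": "Checkout error"}',
    '{"title": "Typo on home page", "priority": 4}',
    '{"title": "Export hangs", "priority": 2}',
]

print(same_shape(RUNS))
print(same_shape(RUNS + ['{"title": "Drift", "priority": "high"}']))
```

Field order does not matter here, but a changed type or a missing field does, which matches what a downstream consumer actually cares about.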

When this breaks

  • Breaks when you retrieve for a depth task that needs cross-section synthesis, because the model only sees disconnected chunks and cannot reason about patterns spanning the whole document.
  • Breaks when retrieval quality is unmeasured, because every silent retrieval miss looks like a model failure, and you waste cycles tuning prompts instead of fixing the chunking strategy that is actually causing the wrong answers.
  • Breaks when structured output is enforced only in the prompt and not at the API level, because the model occasionally drifts on field types and your downstream system swallows malformed values without raising an error.

You can now

Distinguish between a depth task (fit in context) and a breadth task (retrieve), then justify the choice using token size, change frequency, and what the task actually requires.

Key takeaways

Level 5 replaces two failure modes (context overflow, unpredictable output) with two composable solutions: retrieve only what you need, and constrain the output shape so consumers can rely on it.

  • RAG solves context overflow by retrieving only relevant chunks at query time, not loading everything
  • Retrieval quality (chunking strategy, re-ranking) fails more often than index building. Measure it
  • Structured output is a contract between your AI and whatever consumes its output. Use it whenever output feeds a system
  • Retrieve for large and static. Fit in context for small and dynamic. Know which is which.