Agentic Levels

© 2026 Fuentes Studio

L4 · Lesson 1

Audit Your Token Budget

After this, you'll be able to count what's actually eating your context window and trim it by at least 30% without losing anything that matters.

Before you start

Complete Do Not Stuff CLAUDE.md first. That lesson's rules-vs-runtime distinction is what you will apply here, in a full token audit across your entire context window.

The idea

You are 45 turns into a session, things are going sideways, and you have no idea why. You paste more context hoping it will help. It gets worse. The problem is not the model. The problem is your token budget.

Think of the context window as RAM, not a hard drive. Claude models support up to 200K tokens; GPT-4o supports up to 128K (as of early 2026). But capacity is not the goal. Signal is.

Here is the before and after. A product manager pasted a 200-line requirements doc to ask a question about one acceptance criterion. That doc cost roughly 1,600 tokens. The criterion itself? 40 tokens. The context-to-signal ratio was 40:1, and the output reflected it: Claude kept referencing unrelated requirements and missing the actual question. When the PM repasted just the two relevant criteria plus a one-sentence framing, Claude answered correctly on the first try.

Your context has four cost centers: the system prompt (CLAUDE.md and any persistent instructions), working context (files and data you've pasted this session), conversation history (every prior turn), and the model's own output tokens. Most people focus on prompts. The real waste is usually history and over-pasted files.
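
To make the four cost centers concrete, here is a minimal sketch that estimates each one and ranks them. It assumes roughly 4 characters per token, which is only a rule of thumb for English prose and not a real tokenizer. The names `estimate_tokens` and `audit` and the placeholder session sizes are hypothetical.

```python
# Rough token estimate per cost center.
# ASSUMPTION: about 4 characters per token. This is a rule of thumb for
# English prose, not a real tokenizer, so treat the numbers as ballpark.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a block of text."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def audit(cost_centers: dict) -> list:
    """Return (name, estimated tokens) pairs, largest first."""
    estimates = [(name, estimate_tokens(text)) for name, text in cost_centers.items()]
    return sorted(estimates, key=lambda pair: pair[1], reverse=True)


# Placeholder strings stand in for real session content (hypothetical sizes).
session = {
    "system_prompt": "x" * 2_000,
    "working_context": "x" * 14_000,
    "conversation_history": "x" * 72_000,
    "model_output": "x" * 8_000,
}

ranked = audit(session)
for name, tokens in ranked:
    print(f"{name:22} ~{tokens:>6,} tokens")
```

Swap the placeholder strings for your real CLAUDE.md, pasted files, and exported history. The first row of the ranking is your largest cost center.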

Try it (15 min)

Watch out for

  • Treating the context limit as a target to fill, not a ceiling to stay under
  • Keeping old conversation turns because you might need them. Start a new session if the topic changed.
  • Pasting entire files when you only need one function, one section, or one table
  • Forgetting that your CLAUDE.md counts against your budget every single session
  • Assuming more context always helps. Past about 40-50K tokens of low-signal content, output quality tends to drop noticeably.
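
One way to avoid pasting an entire file is to pull out only the function you need. The sketch below does that for Python source using the standard library `ast` module. The helper name `extract_function` and the sample module are hypothetical, and other languages would need their own parser.

```python
import ast
from typing import Optional


def extract_function(source: str, name: str) -> Optional[str]:
    """Return the source text of the first function called `name`, or None."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == name:
            # get_source_segment returns the text spanned by the node
            # (decorator lines, if any, are not included).
            return ast.get_source_segment(source, node)
    return None


# Hypothetical module, used only for illustration.
module_source = '''
def load_config(path):
    return open(path).read()


def require_auth(token, secret):
    """Reject requests whose token does not match the secret."""
    return token == secret
'''

snippet = extract_function(module_source, "require_auth")
print(snippet)
```

Paste the snippet instead of the file. If the model needs a neighboring helper, add that one function too, not the whole module.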

Paste this into Claude:

I want to audit my current context. Here is everything I've shared in this conversation so far: [paste your session history or summarize the main blocks of content you've included]. Tell me: (1) What's taking up the most space that isn't directly needed right now? (2) What could I summarize or drop entirely? (3) What's load-bearing and must stay? Give me a ranked list of what to cut first.

What good looks like:

  • You identified the single largest context cost center in your current session
  • You got a specific list of what to cut vs. keep from Claude
  • You can estimate (roughly) how many tokens your current session is using
  • You found at least one file or block you were pasting in full that a snippet would cover
  • You understand the difference between load-bearing context and context you added out of habit

What a good response looks like:

Here is your context audit, ranked by cost and cuttability.

Largest cost center: your conversation history from turns 1-31. I estimate roughly 18,000 tokens. Turns 1-12 were about the database schema redesign, which you told me is complete. That entire block is dead weight for your current question about the API endpoints.

Second: the full package.json you pasted (turn 8). You pasted 140 lines to ask about one dependency version. That's roughly 1,100 tokens. Only the `dependencies` section (about 20 lines) was relevant.

Third: the full README (turn 3). 320 lines, roughly 2,400 tokens. You have not referenced it since.

Load-bearing and must stay: the three endpoint files you pasted in turns 28-30, and the error message from turn 31. Those are directly connected to your current question.

Cut list, ranked: (1) turns 1-27 of conversation history, (2) full README, (3) full package.json. Keeping only turns 28-31 plus the three endpoint files would take you from an estimated 24,000 tokens to roughly 3,200. That is an 87% reduction.
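
The arithmetic in a response like the one above is worth double-checking, since the token counts are estimates. This minimal sketch recomputes the reduction from the sample's before and after figures and compares it with this lesson's 30% target. The function name `reduction_percent` is hypothetical.

```python
def reduction_percent(before_tokens: int, after_tokens: int) -> float:
    """Return the percentage of tokens removed by pruning."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return (before_tokens - after_tokens) / before_tokens * 100


TARGET_PERCENT = 30.0  # this lesson's minimum trim target

# Figures taken from the sample response: 24,000 tokens down to 3,200.
pct = reduction_percent(24_000, 3_200)
print(f"{pct:.0f}% reduction, target met: {pct >= TARGET_PERCENT}")
# prints "87% reduction, target met: True"
```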

Go deeper (20 min)

Paste this into Claude:

Take the context you just audited. Now prune it: remove or summarize everything on the 'cut' list. Start a fresh conversation with only the load-bearing items plus one sentence of session framing. Run the same core request you were doing before. Compare the output quality. Paste both responses side by side and tell me: which is more focused and why?

What good looks like:

  • You reduced your context by at least 30% before rerunning the task
  • The output in the pruned session was at least as good, and likely more focused
  • You can articulate one specific thing the pruned version got right that the original missed
  • You saved the pruned starting point as a template for this type of task

What a good response looks like:

Pruned session vs original session, same task: 'Why is the /api/users endpoint returning 401 on valid tokens?'

Original session (24,000 tokens, 31 prior turns): My response spent two paragraphs referencing the database schema redesign from earlier in the conversation before getting to your question. I suggested checking the migration files even though you told me the migration was done. My final answer was buried and vague.

Pruned session (3,200 tokens, 3 files + error message): I went directly to the issue. Your requireAuth middleware on line 23 of src/middleware/auth.ts is calling `jwt.verify(token, process.env.JWT_SECRET)`, but your token is signed with `process.env.JWT_SIGNING_KEY`. Those are two different environment variables. The 401 is a signing key mismatch, not an expiry issue.

What the pruned version got right that the original missed: no distraction from dead context, and immediate focus on the three files that actually matter. The answer in the pruned session took 4 sentences. The original took 12 sentences and was still vague.

When this breaks

  • Breaks when the task genuinely requires a wide context (multi-file refactors across loosely coupled modules) because aggressive trimming removes connections the model would otherwise infer.
  • Breaks when CLAUDE.md is the actual bloat source because per-session pruning leaves it untouched and you keep paying the same tax on every fresh session.
  • Breaks when you only count tokens you pasted because the model's own output and the system prompt are invisible cost centers that already crowd the window before turn one.
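
The last two failure modes are easier to see with numbers. Here is a minimal sketch of the headroom left before turn one, and of how a CLAUDE.md cost repeats across sessions. The 200K window comes from this lesson. The system prompt size, the reserved output, and the session count are hypothetical placeholders, and `available_budget` is an illustrative name.

```python
def available_budget(window: int, system_prompt: int, reserved_output: int) -> int:
    """Tokens left for working context and history before turn one."""
    return max(window - system_prompt - reserved_output, 0)


WINDOW = 200_000         # context window size cited in this lesson
SYSTEM_PROMPT = 3_000    # HYPOTHETICAL size of CLAUDE.md plus instructions
RESERVED_OUTPUT = 8_000  # HYPOTHETICAL room kept free for the model's reply

headroom = available_budget(WINDOW, SYSTEM_PROMPT, RESERVED_OUTPUT)
print(f"Headroom before turn one: {headroom:,} tokens")

# Per-session pruning never touches the system prompt, so its cost repeats.
SESSIONS = 20  # HYPOTHETICAL number of fresh sessions
recurring_cost = SYSTEM_PROMPT * SESSIONS
print(f"System prompt cost across {SESSIONS} sessions: {recurring_cost:,} tokens")
```

If the recurring figure bothers you more than the per-session one, trim CLAUDE.md itself, not the session.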

Claude can do it for you

At the start of any session, paste your context and say: 'Audit this for me. What is taking up space that I probably do not need for this specific task?' Claude will identify the waste faster than you will.

You can now

Produce a ranked cut list for your current session that names the largest cost center, gives an estimated token count, and labels each block as load-bearing or cuttable.

Key takeaways

10K tokens of sharp, relevant context beats 80K tokens of everything you could think of. Audit before you paste, and prune before you escalate.

  • Audit before you paste. Most context bloat is invisible until you measure cost centers.
  • The four cost centers are system prompt, working context, conversation history, and model output.
  • History and over-pasted files are the usual waste. Prompts are rarely the problem.
  • Pruning by 70-80% often improves output quality, not just cost.
  • Save the pruned starting point as a template so you do not redo the audit each time.

Go deeper

  • Anthropic: Context Windows explained
  • Anthropic: Long context prompting tips
  • Claude Code memory and CLAUDE.md docs
Up next: Prune Like a Pro →