After this, you'll be able to count what's actually eating your context window and trim it by at least 30% without losing anything that matters.
Before you start
Before diving in, complete Do Not Stuff CLAUDE.md so you can apply the rules-vs-runtime distinction to a full token audit across your entire context window.
The idea
You are 45 turns into a session, things are going sideways, and you have no idea why. You paste more context hoping it will help. It gets worse. The problem is not the model. The problem is your token budget.
Think of the context window as RAM, not a hard drive. Claude models support up to 200K tokens; GPT-4o supports up to 128K (as of early 2026). But capacity is not the goal. Signal is.
Here is the before and after: A product manager pasted a 200-line requirements doc to ask a question about one acceptance criterion. That doc cost roughly 1,600 tokens. The criterion itself? 40 tokens. The noise-to-signal ratio was 40:1, and the output reflected it: Claude kept referencing unrelated requirements and missing the actual question. When the PM repasted just the two relevant criteria plus a one-sentence framing, Claude answered correctly on the first try.
Your context has four cost centers: the system prompt (CLAUDE.md and any persistent instructions), working context (files and data you've pasted this session), conversation history (every prior turn), and the model's own output tokens. Most people focus on prompts. The real waste is usually history and over-pasted files.
Try it (15 min)
Watch out for
Paste this into Claude:
I want to audit my current context. Here is everything I've shared in this conversation so far: [paste your session history or summarize the main blocks of content you've included]. Tell me: (1) What's taking up the most space that isn't directly needed right now? (2) What could I summarize or drop entirely? (3) What's load-bearing and must stay? Give me a ranked list of what to cut first.
What good looks like:
What a good response looks like:
Here is your context audit, ranked by cost and cuttability. Largest cost center: your conversation history from turns 1-31. I estimate roughly 18,000 tokens. Turns 1-12 were about the database schema redesign, which you told me is complete. That entire block is dead weight for your current question about the API endpoints. Second: the full package.json you pasted (turn 8). You pasted 140 lines to ask about one dependency version. That's roughly 1,100 tokens. Only the `dependencies` section (about 20 lines) was relevant. Third: the full README (turn 3). 320 lines, roughly 2,400 tokens. You have not referenced it since. Load-bearing and must stay: the three endpoint files you pasted in turns 28-30, and the error message from turn 31. Those are directly connected to your current question. Cut list, ranked: (1) turns 1-27 of conversation history, (2) full README, (3) full package.json. Keeping only turns 28-31 plus the three endpoint files would take you from an estimated 24,000 tokens to roughly 3,200. That is an 87% reduction.
Go deeper (20 min)
Paste this into Claude:
Take the context you just audited. Now prune it: remove or summarize everything on the 'cut' list. Start a fresh conversation with only the load-bearing items plus one sentence of session framing. Run the same core request you were doing before. Compare the output quality. Paste both responses side by side and tell me: which is more focused and why?
What good looks like:
What a good response looks like:
Pruned session vs original session, same task: 'Why is the /api/users endpoint returning 401 on valid tokens?' Original session (24,000 tokens, 31 prior turns): My response spent two paragraphs referencing the database schema redesign from earlier in the conversation before getting to your question. I suggested checking the migration files even though you told me the migration was done. My final answer was buried and vague. Pruned session (3,200 tokens, 3 files + error message): I went directly to the issue. Your requireAuth middleware on line 23 of src/middleware/auth.ts is calling `jwt.verify(token, process.env.JWT_SECRET)`, but your token is signed with `process.env.JWT_SIGNING_KEY`. Those are two different environment variables. The 401 is a signing key mismatch, not an expiry issue. What the pruned version got right that the original missed: no distraction from dead context, and immediate focus on the three files that actually matter. The answer in the pruned session took 4 sentences. The original took 12 sentences and was still vague.
When this breaks
Claude can do it for you
At the start of any session, paste your context and say: 'Audit this for me. What is taking up space that I probably do not need for this specific task?' Claude will identify the waste faster than you will.
You can now
Produce a ranked cut list for your current session that names the largest cost center, gives an estimated token count, and labels each block as load-bearing or cuttable.
Key takeaways
10K tokens of sharp, relevant context beats 80K tokens of everything you could think of. Audit before you paste, and prune before you escalate.
Go deeper