After this, you'll be able to set per-run budgets, route tasks to the right model tier by stakes, and monitor spend before it becomes a surprise.
Before you start
Before diving in, complete Three-Tier Setup so your budget controls map to real model-tier assignments rather than a flat cost estimate for all agents.
The idea
Here is the before and after: You scale to five parallel Opus agents and your bill for that day is $180. It is not a bug. It is the math: 10 million tokens per run, Opus pricing, five agents. Most teams hit this within two weeks of scaling background agents. The fix is not to run fewer agents. It is to route tasks to the right model tier before you dispatch.
Model routing by stakes is the core pattern. Use the most capable model where it actually matters: complex reasoning, architectural decisions, first-pass implementation of ambiguous requirements. Use cheaper models where the task is well-defined. Sonnet for review and synthesis costs about 5x less than Opus. Haiku for formatting and label generation costs 20x less. A Sonnet PR review run costs roughly 8% of an Opus implementation run. Route correctly and the savings pay for themselves in the first week.
Per-run budgets are the circuit breaker. Every background agent run should have a maximum token cap. If the agent hits the cap, it surfaces what it has and stops. Without a cap, an unexpected loop consumes unbounded resources. The cap is not a quality ceiling. It is a hard stop.
Real-time spend monitoring means you see cost as it accumulates, not in next month's invoice. Treat cost the same way you treat test failures: a signal that fires while the work is running, not after.
Try it (18 min)
Watch out for
Paste this into Claude:
I want to set up cost management for my background agents. Help me: (1) Look at the tasks I run most frequently and classify each as high-stakes (needs Opus-tier), medium-stakes (Sonnet-tier), or low-stakes (Haiku-tier). My tasks are: [list 4-6 tasks you run regularly, e.g. 'implement a new API endpoint', 'review a PR for logic errors', 'update doc comments', 'format code output', 'generate test cases']. (2) For each tier, write the model routing rule I should add to my supervisor or skill definition. (3) Set a per-run token budget for each tier and explain why. (4) Show me what a simple cost log entry should look like: model, tokens in, tokens out, estimated cost, task ID.
What good looks like:
When this breaks
Claude can do it for you
Say to Claude: 'For each of these tasks: [list], tell me which model tier it needs (Opus/Sonnet/Haiku) and why, then give me a per-run token budget. Assume I'm running up to 5 agents in parallel. Also write a one-line cost log format I can append after each run.'
You can now
Classify five recurring tasks into at least two model tiers with a specific reason per tier, set a wired per-run token cap for each tier, and produce a cost log entry showing tokens and dollars per run.
Key takeaways
Route by stakes, cap by run, monitor at runtime. Cost managed after the fact is cost you've already paid. Wire it before you scale.