Skip to content
Agentic Levels
  • New to AI?
  • Assessment
  • Levels
  • Lessons
  • Tracks
  • Resources
  • Reference
  • What's New
  • What's Next
  • More
    Tool SetupCompareAboutThanksFAQPricingPreferences
  • New to AI?
  • Assessment
  • Levels
  • Lessons
  • Tracks
  • Resources
  • Reference
  • Tool Setup
  • Compare
  • What's New
  • About
  • Thanks
  • FAQ
  • What's Next
  • Pricing

© 2026 Fuentes Studio·Privacy·Terms

yourCouncil
Ready to help
✦

What do you want to understand?

Ask anything about what you're learning.

L9Lesson 3

Manage the Money

After this, you'll be able to set per-run budgets, route tasks to the right model tier by stakes, and monitor spend before it becomes a surprise.

Before you start

Before diving in, complete Three-Tier Setup so your budget controls map to real model-tier assignments rather than a flat cost estimate for all agents.

The idea

Here is the before and after: You scale to five parallel Opus agents and your bill for that day is $180. It is not a bug. It is the math: 10 million tokens per run, Opus pricing, five agents. Most teams hit this within two weeks of scaling background agents. The fix is not to run fewer agents. It is to route tasks to the right model tier before you dispatch.

Model routing by stakes is the core pattern. Use the most capable model where it actually matters: complex reasoning, architectural decisions, first-pass implementation of ambiguous requirements. Use cheaper models where the task is well-defined. Sonnet for review and synthesis costs about 5x less than Opus. Haiku for formatting and label generation costs 20x less. A Sonnet PR review run costs roughly 8% of an Opus implementation run. Route correctly and the savings pay for themselves in the first week.

Per-run budgets are the circuit breaker. Every background agent run should have a maximum token cap. If the agent hits the cap, it surfaces what it has and stops. Without a cap, an unexpected loop consumes unbounded resources. The cap is not a quality ceiling. It is a hard stop.

Real-time spend monitoring means you see cost as it accumulates, not in next month's invoice. Treat cost the same way you treat test failures: a signal that fires while the work is running, not after.

Try it (18 min)

Watch out for

  • Routing everything to the most capable model because you're unsure. Uncertainty is not a reason to use Opus. Write the spec more clearly, then route to a cheaper model.
  • Setting a per-run budget but not wiring it. A token cap you defined but never passed to the agent is not a circuit breaker.
  • Monitoring spend only in the billing dashboard. By the time you see it there, it already happened. Wire spend logging into your agent runs at the point of execution.
  • Forgetting that loops multiply cost. A 3-iteration inner loop at 100k tokens per pass costs 3x. Budget for the expected number of cycles, not a single pass.
  • Treating model routing as permanent. Model capability rankings shift with each major release. Revisit your routing table every few months.

Paste this into Claude:

I want to set up cost management for my background agents. Help me: (1) Look at the tasks I run most frequently and classify each as high-stakes (needs Opus-tier), medium-stakes (Sonnet-tier), or low-stakes (Haiku-tier). My tasks are: [list 4-6 tasks you run regularly, e.g. 'implement a new API endpoint', 'review a PR for logic errors', 'update doc comments', 'format code output', 'generate test cases']. (2) For each tier, write the model routing rule I should add to my supervisor or skill definition. (3) Set a per-run token budget for each tier and explain why. (4) Show me what a simple cost log entry should look like: model, tokens in, tokens out, estimated cost, task ID.

What good looks like:

  • You classified your regular tasks into at least two model tiers (not everything goes to the most capable model)
  • Each classification has a specific reason tied to task complexity, not just cost
  • You defined per-run token budgets for at least two tiers
  • You have a cost log format with model, tokens, estimated cost, and task ID
  • You can estimate your daily cost at current usage and identify which task type drives most of it

When this breaks

  • Breaks when uncertainty drives routing decisions because every ambiguous task ends up on the most capable model, and you spend Opus dollars to compensate for a spec problem that better prompting would have solved at Sonnet pricing.
  • Breaks when per-run budgets exist as documentation but are not wired to the dispatch layer because a token cap that does not actually halt the run is not a circuit breaker, and a single runaway loop can consume the savings of an entire week.
  • Breaks when cost monitoring lives in the billing dashboard because by the time the spike shows up there, the spend has already happened, and after-the-fact alerts cannot stop a run that is already mid-flight.

Claude can do it for you

Say to Claude: 'For each of these tasks: [list], tell me which model tier it needs (Opus/Sonnet/Haiku) and why, then give me a per-run token budget. Assume I'm running up to 5 agents in parallel. Also write a one-line cost log format I can append after each run.'

You can now

Classify five recurring tasks into at least two model tiers with a specific reason per tier, set a wired per-run token cap for each tier, and produce a cost log entry showing tokens and dollars per run.

Key takeaways

Route by stakes, cap by run, monitor at runtime. Cost managed after the fact is cost you've already paid. Wire it before you scale.

  • Model routing by stakes is the core cost pattern. Opus for ambiguous reasoning, Sonnet for well-defined synthesis, Haiku for mechanical work
  • Per-run token caps are the circuit breaker. A cap you set but did not wire is not a circuit breaker
  • Real-time spend logging belongs at the dispatch layer. Billing dashboard alerts fire after the spend has already happened
  • Loops multiply cost. Budget for the expected iteration count, not a single pass

Go deeper

  • Anthropic Models Overview (pricing and capability tiers)
  • Dispatch: Multi-Model Orchestration (model routing patterns)
  • Agent Backpressure (preventing runaway cost in loops)