Skip to content
Agentic Levels
  • New to AI?
  • Assessment
  • Levels
  • Lessons
  • Tracks
  • Resources
  • Reference
  • What's New
  • What's Next
  • More
    Tool SetupCompareAboutThanksFAQPricingPreferences
  • New to AI?
  • Assessment
  • Levels
  • Lessons
  • Tracks
  • Resources
  • Reference
  • Tool Setup
  • Compare
  • What's New
  • About
  • Thanks
  • FAQ
  • What's Next
  • Pricing

© 2026 Fuentes Studio·Privacy·Terms

yourCouncil
Ready to help
✦

What do you want to understand?

Ask anything about what you're learning.

L9Free

Running Background Agents

Async AI work that doesn't need you watching

After this, you'll be able to dispatch a background agent on a low-risk task, walk away, and come back to review the diff with calibrated trust in what the agent got right and what it missed.

Before you start

Before diving in, complete Harness Engineering 101 so the verification loops and observability infrastructure this lesson relies on are already in place.

The idea

Background agents became practical when model reliability crossed a threshold: reviewing a diff is cheaper than writing the code yourself. That is the shift Level 9 is on the other side of. You dispatch tasks, do your actual strategic work, and come back to review what finished. You did not write any of those diffs. Agents did, while you were offline.

Split work by model role from the start: one agent implements, a different one reviews. They do not share context. The review agent has no stake in defending the implementation agent's choices. That separation catches errors that self-review misses, and it costs less than running one expensive model for everything.

Here is the before and after: a team ran a capable model for implementation and a lighter model for review across 30 background tasks in one week. The review agent caught 11 issues the implementation agent missed (3 type errors, 4 missing null checks, 4 logic edge cases) at roughly 75% less cost than running the capable model for both roles. Separate context, separate stake, lower cost, better catch rate.

The hard problem at Level 9 is not running agents. It is coordination. Agent B starts from a codebase that Agent A has already modified. Stale context produces subtle bugs. The standard solution is branch-per-agent with merge gates: each agent works in isolation, and a merge only happens after validation passes. Branch isolation defers the coordination problem. It does not eliminate it.

Cost is a first-class concern here. Running five parallel agents on a capable model can clear hundreds of dollars in a day. Use cheaper models for lower-stakes tasks. Set per-run budgets. Monitor spend in real time. This is not optional hygiene. It is part of the architecture.

Try it (5 min)

Watch out for

  • Hovering over the agent's progress instead of walking away. Background agents only earn back time if you actually leave them alone for the full run.
  • Picking a high-stakes task for your first dispatch. Documentation, test generation, and dependency bumps are calibration tasks. Production refactors are not.
  • Letting the implementation agent review its own work. Self-review misses the errors a separate reviewer would catch. Always split implementation and review.
  • Skipping the cost cap. A loop that runs unbudgeted on a capable model can clear $50 before you notice. Set a per-run token budget on the first dispatch.
  • Treating one successful run as proof the task type is agent-suitable. Run three before you generalize. The first one might have been an easy case.

Paste this into Claude:

I want to dispatch my first background agent run. Here is a low-risk task from my current project: [describe the task: e.g. 'update the README install section to match our new package name', 'generate unit tests for src/utils/dateFormat.ts', 'bump the lodash dependency from 4.17.20 to 4.17.21 and run the test suite']. Run this as a background agent. Open a PR with the diff when finished. While you work, I am stepping away for at least an hour. When I return, I want to review only the diff and the test output, not your intermediate steps. Include in the PR description: (1) what you changed, (2) which tests passed, (3) any decision you made that I should review specifically.

What good looks like:

  • You dispatched a real background agent run on a low-risk task and walked away for at least an hour without checking in
  • You reviewed the resulting diff in under 10 minutes and can name exactly what the agent got right and what it missed
  • You can state whether reviewing the diff was faster than writing the code yourself, with a time comparison

When this breaks

  • Breaks when implementation and review agents share context because the reviewer inherits the same blind spots and rubber-stamps instead of catching errors.
  • Breaks when you dispatch agents on ambiguous tasks because the agent will produce confident-looking output that misses the unstated constraints, and the review step has nothing concrete to evaluate against.
  • Breaks when cost management is treated as cleanup rather than architecture because parallel agents on a capable model can burn hundreds of dollars before any after-the-fact dashboard surfaces the spike.

You can now

Dispatch one low-risk background agent task, walk away for an hour, and come back to review a diff that you can either merge as-is or reject with one specific actionable revision request.

Key takeaways

Level 9 is when reviewing a diff is cheaper than writing the code. You dispatch, walk away, and come back to review what finished without you.

  • Background agents work when reviewing a diff is cheaper than writing the code yourself
  • Branch-per-agent isolation prevents stale context from corrupting parallel work
  • Separate implementation and review agents. They should not share context or defend each other's choices
  • Cost management is architecture at this level. Budget per run, monitor spend, route cheap tasks to smaller models
Up nextDesigning Autonomous Teams→