Async AI work that doesn't need you watching
After this, you'll be able to dispatch a background agent on a low-risk task, walk away, and come back to review the diff with calibrated trust in what the agent got right and what it missed.
Before you start
Complete Harness Engineering 101 first, so that the verification loops and observability infrastructure this lesson relies on are already in place.
The idea
Background agents became practical when model reliability crossed a threshold: reviewing a diff became cheaper than writing the code yourself. Level 9 sits on the far side of that shift. You dispatch tasks, get on with your own strategic work, and come back to review what finished. You did not write any of those diffs. Agents did, while your attention was elsewhere.
Split work by model role from the start: one agent implements, a different one reviews. They do not share context. The review agent has no stake in defending the implementation agent's choices. That separation catches errors that self-review misses, and it costs less than running one expensive model for everything.
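The role split above can be sketched in a few lines. This is a minimal illustration, not a real agent framework: `Agent`, `ModelCall`, and `implement_then_review` are hypothetical names, and the lambdas stand in for whatever model client you actually use. The point it shows is that each agent keeps its own context list, and the reviewer receives only the task and the diff.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call: it receives the messages visible to
# one agent and returns text. No real client library is assumed here.
ModelCall = Callable[[List[str]], str]


@dataclass
class Agent:
    role: str
    call: ModelCall
    context: List[str] = field(default_factory=list)  # private to this agent

    def run(self, message: str) -> str:
        self.context.append(message)
        reply = self.call(list(self.context))
        self.context.append(reply)
        return reply


def implement_then_review(task: str, implementer: Agent, reviewer: Agent) -> Dict[str, str]:
    diff = implementer.run(f"Implement: {task}")
    # The reviewer sees the task and the diff only, never the implementer's
    # intermediate reasoning, so it has nothing to defend.
    findings = reviewer.run(f"Task: {task}\nReview this diff:\n{diff}")
    return {"diff": diff, "findings": findings}


if __name__ == "__main__":
    impl = Agent("implement", lambda msgs: "+ def add(a, b): return a + b")
    rev = Agent("review", lambda msgs: "No type hints on add().")
    result = implement_then_review("add an add() helper", impl, rev)
    print(result["findings"])
```

Because the two agents are separate objects with separate `call` functions, pointing the reviewer at a lighter model is a one-line change.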
Here is what that looks like in practice: a team ran a capable model for implementation and a lighter model for review across 30 background tasks in one week. The review agent caught 11 issues the implementation agent missed (3 type errors, 4 missing null checks, 4 logic edge cases) at roughly 75% less cost than running the capable model for both roles. Separate context, no stake in the implementation, lower cost, better catch rate.
The hard problem at Level 9 is not running agents. It is coordination. Agent B starts from a codebase that Agent A has already modified. Stale context produces subtle bugs. The standard solution is branch-per-agent with merge gates: each agent works in isolation, and a merge only happens after validation passes. Branch isolation defers the coordination problem. It does not eliminate it.
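A merge gate of this kind can be modeled as a small decision function. The sketch below is an assumption-laden illustration: `AgentBranch`, `merge_gate`, and the three outcome strings are hypothetical, and commits are plain strings rather than real git objects. It shows why isolation only defers coordination: a branch that passed validation against an old base still has to be rebased and validated again once main has moved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentBranch:
    name: str
    base_commit: str         # commit on main that the agent started from
    validation_passed: bool  # result of the tests/lint run on the branch


def merge_gate(main_head: str, branch: AgentBranch) -> str:
    """Decide what to do with a finished agent branch."""
    if not branch.validation_passed:
        return "reject"
    if branch.base_commit != main_head:
        # Main moved while the agent worked, so its validation ran against a
        # stale base. Rebase and validate again before merging.
        return "rebase-and-revalidate"
    return "merge"


if __name__ == "__main__":
    main_head = "a1"
    agent_a = AgentBranch("agent-a/readme", base_commit="a1", validation_passed=True)
    agent_b = AgentBranch("agent-b/tests", base_commit="a1", validation_passed=True)

    print(merge_gate(main_head, agent_a))  # merge
    main_head = "a2"  # merging agent-a advanced main
    print(merge_gate(main_head, agent_b))  # rebase-and-revalidate
```

In a real repository the staleness check would compare against the actual head of main; the string comparison here only stands in for that.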
Cost is a first-class concern here. Running five parallel agents on a capable model can cost hundreds of dollars in a day. Use cheaper models for lower-stakes tasks. Set per-run budgets. Monitor spend in real time. This is not optional hygiene. It is part of the architecture.
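One way to make budgets part of the architecture is to route by stakes and enforce a per-run cap in code. The sketch below is illustrative only: the model names, the per-million-token prices, and the `RunBudget`, `pick_model`, and `BudgetExceeded` names are all hypothetical and do not reflect any provider's real pricing or API.

```python
from dataclasses import dataclass

# Hypothetical model names and USD prices per million tokens, for illustration.
PRICE_PER_MTOK = {"capable-model": 15.0, "light-model": 1.0}


class BudgetExceeded(RuntimeError):
    """Raised when a run has spent more than its budget allows."""


def pick_model(stakes: str) -> str:
    # Only high-stakes tasks get the expensive model.
    return "capable-model" if stakes == "high" else "light-model"


@dataclass
class RunBudget:
    limit_usd: float
    spent_usd: float = 0.0

    def record(self, model: str, tokens: int) -> float:
        """Add the cost of one model call; raise once the run is over budget."""
        self.spent_usd += PRICE_PER_MTOK[model] * tokens / 1_000_000
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.limit_usd:.2f}"
            )
        return self.spent_usd


if __name__ == "__main__":
    budget = RunBudget(limit_usd=1.0)
    budget.record(pick_model("low"), tokens=500_000)
    try:
        budget.record(pick_model("high"), tokens=100_000)
    except BudgetExceeded as exc:
        print(f"stopping run: {exc}")
```

Calling `record` after every model call gives you both the real-time spend figure and a hard stop, so a runaway agent ends its own run instead of waiting for someone to notice the bill.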
Try it (5 min)
Watch out for
Paste this into Claude:
I want to dispatch my first background agent run. Here is a low-risk task from my current project: [describe the task: e.g. 'update the README install section to match our new package name', 'generate unit tests for src/utils/dateFormat.ts', 'bump the lodash dependency from 4.17.20 to 4.17.21 and run the test suite']. Run this as a background agent. Open a PR with the diff when finished. While you work, I am stepping away for at least an hour. When I return, I want to review only the diff and the test output, not your intermediate steps. Include in the PR description: (1) what you changed, (2) which tests passed, (3) any decision you made that I should review specifically.
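When you return, a quick mechanical check can confirm the PR description contains the three items the prompt asks for before you start reading the diff. This is a rough sketch under stated assumptions: the section labels in `REQUIRED_SECTIONS` are hypothetical (the agent may title them differently), and `missing_sections` is an illustrative helper, not part of any tool.

```python
from typing import List

# Hypothetical section labels; adjust to whatever headings the agent uses.
REQUIRED_SECTIONS = ("What changed", "Tests passed", "Decisions to review")


def missing_sections(pr_description: str) -> List[str]:
    """Return the required sections that are absent or have no content."""
    lines = [ln.strip() for ln in pr_description.splitlines()]
    missing = []
    for title in REQUIRED_SECTIONS:
        header = f"{title}:"
        body_found = False
        for i, ln in enumerate(lines):
            if ln.startswith(header):
                inline = ln[len(header):].strip()
                nxt = lines[i + 1] if i + 1 < len(lines) else ""
                next_is_header = any(nxt.startswith(f"{t}:") for t in REQUIRED_SECTIONS)
                # Content counts if it follows the colon or sits on the next line.
                body_found = bool(inline) or (bool(nxt) and not next_is_header)
                break
        if not body_found:
            missing.append(title)
    return missing


if __name__ == "__main__":
    description = (
        "What changed: renamed the package in the README install section\n"
        "Tests passed:\n"
        "full test suite\n"
        "Decisions to review:\n"
    )
    print(missing_sections(description))  # ['Decisions to review']
```

An empty result means the description is complete enough to review; anything else is a ready-made revision request to send back.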
What good looks like:
When this breaks
You can now
Dispatch one low-risk background agent task, walk away for an hour, and come back to review a diff that you can either merge as-is or reject with one specific actionable revision request.
Key takeaways
Level 9 is when reviewing a diff is cheaper than writing the code. You dispatch, walk away, and come back to review what finished without you.