Running Background Agents

Async AI work that doesn't need you watching

After this, you'll be able to dispatch a background agent on a low-risk task, walk away, and come back to review the diff with calibrated trust in what the agent got right and what it missed.

Before you start

Before diving in, complete Harness Engineering 101 so the verification loops and observability infrastructure this lesson relies on are already in place.

The idea

Background agents became practical when model reliability crossed a threshold: reviewing a diff is cheaper than writing the code yourself. That is the shift Level 9 is on the other side of. You dispatch tasks, do your actual strategic work, and come back to review what finished. You did not write any of those diffs. Agents did, while you were offline.

A background task bead runs away from the review lane with no checkpoint behind it. — The starting state for Running Background Agents.

Use this model to move from the starting mistake to the lesson check.

	Before	After
Habit	Guess from a loose request	Use the lesson move
Work move	Skip Running Background Agents	Apply Running Background Agents
Check	No clear proof	Pass the lesson check

The after column is the lesson target.

Split work by model role from the start: one agent implements, a different one reviews. They do not share context. The review agent has no stake in defending the implementation agent's choices. That separation catches errors that self-review misses, and it costs less than running one expensive model for everything.

Here is the before and after: a team ran a capable model for implementation and a lighter model for review across 30 background tasks in one week. The review agent caught 11 issues the implementation agent missed (3 type errors, 4 missing null checks, 4 logic edge cases) at roughly 75% less cost than running the capable model for both roles. Separate context, separate stake, lower cost, better catch rate.

The hard problem at Level 9 is not running agents. It is coordination. Agent B starts from a codebase that Agent A has already modified. Stale context produces subtle bugs. The standard solution is branch-per-agent with merge gates: each agent works in isolation, and a merge only happens after validation passes. Branch isolation defers the coordination problem. It does not eliminate it.

Cost is a first-class concern here. Running five parallel agents on a capable model can clear hundreds of dollars in a day. Use cheaper models for lower-stakes tasks. Set per-run budgets. Monitor spend in real time. This is not optional hygiene. It is part of the architecture.

Try it (5 min)

Watch out for

Hovering over the agent's progress instead of walking away. Background agents only earn back time if you actually leave them alone for the full run.
Picking a high-stakes task for your first dispatch. Documentation, test generation, and dependency bumps are calibration tasks. Production refactors are not.
Letting the implementation agent review its own work. Self-review misses the errors a separate reviewer would catch. Always split implementation and review.
Skipping the cost cap. A loop that runs unbudgeted on a capable model can clear $50 before you notice. Set a per-run token budget on the first dispatch.
Treating one successful run as proof the task type is agent-suitable. Run three before you generalize. The first one might have been an easy case.

Paste this into Claude

I want to dispatch my first background agent run. Here is a low-risk task from my current project: [describe the task: e.g. 'update the README install section to match our new package name', 'generate unit tests for src/utils/dateFormat.ts', 'bump the lodash dependency from 4.17.20 to 4.17.21 and run the test suite']. Run this as a background agent. Open a PR with the diff when finished. While you work, I am stepping away for at least an hour. When I return, I want to review only the diff and the test output, not your intermediate steps. Include in the PR description: (1) what you changed, (2) which tests passed, (3) any decision you made that I should review specifically.

What good looks like

You dispatched a real background agent run on a low-risk task and walked away for at least an hour without checking in
You reviewed the resulting diff in under 10 minutes and can name exactly what the agent got right and what it missed
You can state whether reviewing the diff was faster than writing the code yourself, with a time comparison

When this breaks

Breaks when implementation and review agents share context because the reviewer inherits the same blind spots and rubber-stamps instead of catching errors.
Breaks when you dispatch agents on ambiguous tasks because the agent will produce confident-looking output that misses the unstated constraints, and the review step has nothing concrete to evaluate against.
Breaks when cost management is treated as cleanup rather than architecture because parallel agents on a capable model can burn hundreds of dollars before any after-the-fact dashboard surfaces the spike.

AI can help with this

Use AI to apply this lesson to your current work. Share your situation, ask for one concrete next step, and check the answer against this test: Dispatch one low-risk background agent task, walk away for an hour, and come back to review a diff that you can either merge as-is or reject with one specific actionable revision request.

The bead travels inside a bounded run lane and stops at a review gate before the golden dot.

You can now

Dispatch one low-risk background agent task, walk away for an hour, and come back to review a diff that you can either merge as-is or reject with one specific actionable revision request.

Key takeaways

Level 9 is when reviewing a diff is cheaper than writing the code. You dispatch, walk away, and come back to review what finished without you.

Background agents work when reviewing a diff is cheaper than writing the code yourself
Branch-per-agent isolation prevents stale context from corrupting parallel work
Separate implementation and review agents. They should not share context or defend each other's choices
Cost management is architecture at this level. Budget per run, monitor spend, route cheap tasks to smaller models

Was this helpful?

Up nextDesigning Autonomous Agent Teams

← Back to lessons

Before

After

Habit

Guess from a loose request

Use the lesson move

Work move

Skip Running Background Agents

Apply Running Background Agents

Check

No clear proof

Pass the lesson check

I want to dispatch my first background agent run. Here is a low-risk task from my current project: [describe the task: e.g. 'update the README install section to match our new package name', 'generate unit tests for src/utils/dateFormat.ts', 'bump the lodash dependency from 4.17.20 to 4.17.21 and run the test suite']. Run this as a background agent. Open a PR with the diff when finished. While you work, I am stepping away for at least an hour. When I return, I want to review only the diff and the test output, not your intermediate steps. Include in the PR description: (1) what you changed, (2) which tests passed, (3) any decision you made that I should review specifically.