After this, you'll be able to wire a feedback loop where the agent runs tests, reads the result, and fixes failures without you intervening at every step.
Before you start
Before diving in, complete Track Cost Per Iteration so your inner loop is cost-aware from the start and won't burn budget on every test-fix cycle.
The idea
You ask Claude to add a small feature. It writes the code and says 'done.' You run the tests. Two failures. You paste the error back. It fixes one. You run again. Another failure. Forty-five minutes later you are still in the loop, manually ferrying outputs between Claude and your test runner. That back-and-forth is the supervision bottleneck the inner loop eliminates.
At L8, the agent does not just act. It verifies whether the action worked, reads the result, and acts again. You ask for the feature. The agent writes the code, runs the test suite itself, reads the two failures, fixes the root cause, and re-runs until green. Then it reports back. You were not in any of those intermediate steps.
Here is the before and after: a task that takes 6 hours of supervised back-and-forth becomes 90 minutes of unsupervised agent work plus 15 minutes of your review on the final diff. That is 360 minutes down to 105, roughly a 3.4x speedup from one change: letting the agent close the loop.
The prerequisite is a runnable check. The agent can only loop if it can verify it succeeded. Your project needs a test suite, a linter, or a build step it can invoke and read. If you have none, add one before building the inner loop. Without a checkable exit condition, the loop has no way to know when to stop.
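To make the idea of a checkable exit condition concrete, here is a minimal Python sketch of the act, check, fix, repeat cycle. It is an illustration under stated assumptions, not how Claude implements the loop internally: `run_check`, `inner_loop`, the `fix` callback, and the `max_cycles` cap are all hypothetical names introduced here. The only real mechanism it relies on is that a test runner, linter, or build step signals success with exit status 0.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class CheckResult:
    passed: bool  # True when the command exited with status 0
    output: str   # stdout followed by stderr: the text the agent would read


def run_check(command: Sequence[str]) -> CheckResult:
    """Run a check command (tests, linter, build) and capture what it printed."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return CheckResult(passed=proc.returncode == 0,
                       output=proc.stdout + proc.stderr)


def inner_loop(command: Sequence[str],
               fix: Callable[[str], None],
               max_cycles: int = 5) -> Tuple[bool, int]:
    """Check, fix, repeat until the check passes or the cycle budget runs out.

    `fix` is a hypothetical hook standing in for the agent editing code after
    reading the failure output. `max_cycles` is an assumed safety cap so a
    check that never passes cannot loop forever.
    Returns (passed, number_of_checks_run).
    """
    for cycle in range(1, max_cycles + 1):
        result = run_check(command)
        if result.passed:
            return True, cycle
        fix(result.output)
    return False, max_cycles
```

In a real project the command might be something like `["npm", "test"]`. The cap matters for the same reason the runnable check does: without one, a check that can never pass would keep consuming cycles.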
Try it (20 min)
Paste this into Claude:
I want to close the feedback loop on a task in my project. Here is a repeatable task I do regularly: [describe a task like 'add a function that does X and make sure tests pass' or 'fix the failing linter rule and verify no new errors']. Help me set this up so you can run it end-to-end: (1) Write or modify the code needed, (2) Run the test suite or linter using the shell tool, (3) Read the output, (4) Fix any failures, (5) Re-run until clean. Do not ask me to confirm each step. Show me the full loop running to completion.
What a good response looks like:
Here is what a completed inner loop run looks like in Claude's summary:

Task: add input validation to the createUser function.
Loop summary:
- Cycle 1: Wrote validation logic. Ran npm test. 2 failures: createUser.test.ts lines 34 and 67 expected ValidationError but got undefined.
- Cycle 2: Fixed root cause: validation was applied after the database write, not before. Moved the check. Ran npm test. 1 failure: line 67 still failing (edge case: empty string not caught by the null check).
- Cycle 3: Added empty-string guard. Ran npm test. All 47 tests passing.
Changes made:
- src/users/createUser.ts: added validateInput() call before db.insert() (line 12)
- src/users/validators.ts: added isNonEmptyString() guard (lines 8-11)
Final state: npm test exits 0. No linter errors. Ready to review.

What you should see: the agent counted its own cycles (3), named the specific failing lines, explained the root cause it found, and reported a clean final state. You review one diff, not six exchanges.
Go deeper (30 min)
Paste this into Claude:
Now measure the loop. Pick a task that took you 30 minutes or more in the past when you were doing it manually with back-and-forth prompting. Run the same task using the inner loop. Record: (1) How many minutes did the agent run unsupervised? (2) How many test or lint cycles did it take? (3) How many minutes did your final review take? Compare those numbers to your manual estimate. Paste the numbers and ask Claude to help you identify which step consumed the most time and whether the loop could be tightened.
What a good response looks like:
Loop measurement report:

Task: refactor data fetching layer to use a shared HTTP client (previously took ~45 min with back-and-forth)
Inner loop run:
- Unsupervised agent time: 22 minutes
- Fix-and-recheck cycles: 4
- Your review time: 11 minutes
- Total: 33 minutes (vs 45 manual)
Slowest step: cycle 3 (9 minutes). The agent rewrote the retry logic from scratch instead of patching it.
Faster alternative: add a spec criterion 'do not rewrite retry logic, only update the client instantiation' to prevent it.
Net improvement: about 27% less time (33 vs 45 minutes), with less cognitive load (you reviewed a diff, not a conversation).
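The arithmetic behind a report like this is simple enough to check yourself. The sketch below is one way to do it; `compare_runs` and its field names are hypothetical, introduced only for illustration. It adds agent time and review time, then compares the total against your manual estimate.

```python
def compare_runs(agent_minutes: float,
                 review_minutes: float,
                 manual_minutes: float) -> dict:
    """Compare an inner loop run against a manual back-and-forth estimate."""
    total = agent_minutes + review_minutes
    saved = manual_minutes - total
    return {
        "total_minutes": total,
        "minutes_saved": saved,
        # Share of the manual time that was saved, as a whole percent
        "percent_less_time": round(100 * saved / manual_minutes),
        # How many times faster the loop run was end to end
        "speedup": round(manual_minutes / total, 2),
    }


# Numbers from the sample report: 22 min agent, 11 min review, 45 min manual
report = compare_runs(22, 11, 45)
print(report)
```

With the sample numbers this gives a 33-minute total, 12 minutes saved, 27 percent less time, and a speedup of about 1.36x. Note that "percent less time" and "speedup" are different measures, so record which one you are quoting.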
Claude can do it for you
Say to Claude: 'I want you to run this task end-to-end: [task description]. After each step, run [test command or linter] and read the output. Fix any failures and re-run. Do not stop until the check passes. Then show me a summary of what you changed and the final test result.'
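If you reuse this prompt often, a small helper can fill in the two placeholders for you. This is only a convenience sketch: `build_loop_prompt` and `PROMPT_TEMPLATE` are hypothetical names, and the template text is copied from the prompt above with nothing added.

```python
PROMPT_TEMPLATE = (
    "I want you to run this task end-to-end: {task}. "
    "After each step, run {check} and read the output. "
    "Fix any failures and re-run. Do not stop until the check passes. "
    "Then show me a summary of what you changed and the final test result."
)


def build_loop_prompt(task: str, check_command: str) -> str:
    """Fill the inner loop prompt with a task description and a check command."""
    task = task.strip().rstrip(".")  # the template supplies the full stop
    check_command = check_command.strip()
    if not task or not check_command:
        # A loop with no task or no runnable check has no exit condition
        raise ValueError("both a task and a check command are required")
    return PROMPT_TEMPLATE.format(task=task, check=check_command)


prompt = build_loop_prompt(
    "add input validation to the createUser function",
    "npm test",
)
print(prompt)
```

Rejecting an empty check command mirrors the prerequisite from earlier in the lesson: the agent can only loop if it has something to run and read.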
You can now
Run a real task end-to-end where the agent edits code, runs tests, fixes failures, and reports a clean final result without your intermediate input.
Key takeaways
Wire the loop once, then get out of the way. Act, check, fix, repeat. You review the outcome, not the process.