Skip to content
Agentic Levels
  • New to AI?
  • Assessment
  • Levels
  • Lessons
  • Tracks
  • Resources
  • Reference
  • What's New
  • What's Next
  • More
    Tool SetupCompareAboutThanksFAQPricingPreferences
  • New to AI?
  • Assessment
  • Levels
  • Lessons
  • Tracks
  • Resources
  • Reference
  • Tool Setup
  • Compare
  • What's New
  • About
  • Thanks
  • FAQ
  • What's Next
  • Pricing

© 2026 Fuentes Studio·Privacy·Terms

yourCouncil
Ready to help
✦

What do you want to understand?

Ask anything about what you're learning.

L10Lesson 5

Document the Frontier

After this, you'll be able to write a structured failure or success report from one of your autonomous team runs and explain why documentation at this level is a contribution, not just a record.

Before you start

Complete When NOT to Deploy a Team first; this lesson builds on the deployment decision process so your report captures what the four-question test revealed, not just what the agents did.

The idea

Here is the before and after: You hit a coordination failure last week that cost $60 and produced inconsistent output. You have a hypothesis about why it happened. You tried two mitigations, one of which partially worked. You have not seen anyone else write about this exact failure mode. That observation, written up in 15 minutes, is worth more to the community than another tutorial on how to prompt an agent.

Level 10 is where the map is still being drawn. The failure modes are barely documented. The tooling is experimental. Most practitioners who have hit interesting failure modes have not written them up, not because the knowledge does not exist, but because they are busy building.

Documenting what you learn at this level is not optional. It is the contribution.

A useful frontier write-up has five parts. Context: what you were trying to do and what agents were involved. Observation: what happened that you did not expect, with specific evidence (numbers, log lines, diffs). Hypothesis: your best explanation. Mitigation: what you changed and whether it worked, or 'not yet mitigated' if it is still open. Open questions: what you still do not understand.

The open questions section is as important as the resolution. Shared uncertainty moves the field forward faster than shared answers.

You do not need to have solved the problem to write it up. A clear account of an unresolved failure with your best hypothesis and what you tried is genuinely useful to the small number of people at this level who are likely to hit the same wall.

Try it (15 min)

Watch out for

  • Waiting until you have a complete solution before writing. The most useful frontier documents include unresolved problems. Write up what you know now.
  • Writing a tutorial instead of a failure report. Tutorials describe ideal paths. Frontier documentation describes what actually happened, including what did not work.
  • Omitting the numbers. A failure report without cost figures, iteration counts, or time elapsed is harder to use. Be specific about what you measured.
  • Treating documentation as a solo activity. Share what you write. Post it, open a GitHub issue, or send it to someone else working at this level. Documentation no one reads is not a contribution.
  • Skipping the open questions section because it feels like admitting failure. Open questions are the signal that you are on the frontier, not a weakness in the report.

Paste this into Claude:

I want to write up one thing I have observed in my autonomous agent work. The observation is: [describe one thing that worked, broke, or surprised you in a multi-agent run, e.g. 'two agents both modified the same config file and the merge produced invalid JSON', or 'a feedback loop between my generator and reviewer agents doubled my token cost before hitting the cap', or 'I decided not to deploy a team for X because of Y']. Help me write a structured report with five sections: (1) Context: what I was trying to do and what agents were involved. (2) Observation: what happened that I did not expect, with specific evidence (log lines, cost numbers, output diffs if available). (3) Hypothesis: my best explanation of why it happened. (4) Mitigation: what I changed and whether it worked, or 'not yet mitigated' if still open. (5) Open questions: what I still do not understand about this failure or pattern. Write the report in plain language, under 400 words, as if sharing with another practitioner at this level.

What good looks like:

  • All five sections are present and specific, not generic
  • The observation section includes at least one concrete piece of evidence: a number, a log excerpt, a diff, or a specific error
  • The hypothesis is your actual best explanation, not a disclaimer that 'many things could have caused this'
  • The open questions section contains at least one question you genuinely cannot answer yet
  • The report is under 400 words and readable by another practitioner without additional context

What a good response looks like:

Frontier write-up: feedback loop between generator and reviewer agents

Context: 3-agent pipeline. Generator creates documentation stubs. Reviewer checks completeness and flags gaps. Summarizer produces a final output. Run on 40 API endpoints.

Observation: reviewer called generator.addContent() 11 times in a single run before the circuit breaker fired. Total cost: $58. Expected cost: ~$8. The final output had 200 stubs, none marked complete. The summarizer never ran.

Hypothesis: reviewer's 'completeness' threshold was defined as 'all required sections filled.' Generator was adding content that satisfied individual section checks but the reviewer was rechecking the full document on each call, not just the new sections. Each pass found something else missing because the threshold was applied globally, not incrementally. This created a ratchet: the more content was added, the more checks were triggered.

Mitigation: changed reviewer to track which sections it has already marked complete and skip those on subsequent passes. This reduced calls from 11 to 2 on a re-run of the same 40 endpoints. Cost: $9.40.

Open questions: (1) Is there a principled way to define 'completeness' for generative content that does not require full-document re-evaluation? (2) The 2-call result feels better but I do not know if it is because the threshold is now correct or because the reviewer is missing real gaps. I do not have ground truth for what 'complete' should look like on these endpoints. (3) Would a different reviewer model (Haiku instead of Sonnet) have broken the loop earlier, or just produced lower-quality completeness checks?

When this breaks

  • Breaks when practitioners wait for a complete solution before writing because the most useful frontier reports document unresolved problems, and the people most likely to hit the same wall need the partial map now, not the polished one later.
  • Breaks when reports are written as tutorials instead of failure accounts because tutorials describe ideal paths the author never actually walked, while frontier readers need the false starts and the dead ends to recognize the same wall when they reach it.
  • Breaks when documentation stays private because a report no one reads is not a contribution, and at Level 10 the field moves at the pace of shared evidence, not solo notebooks.

Claude can do it for you

Say to Claude: 'Help me write a structured failure report for this observation: [describe it]. Use five sections: context, observation (with specific evidence), hypothesis, mitigation, and open questions. Keep it under 400 words, plain language, no jargon. I want something I can share with other practitioners.'

You can now

Produce a five-section report (context, observation with concrete evidence, hypothesis, mitigation status, open questions) under 400 words about one observed multi-agent failure or surprising success, written so another L10 practitioner can use it without additional context.

Key takeaways

The frontier gets mapped by the people working on it. Write up what you find, including what you have not solved. That is the contribution Level 10 needs most right now.

  • Level 10 documentation is a contribution, not a record. The map is still being drawn and your observations expand it
  • Use the five-section structure: context, observation with evidence, hypothesis, mitigation status, open questions
  • Open questions matter as much as resolutions. Shared uncertainty moves the field faster than polished answers
  • Numbers belong in the observation: cost, iteration counts, time elapsed. A frontier report without measurements is harder to use
  • Share what you write. A report no one reads is not a contribution

Go deeper

  • Claude Code Sub-Agents (official documentation, start here)
  • Ahead of AI Newsletter (AI research signal)
  • Interconnects Newsletter (AI research signal)