Skip to content
Agentic Levels
  • New to AI?
  • Assessment
  • Levels
  • Lessons
  • Tracks
  • Resources
  • Reference
  • What's New
  • What's Next
  • More
    Tool SetupCompareAboutThanksFAQPricingPreferences
  • New to AI?
  • Assessment
  • Levels
  • Lessons
  • Tracks
  • Resources
  • Reference
  • Tool Setup
  • Compare
  • What's New
  • About
  • Thanks
  • FAQ
  • What's Next
  • Pricing

© 2026 Fuentes Studio·Privacy·Terms

yourCouncil
Ready to help
✦

What do you want to understand?

Ask anything about what you're learning.

L5Lesson 2

Structured Output as Contract

After this, you'll be able to enforce structured output from any prompt that feeds a downstream system, using the right tool for the job: JSON mode, XML tags, or schema validation.

Before you start

Complete Build Your First RAG Pipeline first; this lesson builds on retrieval so the chunks your pipeline returns have a contract the next step can reliably parse.

The idea

Your AI just produced a great recommendation as flowing prose. Now your database write is choking on it. The downstream system expected a JSON object with a `category` field and a `priority` integer. It got three paragraphs. The output was correct. The shape was wrong.

The moment your AI output feeds any downstream process, free-form text becomes a liability. A database write, a UI render, a second model call, a webhook: all of them need predictable shapes. Without a contract, every consumer writes its own fragile parser. With one, the output is the API.

There are three tools and they are not interchangeable. JSON mode (available in OpenAI and most providers) enforces valid JSON syntax but does not enforce field names or types. Use it for simple key-value outputs when you control the consumer. XML tags are the right choice for multi-part responses where field order matters or where the response mixes structured data with free text. Use tags like `<summary>`, `<action>`, `<confidence>` to carve the response into labeled sections your code can parse with a simple regex or XML parser. Schema validation (OpenAI structured outputs, Anthropic tool use with JSON schemas) is the strict option: it enforces field names, types, and required fields at the API level, rejecting responses that do not conform.

Here is the before and after: a team built an agent that triaged customer support tickets. The agent output a JSON object with three fields: `category` (one of five enum values), `priority` (1-5 integer), and `summary` (string under 200 characters). That object wrote directly to their database. Before schema enforcement, the agent occasionally returned `'priority': 'high'` instead of `'priority': 3`, and the database write crashed silently. After adding a strict schema via Anthropic tool use, invalid outputs were caught at the API level before they ever reached the database. The schema validation step caught 3 type errors per 100 calls that would have crashed the write silently.

Try it (20 min)

Watch out for

  • Using JSON mode when you need field-type enforcement. JSON mode guarantees syntax, not schema. `{'priority': 'high'}` is valid JSON.
  • Putting the schema only in the system prompt and hoping the model remembers it. Use the provider's native schema enforcement (tool use, response_format) when strict types matter.
  • Mixing structured output fields with long free-text reasoning in the same JSON object. Put reasoning in a dedicated field or outside the JSON block.
  • Forgetting to validate the output before passing it downstream. Even strict schema enforcement can produce semantically wrong values within the allowed types.
  • Designing schemas with more than 8-10 fields. Complexity degrades compliance. If you need more, split into nested objects or multiple calls.

Paste this into Claude:

I have a prompt that currently produces free-form text but feeds a downstream process. Here is the prompt: [paste your prompt]. Here is what the downstream process needs: [describe the fields, types, and constraints: e.g., a JSON object with a 'category' field (one of: bug, feature, question), a 'priority' field (integer 1-5), and a 'summary' field (string under 150 chars)]. Help me: (1) choose the right structured output method for this use case (JSON mode, XML tags, or schema validation), (2) rewrite the prompt to enforce that structure, and (3) show me what a valid output looks like and what an invalid output looks like so I can test it.

What good looks like:

  • You chose a structured output method and can explain why it fits your use case better than the alternatives
  • Your rewritten prompt consistently produces output in the correct structure across 5 test runs
  • You confirmed that a valid output parses correctly into your downstream consumer (database, function, or next prompt)
  • You tested at least one edge case where the model might want to deviate from the schema and verified it did not
  • You understand the difference between JSON mode (syntax only) and schema validation (type + field enforcement)

What a good response looks like:

For your support ticket triage prompt, here is my analysis.

Recommended method: schema validation via Anthropic tool use. Your downstream process is a database write with strict column types. JSON mode would catch `{'priority': 'high'}` only at parse time (after the write fails). Schema validation catches it at the API level before your code ever sees the response.

Here is the schema definition:

```json
{
  "name": "triage_ticket",
  "input_schema": {
    "type": "object",
    "properties": {
      "category": { "type": "string", "enum": ["bug", "feature", "billing", "access", "other"] },
      "priority": { "type": "integer", "minimum": 1, "maximum": 5 },
      "summary": { "type": "string", "maxLength": 150 }
    },
    "required": ["category", "priority", "summary"]
  }
}
```

Valid output example:
```json
{ "category": "bug", "priority": 3, "summary": "User cannot log in after password reset. Token expires immediately on first use." }
```

Invalid output the schema would reject (and that JSON mode would pass):
```json
{ "category": "bug", "priority": "high", "summary": "Login issue" }
```

The `priority: 'high'` would fail the `integer` type check. The API returns a validation error instead of a malformed response. Your database write never runs.

Go deeper (15 min)

Paste this into Claude:

Pick one prompt from your current workflow that produces free-form text. It does not have to feed a database. It just needs to have predictable parts: maybe it always outputs a recommendation, a reason, and a confidence level, even if those are buried in prose right now. Add XML tags to that prompt to carve out the structure explicitly: for example, wrap the recommendation in `<recommendation>`, the reason in `<reason>`, and the confidence in `<confidence score='1-10'>`. Run it five times and confirm the tags are always present and parseable.

What good looks like:

  • You identified a prompt that has implicit structure buried in free text
  • You added XML tags and the model populates them consistently across 5 runs
  • You wrote a two-line parser (regex or XML lib) that extracts all tagged fields
  • The extracted fields are always present, never empty, and match the expected type

What a good response looks like:

Your vendor evaluation prompt already has implicit structure. Every response you get contains a recommendation, a list of reasons, and a confidence signal buried in phrases like 'I am fairly confident' or 'this is a strong fit.' Here is the XML-tagged version.

Original prompt ending: '...evaluate this vendor and give me your recommendation.'

Revised prompt ending: '...evaluate this vendor and structure your response as follows:
<recommendation>APPROVE or REJECT or DEFER</recommendation>
<reasons>
  <reason>One specific reason per tag</reason>
</reasons>
<confidence score="1-10">Integer only</confidence>'

Sample output after 5 runs (consistent):
```xml
<recommendation>APPROVE</recommendation>
<reasons>
  <reason>SOC 2 Type II certified, audit report available</reason>
  <reason>EU data residency supported, no cross-border transfer required</reason>
  <reason>SLA guarantees 99.9% uptime with financial penalty clause</reason>
</reasons>
<confidence score="8">8</confidence>
```

Two-line parser:
```python
import re
recommendation = re.search(r'<recommendation>(.*?)</recommendation>', output).group(1)
confidence = int(re.search(r'<confidence score="(\d+)">', output).group(1))
```

All 5 runs: tags present, confidence always an integer 1-10, recommendation always one of the three allowed values.

When this breaks

  • Breaks when the schema lives only in prompt instructions because the model drifts on rare inputs and there is no API-level rejection path to catch it before the downstream consumer.
  • Breaks when the schema mixes free-text reasoning and strict typed fields in one object because the reasoning section bloats unpredictably and pushes other fields toward truncation.
  • Breaks when schemas grow past 8-10 fields because field-compliance rate degrades non-linearly with object complexity, and the model starts dropping or fabricating fields silently.

Claude can do it for you

Paste your prompt to Claude and say: 'This prompt feeds a downstream process. Design a structured output schema for it, choose the right enforcement method, and rewrite the prompt to use it. Show me a valid output and an invalid output so I can write a test.' It will design the contract for you.

You can now

Produce the same structured shape across 5 consecutive runs of a prompt that previously returned free text, and parse all 5 outputs cleanly without manual cleanup.

Key takeaways

Structured output is a contract between the model and the consumer. JSON mode for syntax, XML tags for multi-part prose, schema validation for strict types. Choose deliberately.

  • Structured output is a contract. Once your output feeds a system, free text is a liability
  • JSON mode enforces syntax only. Schema validation enforces types and required fields. They are not interchangeable
  • XML tags are the right tool when responses mix structured data with free-text reasoning
  • Provider-native enforcement (tool use, response_format) beats schema-in-prompt every time for strict typing
  • Keep schemas under 10 fields. Compliance degrades with complexity. Split into nested objects or multi-call flows

Go deeper

  • OpenAI: Structured Outputs
  • Anthropic: Tool Use (Structured Output via Tools)
  • 12-Factor Agents: structured output as the contract