Structured Output
Structured output is a feature that forces an LLM to return responses conforming to a specified schema — typically a JSON schema. Instead of hoping the model produces parseable JSON, the inference engine constrains token sampling so the output is guaranteed to validate.
Why It Matters
LLMs returning free-form text are hard to consume programmatically. Even when prompted to "return JSON," models occasionally add prose, miss fields, or hallucinate types. This breaks downstream code and forces defensive parsing. Structured output solves the problem at the decoding layer: the output is schema-valid every time, not just most of the time (the values can still be wrong, but the shape cannot). OpenAI, Anthropic, Google, and open-source engines like vLLM and Outlines now support it natively, making it the standard way to build reliable LLM pipelines.
How It Works
Constrained decoding: At each generation step, the model can only sample tokens that keep the output compatible with the schema. Tokens that would violate the schema are masked to probability zero.
Schema specification: You provide a JSON schema (or Pydantic model, Zod schema, TypeScript type) describing required fields, types, and enums.
Validation-free parsing: The caller can JSON.parse the result without try/catch around malformed output.
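The masking step above can be sketched with a toy vocabulary and a single-token enum constraint. This is illustrative only: real engines such as Outlines compile the schema into a token-level state machine rather than a hard-coded allow-set.

```python
# Toy setup: the schema says `sentiment` must be one of three enum values,
# and for simplicity each candidate value is a single token in the vocabulary.
VOCAB = ["positive", "negative", "neutral", "amazing", "terrible"]
ALLOWED = {"positive", "negative", "neutral"}  # the schema's enum

def constrained_sample(logits):
    """Greedy-pick the best token after masking schema violations to -inf."""
    masked = [
        logit if token in ALLOWED else float("-inf")
        for token, logit in zip(VOCAB, logits)
    ]
    best = max(range(len(VOCAB)), key=lambda i: masked[i])
    return VOCAB[best]

# The unconstrained model "prefers" the off-schema token "amazing" (logit 3.0),
# but masking forces the best schema-compatible choice instead.
print(constrained_sample([1.0, 0.5, 0.2, 3.0, 2.0]))  # prints: positive
```

Because disallowed tokens get probability zero at every step, no amount of sampling temperature can produce an off-schema output.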
JSON Mode vs Structured Output
| Aspect | JSON Mode | Structured Output |
|---|---|---|
| Guarantee | Valid JSON syntax | Valid JSON matching your schema |
| Schema enforcement | None | Full |
| Field presence | Not guaranteed | Guaranteed |
| Hallucinated fields | Possible | Impossible |
| Latency overhead | ~0 | Small (constraint compilation) |
JSON mode only ensures the output parses. Structured output ensures it parses and matches the exact shape you need. For production systems, always use structured output when available.
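A minimal sketch of the gap the table describes, using a hypothetical JSON-mode response and a hand-rolled field check. Structured output performs this enforcement during decoding, not after the fact:

```python
import json

# Hypothetical JSON-mode output: it parses fine, but the field names
# don't match what the caller expects.
json_mode_output = '{"headline": "Launch recap", "mood": "positive"}'

# What the downstream code actually requires (a tiny subset of JSON Schema).
REQUIRED_FIELDS = {"title": str, "tags": list, "sentiment": str}

def matches_schema(raw: str) -> bool:
    """Parse, then check required fields and types by hand."""
    data = json.loads(raw)  # JSON mode guarantees only that this succeeds
    return all(
        isinstance(data.get(name), typ) for name, typ in REQUIRED_FIELDS.items()
    )

print(matches_schema(json_mode_output))  # False
print(matches_schema(
    '{"title": "Launch recap", "tags": ["product"], "sentiment": "positive"}'
))  # True
```

With structured output, the `False` case cannot occur, so the check itself becomes unnecessary.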
When to Use It
Extracting data from text: Pulling names, dates, addresses from unstructured input.
Building agents that call tools: The tool call arguments must match the tool's parameter schema exactly.
Classifying into enums: Force the model to pick one of a fixed set of labels.
Generating multi-field responses: Titles, summaries, tags, scores in one pass.
Anywhere you currently regex-parse model output: That's a bug waiting to happen.
Trade-offs
Slight latency overhead: The decoder has to track the grammar state. Usually negligible.
Reduced creativity: Heavy schema constraints can make generation feel mechanical. For creative writing, prefer free-form.
Schema design matters: Overly strict schemas (required: all 20 fields) force the model to hallucinate values. Make optional what's genuinely optional.
Not all models support it: Older models and some open-source models still lack native support. Outlines and similar libraries can retrofit it.
Example
Schema:
```json
{
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "tags": { "type": "array", "items": { "type": "string" } },
    "sentiment": { "enum": ["positive", "negative", "neutral"] }
  },
  "required": ["title", "tags", "sentiment"]
}
```
Guaranteed output:
```json
{ "title": "Launch recap", "tags": ["product", "Q2"], "sentiment": "positive" }
```
No parse errors. No missing fields. No invented enum values.
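Consuming such a guaranteed response needs no defensive code. A sketch with Python's standard json module:

```python
import json

# With structured output, the response is guaranteed to match the schema,
# so bare json.loads plus direct key access is safe: no try/except around
# parsing, no .get() fallbacks, no regex.
response = (
    '{"title": "Launch recap", "tags": ["product", "Q2"], '
    '"sentiment": "positive"}'
)

doc = json.loads(response)
print(doc["title"])            # prints: Launch recap
print(", ".join(doc["tags"]))  # prints: product, Q2
```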