Structured Output via Zod

Structured output via Zod defines a schema for LLM responses in TypeScript, validates each response against it, and regenerates on mismatch, so downstream code receives only data that passed validation.

Structured Output via Zod is the practice of defining a TypeScript schema with Zod, prompting an LLM to generate JSON adhering to that schema, and then programmatically validating the output, retrying the generation if it fails validation.

What it is

At its core, this approach uses Zod, a TypeScript-first schema declaration and validation library, to define a strict contract for the output expected from a Large Language Model. Instead of relying on the LLM's inherent ability to produce well-formed JSON, we explicitly define the shape of the data, including types, required fields, and validation rules such as string formats or numeric ranges. The LLM is then instructed, often through system prompts or few-shot examples, to generate JSON that conforms to this schema. After generation, the output is parsed and passed through the Zod validator. If validation fails, the system automatically triggers a regeneration, potentially with an adjusted prompt that includes the specific validation error. This retry loop continues until a valid output is produced or a predefined retry limit is reached.
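The loop described above can be sketched roughly as follows. This is a minimal illustration, not the production implementation: `AnalysisSchema`, its fields, the `Generate` type, and `generateStructured` are all hypothetical names, and `generate` stands in for whatever LLM client is actually used.

```typescript
import { z } from "zod";

// Hypothetical schema for illustration; field names are assumptions.
const AnalysisSchema = z.object({
  summary: z.string().min(1),
  severity: z.enum(["low", "medium", "high"]),
  score: z.number().int().min(0).max(100),
});

// Stand-in for any LLM client: takes a prompt, resolves to raw text.
type Generate = (prompt: string) => Promise<string>;

async function generateStructured<T>(
  schema: z.ZodType<T>,
  basePrompt: string,
  generate: Generate,
  maxAttempts = 3,
): Promise<T> {
  let prompt = basePrompt;
  let lastError = "no attempts made";

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate(prompt);

    // Step 1: the reply must be parseable JSON at all.
    let candidate: unknown;
    let isJson = true;
    try {
      candidate = JSON.parse(raw);
    } catch {
      isJson = false;
      lastError = "response was not valid JSON";
    }

    // Step 2: the parsed value must satisfy the Zod schema.
    if (isJson) {
      const result = schema.safeParse(candidate);
      if (result.success) return result.data;
      lastError = result.error.issues
        .map((issue) => `${issue.path.join(".")}: ${issue.message}`)
        .join("; ");
    }

    // Step 3: retry with the validation error appended to the base prompt.
    prompt =
      `${basePrompt}\n\nYour previous reply failed validation: ${lastError}\n` +
      "Return only corrected JSON.";
  }

  throw new Error(`No valid output after ${maxAttempts} attempts: ${lastError}`);
}
```

Using `safeParse` rather than `parse` keeps validation failures on the normal control path, so only exhausting the retry limit raises an error to the caller.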

Why it matters

For any system relying on LLM outputs as structured data inputs for subsequent processes, reliability is paramount. Unstructured or malformed LLM responses can break downstream parsers, lead to runtime errors, or introduce subtle data inconsistencies that are difficult to debug. By enforcing a strict schema, we transform the LLM from a probabilistic text generator into a more predictable data factory. This is particularly critical for building robust Agent Autonomy Tiers, where an agent's output might directly trigger API calls, database writes, or further complex computations. Without structured output, the reliability of such autonomous operations would be severely compromised, leading to frequent manual interventions or system failures. It allows us to treat LLM outputs as first-class data, consumable by TypeScript-based services running on platforms like Vercel or Firebase functions, without extensive custom parsing logic.

How TV applies it

Within the Total Ventures portfolio, structured output via Zod is a foundational pattern for almost all LLM-driven agents. For instance, the VERA Agent Daemon uses Zod schemas to define the expected JSON output for its nightly studio analysis. This ensures that VERA's insights (such as flagged issues, suggested optimizations, or data summaries) are consistently formatted and ready for ingestion by our internal dashboards and notification systems. When VERA needs to use a specific Tool Use Pattern, the inputs and outputs of that tool are often defined with Zod, so the agent interacts with external systems predictably. Similarly, our content generation pipelines for various portfolio companies use Zod to define the structure of articles, social media posts, or email drafts, including fields for title, body, tags, and calls-to-action. This ensures the generated content has the structure expected by platforms like Resend for email delivery or our CMS for publishing.
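As a rough illustration of the content case, an email draft schema with the fields mentioned above might look like the sketch below. The schema names, the length limits, and the `parseDraft` helper are assumptions made for this example and are not taken from the actual pipelines.

```typescript
import { z } from "zod";

// Hypothetical shape of a call-to-action; not the real pipeline schema.
const CallToActionSchema = z.object({
  label: z.string().min(1),
  url: z.string().url(),
});

// Hypothetical email draft schema covering title, body, tags, and CTAs.
const EmailDraftSchema = z.object({
  title: z.string().min(1).max(120),
  body: z.string().min(1),
  tags: z.array(z.string().min(1)).max(10),
  callsToAction: z.array(CallToActionSchema).min(1),
});

// The static type is derived from the schema, so the two cannot diverge.
type EmailDraft = z.infer<typeof EmailDraftSchema>;

// Returns a typed draft, or null if the raw LLM reply is not usable.
function parseDraft(raw: string): EmailDraft | null {
  let json: unknown;
  try {
    json = JSON.parse(raw);
  } catch {
    return null;
  }
  const result = EmailDraftSchema.safeParse(json);
  return result.success ? result.data : null;
}
```

A caller would only hand the draft to the delivery or publishing step when `parseDraft` returns a non-null value.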

Common failure modes

While highly effective, this approach isn't without its challenges. The most common failure mode is the LLM's inability to consistently adhere to complex or deeply nested schemas, especially with less capable models like older GPT-3.5 iterations or some open-source alternatives. This often results in excessive retries, increasing token costs and latency. Another issue is 'schema drift': the underlying data model changes, but the prompt (and with it the LLM's view of the expected shape) isn't updated, leading to persistent validation failures. Furthermore, overly restrictive schemas can sometimes stifle the LLM's creativity or ability to handle edge cases, forcing it into a box that doesn't quite fit the nuanced reality of the data. Finally, the retry mechanism itself, while essential, can mask underlying prompt engineering issues or hide the fact that the chosen LLM is simply not suited to the complexity of the task, leading to suboptimal performance and higher operational costs.
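One possible guard against schema drift is to build the prompt's field list from the schema itself, so the prompt and the validator share a single source of truth. The sketch below assumes a hypothetical `IssueSchema`; the `fieldHints` map and `buildSchemaInstructions` are illustrative names, not part of Zod.

```typescript
import { z } from "zod";

// Hypothetical schema; the field names are illustrative only.
const IssueSchema = z.object({
  title: z.string(),
  severity: z.enum(["low", "medium", "high"]),
  affectedPages: z.array(z.string()),
});
type Issue = z.infer<typeof IssueSchema>;

// One hint per schema field. Typing this as Record<keyof Issue, string>
// makes the compiler reject the object when a field is added to or
// removed from IssueSchema without updating the hints.
const fieldHints: Record<keyof Issue, string> = {
  title: "short human-readable title (string)",
  severity: 'one of "low", "medium", "high"',
  affectedPages: "array of page paths (strings)",
};

// Builds the schema excerpt that is appended to the prompt.
function buildSchemaInstructions(): string {
  const lines = Object.keys(IssueSchema.shape).map(
    (key) => `- ${key}: ${fieldHints[key as keyof Issue]}`,
  );
  return ["Reply with a single JSON object with exactly these fields:", ...lines].join("\n");
}
```

With this arrangement a data model change surfaces as a compile error rather than as a run of validation failures in production.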

FAQs

What if the LLM consistently fails to produce valid output?

Our systems implement a retry loop with refined prompts, sometimes including few-shot examples or schema excerpts. If failures persist, we escalate to a human for prompt tuning or fall back to a simpler model.

Does this add significant latency?

Yes. Each retry adds another generation round trip, so we monitor retry rates closely. High rates often signal a need to optimize the prompt, simplify the schema, or consider a more capable LLM for that specific task.
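Retry-rate monitoring can be as simple as counting attempts per task. The following is a minimal in-memory sketch; the `RetryMonitor` class, its method names, and the task labels are hypothetical, and a real deployment would more likely emit these counts to an existing metrics system.

```typescript
// Tracks how many extra generations each task needed beyond the first.
class RetryMonitor {
  private stats = new Map<string, { calls: number; retries: number }>();

  // attempts = total generations used for one request (1 means no retry).
  record(task: string, attempts: number): void {
    const entry = this.stats.get(task) ?? { calls: 0, retries: 0 };
    entry.calls += 1;
    entry.retries += Math.max(0, attempts - 1);
    this.stats.set(task, entry);
  }

  // Average number of retries per request for a task.
  retryRate(task: string): number {
    const entry = this.stats.get(task);
    return entry && entry.calls > 0 ? entry.retries / entry.calls : 0;
  }

  // Tasks whose retry rate exceeds the threshold are candidates for
  // prompt tuning, a simpler schema, or a more capable model.
  tasksAbove(threshold: number): string[] {
    return Array.from(this.stats.keys()).filter(
      (task) => this.retryRate(task) > threshold,
    );
  }
}
```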
