Generating agentic structured output

One of the biggest pain points for traditional web developers like us is that we still rely heavily on structured output, such as JSON, to integrate with our existing systems. Whether you use SQL or NoSQL, you still need some form of controllable output to make sure your application integrates seamlessly without breaking.

Blocker

Controlling generative output is like trying to tame a wild beast. For example, the model may mix JSON with paragraphs of prose in the same response. On top of that, it may also hallucinate the field names and types you need.
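To make the first problem concrete, here is a small standard-library sketch. The response text and the helper name `extract_first_json_object` are hypothetical. `json.loads` rejects a reply that wraps JSON in chatty prose, while scanning with `json.JSONDecoder.raw_decode` can still pull the object out:

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Return the first JSON object embedded in free-form model output.

    Tries to decode a JSON value at every '{', ignoring any prose
    before or after it.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; keep scanning
        if isinstance(value, dict):
            return value
    raise ValueError("no JSON object found in model output")


# A typical "wild" response: prose wrapped around the JSON we actually want.
raw_response = (
    "Sure! Here is the summary you asked for:\n"
    '{"title": "Weekly offers", "user_id": 42}\n'
    "Let me know if you need anything else."
)

try:
    json.loads(raw_response)
except json.JSONDecodeError as exc:
    print(f"json.loads failed: {exc.msg}")

print(extract_first_json_object(raw_response))
```

This only rescues the syntax. It does nothing about hallucinated field names or wrong types, which is where the real pain starts.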

I was trying to generate a complex output with a nested object structure, based on a summary of the context and the model’s thinking response. Here is my example:


    {
        "company_id": "{{ company_id (as int) }}",
        "user_id": "{{ user_id (as int) }}",
        "title": "{{ Give a nice title to represent the content (as str) }}",
        "details": "{{ Summary of the consolidated items (as str) }}",
        "items": [
            {
                "product_id": "{{ product_id (as int) }}",
                "company_id": "{{ company_id (as int) }}",
                "title": "{{ Title of item (as str) }}",
                "summary": "{{ Summary of this offer (as str) }}",
                "offer_details": {
                    "pricing": [{{ Any pricing details (as str) }}],
                    "title": "{{ offer title (as str) }}",
                    "total": "{{ calculated total price (as float) }}"
                }
            }
        ]
    }

It also depends on the model you’re using. If you’re the type who switches LLMs from time to time to take advantage of cheaper or newer models, you need to test rigorously to make sure there are no regressions.
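One way to make that regression testing routine is a tiny harness that runs the same prompts through each model and measures how often the output has the expected shape. In this sketch the models are hypothetical stand-in functions (swap in your real LLM calls), and only the top-level fields of the template above are checked:

```python
import json
from typing import Callable

# Hypothetical expected types for the top-level fields of the template above.
EXPECTED_TYPES = {"company_id": int, "user_id": int, "title": str, "details": str, "items": list}


def is_valid(raw: str) -> bool:
    """True if raw parses as a JSON object with every expected field and type."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    return all(
        key in payload and isinstance(payload[key], expected)
        for key, expected in EXPECTED_TYPES.items()
    )


def pass_rate(model: Callable[[str], str], prompts: list[str]) -> float:
    """Fraction of prompts for which the model returned valid output."""
    results = [is_valid(model(prompt)) for prompt in prompts]
    return sum(results) / len(results)


# Stand-ins for real LLM calls: an "old" model that behaves, and a "new"
# model that returns the ids as strings (a silent regression).
def old_model(prompt: str) -> str:
    return json.dumps({"company_id": 1, "user_id": 2, "title": "t", "details": "d", "items": []})


def new_model(prompt: str) -> str:
    return json.dumps({"company_id": "1", "user_id": "2", "title": "t", "details": "d", "items": []})


prompts = ["summarise offer A", "summarise offer B"]
for name, model in [("old", old_model), ("new", new_model)]:
    print(f"{name}: {pass_rate(model, prompts):.0%} valid")
```

Since the template above wraps even the integer placeholders in quotes, a model that returns "1" instead of 1 is a plausible failure, and a check like this catches it before you switch.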

Options for structured output

One-shot or few-shot

A common prompt engineering strategy is to include one or more examples of the output you want. In theory this seems like a good approach, until you start building a real application with it. Even with few-shot examples, the model still tends to hallucinate, especially if you’re using a cheaper model (PS: I’m using nova-lite to save cost). If you only need simple JSON output, this approach may be enough without over-complicating your solution.
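A minimal sketch of a few-shot prompt builder, assuming a plain-text prompt rather than any particular provider’s message format. The example data and the function name are hypothetical:

```python
import json

# Hypothetical worked examples: each pairs an input with the exact JSON we want back.
FEW_SHOT_EXAMPLES = [
    (
        "Acme (company 7) offers user 3 a 10% discount on product 12.",
        {"company_id": 7, "user_id": 3, "title": "Acme discount", "details": "10% off product 12", "items": []},
    ),
]


def build_few_shot_prompt(context: str) -> str:
    """Assemble an instruction, the worked examples, and the real input into one prompt."""
    parts = [
        "Return ONLY a JSON object with exactly the keys shown in the examples.",
        "Do not add prose or extra keys.",
    ]
    for example_input, example_output in FEW_SHOT_EXAMPLES:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {json.dumps(example_output)}")
    # The real input goes last, leaving "Output:" open for the model to complete.
    parts.append(f"Input: {context}")
    parts.append("Output:")
    return "\n\n".join(parts)


print(build_few_shot_prompt("Globex (company 9) bundles products 4 and 5 for user 11."))
```

The examples show the model the shape you want, but nothing enforces it; the model is still free to answer with `line_items`.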

If you’re using n8n’s structured output parser, this is what I suspect is happening under the hood.

In my case, I often got fields I didn’t ask for, such as items being output as line_items.
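If you stay with prompting, it helps to detect this kind of key drift explicitly instead of letting it fail somewhere downstream. This sketch compares a response against the keys in the template above, recursing into nested objects and lists. The `ALLOWED` encoding and the `key_drift` helper are hypothetical:

```python
# Allowed keys per level, mirroring the template above.
# None marks a leaf; a dict describes a nested object (or each object in a list).
ALLOWED = {
    "company_id": None, "user_id": None, "title": None, "details": None,
    "items": {
        "product_id": None, "company_id": None, "title": None, "summary": None,
        "offer_details": {"pricing": None, "title": None, "total": None},
    },
}


def key_drift(payload: dict, allowed: dict, path: str = "$") -> list[str]:
    """Report missing and unexpected keys, recursing into nested objects and lists."""
    problems = [f"{path}.{key}: missing" for key in sorted(allowed.keys() - payload.keys())]
    problems += [f"{path}.{key}: unexpected" for key in sorted(payload.keys() - allowed.keys())]
    for key, child in allowed.items():
        if child is None or key not in payload:
            continue
        value = payload[key]
        # A nested spec applies to a single object or to every object in a list.
        values = value if isinstance(value, list) else [value]
        for index, item in enumerate(values):
            if isinstance(item, dict):
                suffix = f"[{index}]" if isinstance(value, list) else ""
                problems.extend(key_drift(item, child, f"{path}.{key}{suffix}"))
    return problems


response = {
    "company_id": 7, "user_id": 3, "title": "Offers", "details": "...",
    "line_items": [{"product_id": 12}],
}
for problem in key_drift(response, ALLOWED):
    print(problem)
```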

Agentic feedback

This approach uses another agent to validate your output and provide a feedback loop, a tactic loosely described as “reinforcement learning”. If your goal is just to build a simple AI parser, this seems like over-engineering to me. The feedback loop works well until it gets stuck in an infinite loop when it fails to get the information it needs.
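The infinite-loop problem is usually handled by capping the number of attempts. Here is a minimal sketch of a generate, validate, retry loop that feeds the validation errors back into the prompt and gives up after `max_attempts`. `flaky_model` is a hypothetical stand-in for an LLM call, and `require_user_id` stands in for whatever validator or reviewing agent you use:

```python
import json
from typing import Callable


def generate_with_feedback(
    generate: Callable[[str], str],
    validate: Callable[[dict], list[str]],
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Generate, validate, and retry with error feedback, at most max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    feedback = ""
    for _attempt in range(max_attempts):
        raw = generate(prompt + feedback)
        try:
            payload = json.loads(raw)
            errors = validate(payload) if isinstance(payload, dict) else ["output is not a JSON object"]
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        if not errors:
            return payload
        # Feed the concrete problems back so the next attempt can correct them.
        feedback = "\n\nYour previous answer was rejected:\n- " + "\n- ".join(errors)
    # Hard stop instead of looping forever.
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {errors}")


def require_user_id(payload: dict) -> list[str]:
    return [] if isinstance(payload.get("user_id"), int) else ["user_id must be an integer"]


def flaky_model(prompt: str) -> str:
    # Stand-in for an LLM: gets the type wrong until it sees feedback.
    return '{"user_id": 3}' if "rejected" in prompt else '{"user_id": "3"}'


print(generate_with_feedback(flaky_model, require_user_id, "Extract the user."))
```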

Tuning temperature and top-p

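Temperature and top-p are two important sampling parameters that control the randomness and diversity of the text an LLM generates. Temperature rescales the probability distribution over the next token (lower values make it sharper), while top-p, or nucleus sampling, restricts sampling to the smallest set of most-likely tokens whose combined probability reaches p. Combining the two lets you fine-tune the balance between creativity and coherence. For example: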

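Low temperature, low top-p: Produces the most focused and predictable text, which is usually what you want for strict structured output.
Low temperature, high top-p: Produces coherent text with a bit of creative flair.
High temperature, low top-p: Generates more surprising and unusual text.
High temperature, high top-p: Ideal for creative writing and storytelling.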
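To see how these two knobs interact, here is a simplified sketch of how sampling is commonly implemented: divide the logits by the temperature, apply softmax, then keep only the nucleus of most-likely tokens whose probability mass reaches top-p. Real inference engines differ in the details, and the candidate tokens and logit values below are made up:

```python
import math

# Made-up logits for the next key the model might emit.
LOGITS = {"items": 3.0, "line_items": 2.0, "products": 1.0}


def sampling_distribution(logits: dict[str, float], temperature: float, top_p: float) -> dict[str, float]:
    """Apply temperature scaling, then nucleus (top-p) filtering, and renormalise."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_logit = max(scaled.values())  # subtract the max for numerical stability
    exp = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(exp.values())
    probs = sorted(((tok, e / total) for tok, e in exp.items()), key=lambda kv: kv[1], reverse=True)

    # Keep the smallest prefix of most-likely tokens whose mass reaches top_p.
    nucleus, cumulative = [], 0.0
    for tok, p in probs:
        nucleus.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    mass = sum(p for _, p in nucleus)
    return {tok: p / mass for tok, p in nucleus}


for temperature, top_p in [(0.2, 0.9), (1.0, 0.9), (1.5, 1.0)]:
    dist = sampling_distribution(LOGITS, temperature, top_p)
    print(temperature, top_p, {tok: round(p, 3) for tok, p in dist.items()})
```

With a low temperature the distribution collapses onto `items`, while looser settings leave room for `line_items`, which is exactly the kind of drift that hurts structured output.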

When you’re trying to get the LLM to be creative in its summaries while still producing structured output, a lot of parameter balancing is required.

Defining schema

And finally, the breakthrough that opened the door: simple, yet real. While switching from the n8n prototype to a more robust, production-ready agent, I opted for the Strands SDK, and this led me to uncover more about structured output.

Under the hood, you define your schema using Pydantic models. This lets you build complex nested models without sacrificing creative generation. You can even provide examples for each specific field, which effectively combines all of the approaches I mentioned above.
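As an illustration, here is how the template from earlier might translate into Pydantic v2 models, with descriptions and examples attached to individual fields. The class names are hypothetical, and this only shows the schema side; the call that hands the model to the Strands SDK is not shown here:

```python
from pydantic import BaseModel, Field


class OfferDetails(BaseModel):
    pricing: list[str] = Field(description="Any pricing details", examples=[["$10 per month"]])
    title: str = Field(description="Offer title")
    total: float = Field(description="Calculated total price", examples=[120.0])


class Item(BaseModel):
    product_id: int
    company_id: int
    title: str = Field(description="Title of item")
    summary: str = Field(description="Summary of this offer")
    offer_details: OfferDetails  # nested model, validated recursively


class ConsolidatedOffer(BaseModel):
    company_id: int
    user_id: int
    title: str = Field(description="A nice title that represents the content")
    details: str = Field(description="Summary of the consolidated items")
    items: list[Item]


# The JSON schema (descriptions and examples included) is what a model can be shown.
schema = ConsolidatedOffer.model_json_schema()
print(sorted(schema["properties"]))
```

One caveat: Pydantic’s default lax mode will coerce a numeric string such as "7" into an int. If you would rather reject that, Pydantic also supports a strict mode.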

Because the output is strictly validated against the model, the validation errors give the agent concrete feedback to course-correct its generated output, which makes the output consistently reliable.
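A sketch of what that feedback can look like, using Pydantic v2’s `ValidationError.errors()` to turn a failed validation into a message the model could act on. The `OfferDetails` model and the `validation_feedback` helper are hypothetical, and this is not meant to describe how Strands formats its own retries:

```python
from pydantic import BaseModel, ValidationError


class OfferDetails(BaseModel):
    title: str
    total: float


def validation_feedback(raw_json: str) -> str:
    """Validate raw model output and turn any errors into feedback text for a retry."""
    try:
        OfferDetails.model_validate_json(raw_json)
    except ValidationError as exc:
        # Each error carries the field location and a human-readable message.
        lines = [
            f"- {'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
            for err in exc.errors()
        ]
        return "Fix these problems and answer again:\n" + "\n".join(lines)
    return "OK"


print(validation_feedback('{"title": "Bundle", "total": "about 120"}'))
print(validation_feedback('{"title": "Bundle", "total": 120}'))
```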

Conclusion

Building AI solutions is still a relatively new domain without clearly defined conventions. While I rely on schema definitions today, that doesn’t guarantee they will remain the best approach in the future. As much as I want to make it foolproof, by today’s standards the traditional engineering approach still works best.

n8n is a simple and easy way to build a prototype, but once the requirements grow, the reliable approach is always to go back to proper engineering work to get it right.

choong pw

eat to survive, code to dream