The first version of the generation API was straightforward: the frontend sent a request, the backend ran the engine, and the API returned one final JSON response.
That is fine for a demo. It is not great for a product.
Question generation can take tens of seconds. If the page only shows a spinner, the user cannot tell whether the system is still working. During development, it is also hard to see where a failed run actually broke.
The Zairoo generation flow has several steps:
- generate candidate questions
- parse structured output
- apply deterministic cleanup
- run verification
- run rubric scoring
- render diagrams when needed
When all of that is compressed into one response, the UI sees a black box. The logs are not much better.
I changed the engine so that it emits a stream of events instead of one final response. Not because I wanted a fancier loading state, but because the process needed names.
The events are simple:
- run_started
- stage_started
- text_delta
- partial_output
- stage_completed
- warning
- run_failed
- run_completed
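To make the event list concrete, here is a minimal sketch of what one event could look like on the wire. The event names come from the list above. Everything else is an assumption for illustration: the `EngineEvent` class, its `stage` and `data` fields, and the choice of Server-Sent Events as the transport are not taken from the actual Zairoo code.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Optional

# Event names as listed in the post; the rest of this module is illustrative.
EVENT_TYPES = {
    "run_started", "stage_started", "text_delta", "partial_output",
    "stage_completed", "warning", "run_failed", "run_completed",
}


@dataclass
class EngineEvent:
    """Hypothetical event shape: a type, an optional stage, and a free-form payload."""
    type: str
    stage: Optional[str] = None
    data: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject typos early so the frontend never sees an unnamed event.
        if self.type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.type}")


def to_sse(event: EngineEvent) -> str:
    """Format one event as a Server-Sent Events frame (assumed transport)."""
    payload = json.dumps(asdict(event), separators=(",", ":"))
    return f"event: {event.type}\ndata: {payload}\n\n"
```

Validating the type at construction keeps the vocabulary closed: a new kind of event has to be added to the list on purpose.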
Each event can carry the current stage, such as generate, verify, rubric, or image_render. The frontend no longer has to guess. It can update the interface directly from the engine’s progress.
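The idea can be sketched as a generator that wraps each stage and yields events as it goes, plus a small function that turns an event into a status line. This is a simplified illustration, not the real engine: `run_engine`, `status_line`, the dict-based event shape, and the `error` and `output` keys are all hypothetical, and the stage callables are stand-ins for the real work.

```python
from typing import Callable, Iterator

# A stage is a name plus a zero-argument callable (hypothetical stand-in for real work).
Stage = tuple[str, Callable[[], str]]


def run_engine(stages: list[Stage]) -> Iterator[dict]:
    """Run stages in order, yielding an event before and after each one."""
    yield {"type": "run_started"}
    for name, work in stages:
        yield {"type": "stage_started", "stage": name}
        try:
            output = work()
        except Exception as exc:
            # Name the stage that broke, then stop the run.
            yield {"type": "run_failed", "stage": name, "error": str(exc)}
            return
        yield {"type": "stage_completed", "stage": name, "output": output}
    yield {"type": "run_completed"}


def status_line(event: dict) -> str:
    """What a UI might display for an event; unknown types map to an empty string."""
    stage = event.get("stage")
    labels = {
        "run_started": "Starting...",
        "stage_started": f"Running {stage}...",
        "stage_completed": f"Finished {stage}",
        "run_failed": f"Failed during {stage}: {event.get('error')}",
        "run_completed": "Done",
    }
    return labels.get(event["type"], "")


if __name__ == "__main__":
    demo = [("generate", lambda: "3 candidates"), ("verify", lambda: "ok")]
    for ev in run_engine(demo):
        print(status_line(ev))
```

Because the engine is a generator, the caller decides how to deliver events: write them to a streaming HTTP response, push them over a socket, or just log them.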
The main benefit is that the run becomes explainable.
Users can see the system moving. Developers can see whether the model returned bad structure, verification failed, the rubric rejected some questions, or image rendering broke.
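For the developer side, a recorded event stream can be folded into a short summary that answers "where did this run break?". The sketch below assumes events are plain dicts with `type`, `stage`, `message`, and `error` keys; those key names and the `RunSummary` class are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class RunSummary:
    """Hypothetical digest of one run, built only from its events."""
    completed_stages: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    failed_stage: Optional[str] = None
    error: Optional[str] = None
    finished: bool = False


def summarize(events: Iterable[dict]) -> RunSummary:
    """Fold an event stream into a summary; events of other types are ignored."""
    summary = RunSummary()
    for ev in events:
        kind = ev.get("type")
        if kind == "stage_completed":
            summary.completed_stages.append(ev["stage"])
        elif kind == "warning":
            summary.warnings.append(f"{ev.get('stage')}: {ev.get('message', '')}")
        elif kind == "run_failed":
            summary.failed_stage = ev.get("stage")
            summary.error = ev.get("error")
        elif kind == "run_completed":
            summary.finished = True
    return summary
```

With a summary like this, a failed run reads as "generate and verify completed, rubric warned about two questions, image_render failed" rather than a single opaque error.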
I increasingly think agent products should not only design the final answer. The middle part matters just as much: how the system moves, how it fails, and how it gives people confidence that it is still doing real work.