A note from building Zairoo's question generator: prompt tuning helped at first, but quality only improved once generation, verification, rubric scoring, and benchmarking became separate parts of the system.
Zairoo's question generator used to feel like a black box: click once, wait, then receive a final result. I changed the engine flow into events so the UI could show generation, verification, rubric checks, and failures as they happen.
Zairoo Beta looked like a simple free-access launch, but I did not want to implement it by rewriting subscription rows. Temporary product policy should be a runtime plan, not corrupted history.
This was not a clever bug. It was ordinary schema drift: the code believed a column existed, but production did not. Since then I check SQL, journal entries, snapshots, and the live database together.