Over six months, OpenAI and Thrive Holdings co-developed Tax AI to automate tax return preparation for Crete's network of accounting firms. The system processes source documents and client notes to generate 1040 and 1041 tax returns, saving practitioners roughly one-third of preparation time and achieving up to 97% accuracy. In a pilot across 7,000 returns this tax season, measurable self-improvement was evident: within six weeks, the share of returns reaching 75% correct field completion jumped from 25% to 86%, with continued gains as the system tackled increasingly complex filings like K-1s and rental property schedules.
The breakthrough is architectural. Tax AI captures the full production trace—from source documents through extraction, mapping, and practitioner correction—then uses Codex to investigate and fix recurring failures autonomously. When practitioners correct a field repeatedly, the system groups those corrections into actionable findings, creates targeted evaluation sets, and routes them to Codex as bounded engineering tasks. Codex inspects the trace, extraction logic, and tax-engine mapper to propose fixes, validates them against targeted and regression evals, and surfaces candidate pull requests for review. This replaces slow, manual post-mortems with continuous, evidence-driven iteration.
For commerce practitioners, this pattern is portable to any domain where high-volume, document-heavy workflows intersect with expert judgment—accounts payable, invoice processing, claims adjudication, or compliance. The key is designing the product to emit structured signals from production and empowering practitioners to steer what the system learns, rather than waiting for engineers to discover failures. As agentic capabilities mature, this closed-loop approach—production traces + practitioner feedback + Codex iteration—offers a scalable path to faster, safer product improvement without proportional engineering overhead.