Enterprise AI agents automate specialized business workflows
Thursday, May 28, 2026

LLM, Crete, OpenAI, Thrive Holdings, Codex, Tax AI

Thrive Holdings and OpenAI deploy self-improving Codex tax agent

OpenAI and Thrive Holdings built Tax AI, a Codex-powered agent that automates tax return preparation for Crete's 30+ accounting firms and measurably improves itself by converting practitioner corrections into structured signals for autonomous iteration. Commerce teams operating complex, document-heavy workflows can adopt this three-pillar pattern—practitioner feedback, production traces, and Codex-driven loops—to reduce manual iteration cycles and ship product improvements faster than traditional engineering workflows allow.

Over six months, OpenAI and Thrive Holdings co-developed Tax AI to automate tax return preparation for Crete's network of accounting firms. The system processes source documents and client notes to generate 1040 and 1041 tax returns, saving practitioners roughly one-third of preparation time and achieving up to 97% accuracy. In a pilot across 7,000 returns this tax season, the system improved measurably: within six weeks, the share of returns reaching 75% correct field completion rose from 25% to 86%, with continued gains as the system took on more complex filings such as K-1s and rental property schedules.
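To make the headline metric concrete, here is a minimal sketch of how a "share of returns reaching 75% correct field completion" figure could be computed. The exact definition used in the pilot is not given, so this assumes each return is scored as correct fields divided by total fields; the function name and input shape are hypothetical.

```python
from typing import Iterable, Tuple


def share_reaching_threshold(
    returns: Iterable[Tuple[int, int]], threshold: float = 0.75
) -> float:
    """Fraction of returns whose field accuracy meets the threshold.

    Each item is (correct_fields, total_fields). This scoring rule is an
    assumption for illustration, not the pilot's published definition.
    """
    scored = list(returns)
    if not scored:
        return 0.0
    hits = sum(
        1
        for correct, total in scored
        if total > 0 and correct / total >= threshold
    )
    return hits / len(scored)


# Made-up example data: two of the four returns reach 75% of fields correct.
sample = [(30, 40), (10, 40), (39, 40), (20, 40)]
share = share_reaching_threshold(sample)
```

Tracking this share week over week is one way to see whether the loop is actually improving results, which is the kind of movement the 25% to 86% figure describes.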

The breakthrough is architectural. Tax AI captures the full production trace—from source documents through extraction, mapping, and practitioner correction—then uses Codex to investigate and fix recurring failures autonomously. When practitioners correct a field repeatedly, the system groups those corrections into actionable findings, creates targeted evaluation sets, and routes them to Codex as bounded engineering tasks. Codex inspects the trace, extraction logic, and tax-engine mapper to propose fixes, validates them against targeted and regression evals, and surfaces candidate pull requests for review. This replaces slow, manual post-mortems with continuous, evidence-driven iteration.
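The grouping step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Tax AI's implementation: the `Correction` and `Finding` types, the `group_corrections` function, and the `min_count` cutoff are all hypothetical names and choices.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Correction:
    """One practitioner edit to a generated field (hypothetical shape)."""
    return_id: str
    form: str        # e.g. "1040"
    field: str       # e.g. "line_1a"
    predicted: str
    corrected: str


@dataclass
class Finding:
    """A recurring failure: the same field corrected across several returns."""
    form: str
    field: str
    return_ids: List[str]


def group_corrections(
    corrections: Iterable[Correction], min_count: int = 3
) -> List[Finding]:
    """Group corrections by (form, field), keeping groups that recur.

    A group qualifies when it spans at least `min_count` distinct returns.
    The cutoff is an assumed heuristic for "corrected repeatedly".
    """
    buckets = defaultdict(set)
    for c in corrections:
        buckets[(c.form, c.field)].add(c.return_id)

    findings = [
        Finding(form, field, sorted(ids))
        for (form, field), ids in buckets.items()
        if len(ids) >= min_count
    ]
    # Most widespread failures first, so they are triaged first.
    findings.sort(key=lambda f: len(f.return_ids), reverse=True)
    return findings
```

In this sketch, the returns listed in each finding could then serve as a targeted evaluation set, with the finding itself handed off as a bounded task for investigation.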

For commerce practitioners, this pattern is portable to any domain where high-volume, document-heavy workflows intersect with expert judgment, such as accounts payable, invoice processing, claims adjudication, or compliance. The key is designing the product to emit structured signals from production and empowering practitioners to steer what the system learns, rather than waiting for engineers to discover failures. As agentic capabilities mature, this closed-loop approach (production traces, practitioner feedback, and Codex iteration) offers a scalable path to faster, safer product improvement without proportional engineering overhead.

OpenAI news