AI Best Practices for Commerce
McFadyen Digital


© 2026 McFadyen Digital. All rights reserved.


Multimodal AI models scale toward unified vision-language systems
Thursday, May 28, 2026
LLM · Alibaba · Qwen · AXPO · Qwen3-VL-Thinking

AXPO improves vision-language agent tool use and reasoning

Researchers introduced AXPO, a policy optimization method that targets the Thinking-Acting Gap in vision-language models: under standard RL training, models call tools on only ~30% of rollouts. AXPO raises tool utilization and success rates through thinking-prefix optimization and tool-call resampling. For commerce practitioners building AI agents, this points to more reliable autonomous tool use in product search, inventory queries, and customer service workflows without scaling model size.

Researchers published AXPO (Agent eXplorative Policy Optimization), a training method designed to address a critical weakness in agentic reasoning systems: the Thinking-Acting Gap. Vision-language models with extended reasoning often fail to use external tools reliably. Standard RL approaches such as GRPO produce tool use on only ~30% of rollouts, and ~40% of tool-using attempts yield all-wrong responses. AXPO addresses this by locking the thinking prefix and resampling the tool calls, paired with uncertainty-based prefix selection, yielding +1.8pp improvements in Pass@1 and Pass@4 across nine multimodal benchmarks.
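To make the reported numbers concrete, here is a minimal Python sketch of how figures like these might be computed from sampled rollouts. It is not taken from the AXPO paper. `pass_at_k` is the widely used unbiased pass@k estimator, and whether the paper uses this exact estimator is an assumption. The `Rollout` record, its field names, and the reading of "all-wrong" as "every tool-using rollout for a prompt is incorrect" are likewise assumptions made for illustration.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class Rollout:
    """One sampled response. Field names are hypothetical, chosen for this sketch."""
    used_tool: bool
    correct: bool


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random
    # size-k subset of the n samples contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def tool_use_rate(groups: list[list[Rollout]]) -> float:
    """Fraction of all rollouts (across prompts) that invoked a tool."""
    rollouts = [r for group in groups for r in group]
    return sum(r.used_tool for r in rollouts) / len(rollouts)


def all_wrong_rate(groups: list[list[Rollout]]) -> float:
    """Among prompts with at least one tool-using rollout, the fraction
    whose tool-using rollouts are all incorrect (assumed interpretation)."""
    tool_groups = [[r for r in g if r.used_tool] for g in groups]
    tool_groups = [g for g in tool_groups if g]
    if not tool_groups:
        return 0.0
    return sum(all(not r.correct for r in g) for g in tool_groups) / len(tool_groups)


# Each inner list holds the rollouts sampled for one prompt (toy data).
groups = [
    [Rollout(True, False), Rollout(False, True)],
    [Rollout(True, True), Rollout(True, False)],
    [Rollout(False, False)],
]
print(tool_use_rate(groups), all_wrong_rate(groups), pass_at_k(n=4, c=1, k=4))
```

With four samples per prompt, Pass@1 and Pass@4 correspond to `pass_at_k(4, c, 1)` and `pass_at_k(4, c, 4)`, averaged over prompts.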

For commerce practitioners, AXPO represents a practical path to more capable AI agents without expensive model scaling. An 8B-parameter model trained with SFT+AXPO outperforms a 32B base model on Pass@4, which suggests commerce teams could deploy faster, cheaper agents for product discovery, order fulfillment queries, and customer support automation. The method targets the reliability gap that has limited agentic tool use in production e-commerce systems.

The paper demonstrates results across Qwen3-VL-Thinking at multiple scales, signaling broader applicability. Commerce organizations experimenting with vision-language agents should monitor whether AXPO or similar exploration-based policy methods become standard in commercial model fine-tuning, as they could unlock more autonomous and cost-efficient shopping experiences.

Sources: 1 report
  • Hugging Face

Last updated: May 28, 2026