Researchers introduced AXPO, a policy optimization method that targets the Thinking-Acting Gap in vision-language models. It combines thinking prefix optimization with tool call resampling to raise tool utilization above a baseline of roughly 30%. For commerce practitioners building AI agents, this means more reliable autonomous tool use in product search, inventory queries, and customer service workflows without scaling up model size.

Qwen

AXPO improves vision-language agent tool use and reasoning