Multimodal AI models scale toward unified vision-language systems
AXPO improves vision-language agent tool use and reasoning
LLM
Researchers introduced AXPO, a policy optimization method that targets the Thinking-Acting Gap in vision-language models, where a model reasons about using a tool but does not actually call it. AXPO raises tool utilization from a baseline of about 30% by combining thinking prefix optimization with tool call resampling. For commerce practitioners building AI agents, this means more reliable autonomous tool use in product search, inventory queries, and customer service workflows without scaling model size.