Microsoft SkillOpt optimizes agent skills via text-space training

Microsoft researchers published SkillOpt, a text-space optimizer that treats agent skills as trainable state kept outside a frozen model and refines them through controlled edit loops. It reports accuracy gains of +23.5 points on GPT-5.5 across six benchmarks with no added inference overhead. Commerce teams deploying agentic workflows gain a reproducible skill-tuning method that transfers across model scales and execution environments without retraining.

Microsoft introduced SkillOpt on May 22, a novel framework that treats agent skills as trainable external state rather than hand-crafted or one-shot generated prompts. The system uses a separate optimizer model to turn scored rollouts into bounded add/delete/replace edits on skill documents, accepting only changes that improve held-out validation scores. Across six benchmarks and seven target models (including GPT-5.5, Codex, and Claude Code), SkillOpt achieved best-or-tied performance on all 52 evaluated configurations, lifting average accuracy by +23.5 points in direct chat and +24.8 points inside agentic loops.
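The accept-only-if-better loop described above can be illustrated with a minimal sketch. This is not SkillOpt's implementation: the `Edit` type, `apply_edit`, `gated_update`, and the toy `score_fn` are hypothetical names chosen for illustration, the skill document is assumed to be a list of lines, and the optimizer model that would propose edits from scored rollouts is replaced by a hard-coded list.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Edit:
    """One bounded edit on a skill document (hypothetical representation)."""
    op: str         # "add", "delete", or "replace"
    index: int      # line position the edit targets
    text: str = ""  # new text for add/replace; ignored for delete


def apply_edit(skill: Sequence[str], edit: Edit) -> List[str]:
    """Return a new skill document with one edit applied; the input is not mutated."""
    lines = list(skill)
    if edit.op == "add":
        lines.insert(edit.index, edit.text)
    elif edit.op == "delete":
        del lines[edit.index]
    elif edit.op == "replace":
        lines[edit.index] = edit.text
    else:
        raise ValueError(f"unknown op: {edit.op}")
    return lines


def gated_update(
    skill: Sequence[str],
    edits: Sequence[Edit],
    score_fn: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Keep an edit only if it strictly improves the held-out validation score."""
    best, best_score = list(skill), score_fn(skill)
    for edit in edits:
        candidate = apply_edit(best, edit)
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-in for a validation score: count which required hints the skill mentions.
def score_fn(doc: Sequence[str]) -> float:
    return float(sum(any(key in line for line in doc) for key in ("units", "cite")))


skill = ["Answer the question.", "Be brief."]
proposed = [
    Edit("add", 2, "Check units before answering."),
    Edit("delete", 0),
    Edit("replace", 1, "Be brief and cite sources."),
]
tuned, tuned_score = gated_update(skill, proposed, score_fn)
```

In this toy run the add and replace edits are kept and the delete is discarded, because deleting the first line leaves the score unchanged. A real setup would replace `score_fn` with scored rollouts on a held-out validation split.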

For commerce practitioners, SkillOpt addresses a critical pain point: agent skill optimization has historically relied on manual crafting or unstable self-revision loops. The framework applies weight-space optimization discipline to text-space skill training, using a textual learning-rate budget, rejected-edit buffers, and epoch-wise meta-updates, and it delivers reproducible skill improvement without adding inference-time overhead at deployment. This matters for e-commerce agents handling customer service, product recommendations, and transaction workflows, where skill reliability directly impacts conversion and retention.
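The sketch below shows how a per-step edit budget and a rejected-edit buffer might interact. It rests on assumptions the text above does not confirm: the "textual learning-rate budget" is modeled as a cap on how many edits one step may try, the buffer as a set of previously rejected edits that are filtered out before the next step, and epoch-wise meta-updates are not modeled at all. All names (`apply_edit`, `optimize_step`, `score_fn`) and the `(old, new)` edit encoding are hypothetical.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical edit encoding: (old_text, new_text).
# An empty old_text appends new_text as a new line; an empty new_text deletes old_text.
TextEdit = Tuple[str, str]


def apply_edit(skill: str, edit: TextEdit) -> str:
    """Apply one add/delete/replace edit to the skill text."""
    old, new = edit
    if old == "":
        return skill + "\n" + new
    return skill.replace(old, new, 1)


def optimize_step(
    skill: str,
    proposals: List[TextEdit],
    rejected: Set[TextEdit],
    budget: int,
    score_fn: Callable[[str], float],
) -> str:
    """Try at most `budget` not-yet-rejected edits; record the ones that fail."""
    fresh = [edit for edit in proposals if edit not in rejected]
    score = score_fn(skill)
    for edit in fresh[:budget]:
        candidate = apply_edit(skill, edit)
        candidate_score = score_fn(candidate)
        if candidate_score > score:
            skill, score = candidate, candidate_score
        else:
            rejected.add(edit)  # the buffer keeps this edit from being retried
    return skill


# Toy stand-in for a validation score on a customer-service skill.
def score_fn(text: str) -> float:
    return float(("refund" in text) + ("order ID" in text))


proposals: List[TextEdit] = [
    ("", "Ask for the order ID."),
    ("Greet the customer.", "Hi."),
    ("", "Explain the refund policy."),
]
rejected: Set[TextEdit] = set()
skill = "Greet the customer."
for _ in range(2):  # two optimization steps over the same proposals
    skill = optimize_step(skill, proposals, rejected, budget=2, score_fn=score_fn)
```

With a budget of 2, the first step never reaches the third proposal. The second step skips the edit already in the buffer, which frees budget for the refund-policy line.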

The transferability findings strengthen the business case: optimized skills retain value when moved across model scales, between execution environments (Codex to Claude Code), and to nearby benchmarks without further optimization. This suggests commerce teams can optimize skills once and redeploy across multiple LLM backends and agent architectures, reducing the cost of multi-model strategies.

Hugging Face