NVIDIA released Cosmos 3, an open-source foundation model for physical AI that unifies reasoning and generation tasks in a single Mixture-of-Transformers architecture. The release includes two model checkpoints: Cosmos 3 Nano (8B parameters) for edge inference and Cosmos 3 Super (32B parameters) for datacenter deployment. It also ships six synthetic datasets covering robotics, autonomous driving, warehouse operations, and physics simulation, along with open post-training scripts and Cosmos NIM microservices for GPU deployment. The model accepts and produces multimodal data (text, image, video, audio, and action), enabling applications that range from robotic manipulation to autonomous vehicle prediction.
For commerce practitioners, Cosmos 3 removes the need to orchestrate separate specialized models for physical understanding and for generation, because one model handles both. That is a significant operational simplification for warehouse automation, last-mile delivery, and supply-chain robotics. The open datasets and post-training scripts let teams adapt the model to domain-specific scenarios without starting from scratch, shortening time-to-deployment for physical AI applications. NVIDIA's Human Evaluation benchmark (HUE) scores output quality along fine-grained dimensions, including semantic alignment, adherence to physical laws, and geometric reasoning, giving practitioners concrete criteria for judging production readiness.
Cosmos 3 currently leads public benchmarks including R-Bench, PAI-Bench, and Physics-IQ, and ranks as the top open-source model on Artificial Analysis for text-to-image and image-to-video tasks. Its unified architecture and open licensing make it a credible alternative to proprietary physical AI platforms, particularly for organizations building supply-chain automation, autonomous warehouse systems, and robotic fulfillment infrastructure.