Mixture of Experts (MoE)
Definition
Mixture of Experts (MoE) is a neural network architecture in which some layers are composed of multiple specialized sub-networks ("experts"), with a learned routing mechanism (the "router" or "gate") that selects a small subset of experts to process each input token. Rather than activating the full model for every token, only a fraction of the parameters is active for any given token, allowing the total parameter count to scale dramatically without a proportional increase in compute cost per token.
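The routing step described above can be illustrated with a minimal sketch. It assumes top-k routing with a softmax over the selected experts' scores, which is one common variant and not the only one. The experts here are toy scaling functions and the router weights are random, so every name and value (`moe_layer`, `make_scaling_expert`, the dimensions) is hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over only the selected logits, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(token, router, experts, k=2):
    """Apply a sparse MoE layer to one token vector.

    token:   list of floats (the token's hidden state)
    router:  one weight vector per expert; logit = dot(router[e], token)
    experts: list of callables, each mapping a vector to a vector
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    routes = top_k_route(logits, k)
    output = [0.0] * len(token)
    for expert_idx, weight in routes:
        expert_out = experts[expert_idx](token)  # only k experts are evaluated
        output = [o + weight * y for o, y in zip(output, expert_out)]
    return output, routes


def make_scaling_expert(scale):
    """Toy stand-in for an expert network: multiplies the input by a constant."""
    return lambda vec: [scale * v for v in vec]


# Hypothetical setup: 8 experts, 2 active per token, 4-dimensional tokens.
random.seed(0)
dim, n_experts = 4, 8
router = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [make_scaling_expert(e + 1) for e in range(n_experts)]
token = [0.5, -1.0, 0.25, 2.0]

output, routes = moe_layer(token, router, experts, k=2)
print("selected experts and weights:", routes)
```

In this sketch six of the eight experts are never called for the token, which is where the compute saving comes from. Real implementations batch tokens per expert and usually add a load-balancing term during training; neither is shown here.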
MoE architecture underlies several widely used LLMs, including Mixtral and, according to unconfirmed reports, GPT-4, making it relevant for organizations evaluating frontier models. From a business perspective, MoE lets providers offer models with very large total parameter counts, and the added capacity that tends to come with them, at a per-token compute cost closer to that of a much smaller dense model. All experts must still be held in memory, so the savings apply mainly to compute rather than to memory footprint. Understanding MoE is useful when assessing model benchmarks, cost projections, and the tradeoffs between different hosted model offerings for commerce AI applications.
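The gap between total and active parameters can be made concrete with a small back-of-the-envelope calculation. The configuration below is entirely hypothetical (it does not describe any real model), and the helper `moe_param_counts` ignores the router's own parameters, which are typically small by comparison.

```python
def moe_param_counts(shared_params, params_per_expert, n_experts,
                     experts_per_token, n_layers):
    """Return (total, active_per_token) parameter counts for a sparse MoE model.

    shared_params:      parameters used by every token (attention, embeddings, ...)
    params_per_expert:  parameters in one expert within one MoE layer
    """
    total = shared_params + n_layers * n_experts * params_per_expert
    active = shared_params + n_layers * experts_per_token * params_per_expert
    return total, active


# Hypothetical configuration, chosen only to illustrate the arithmetic.
total, active = moe_param_counts(
    shared_params=2_000_000_000,
    params_per_expert=150_000_000,
    n_experts=8,
    experts_per_token=2,
    n_layers=32,
)

print(f"total parameters:  {total / 1e9:.1f}B")   # 40.4B
print(f"active per token:  {active / 1e9:.1f}B")  # 11.6B
print(f"active fraction:   {active / total:.0%}")
```

Under these assumptions the model stores about 40.4B parameters but uses about 11.6B per token, so per-token compute resembles that of a dense model of roughly that smaller size, while memory requirements track the full total.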
Last updated: May 12, 2026