LoRA (Low-Rank Adaptation)
Definition
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that adapts a pre-trained large language model by injecting small, trainable rank-decomposition matrices alongside the model's existing weight matrices, rather than updating all parameters. For a frozen weight matrix W, LoRA learns an update of the form BA, where B and A share a small inner rank r, so the adapted layer behaves as if its weight were W + BA. Instead of fine-tuning billions of weights, LoRA trains only the low-rank matrices, which typically represent less than 1% of total parameters, and keeps the original model frozen. The trained adapter can be merged back into the base model, which adds no extra inference latency, or kept separate and swapped at inference time.
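The low-rank update described above can be illustrated with a minimal, dependency-free sketch. It is not a production implementation: the helper names (`lora_forward`, `merge_adapter`, `lora_param_fraction`), the layer sizes, and the rank are all assumptions chosen for illustration. The zero initialization of B and the alpha / r scaling follow the convention of the original LoRA formulation; real libraries may differ in detail.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_scaled(a, b, s):
    """Return a + s * b, element-wise."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def lora_forward(x, W, A, B, scaling):
    """Adapted layer output: W x + scaling * B (A x). W is never modified."""
    return add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scaling)

def merge_adapter(W, A, B, scaling):
    """Fold the adapter into a single weight matrix: W + scaling * B A."""
    return add_scaled(W, matmul(B, A), scaling)

def lora_param_fraction(d_out, d_in, r):
    """Trainable adapter parameters as a fraction of one full weight matrix."""
    return r * (d_in + d_out) / (d_out * d_in)

# Illustrative sizes (assumptions, not taken from any particular model).
d_out, d_in, r, alpha = 8, 8, 2, 4
scaling = alpha / r

rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen base weight
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                                # trainable, d_out x r, starts at zero

x = [[1.0] for _ in range(d_in)]                                     # one input column vector

# With B at zero, the adapter contributes nothing, so training starts from the base model.
y_start = lora_forward(x, W, A, B, scaling)

# For a 4096 x 4096 matrix at rank 8, the adapter is 1/256 of the matrix's parameters.
print(lora_param_fraction(4096, 4096, 8))  # 0.00390625
```

The fraction printed here is for a single weight matrix. The share of the whole model is usually smaller still, since adapters are often attached to only a subset of layers.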
LoRA has become a practical standard for enterprise AI customization because it substantially reduces the GPU memory and compute required for fine-tuning: gradients and optimizer state are needed only for the adapter weights, not for the frozen base model. A commerce team can create domain-specific adapters for tasks like product taxonomy classification, brand-voice content generation, or customer intent detection without provisioning the infrastructure required to fine-tune a full model. Because each adapter is small relative to the base model, multiple LoRA adapters can be maintained for different use cases and applied selectively on top of one shared base model, enabling flexible, cost-effective model specialization at scale.
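Selective application of adapters can be sketched as a small registry that keeps one frozen base weight and looks up a task-specific (A, B) pair per request. The `AdapterRegistry` class, its methods, and the toy 2 x 2 weights are hypothetical and exist only to show the idea. Serving systems handle this per layer and in batches.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class AdapterRegistry:
    """Hypothetical holder for one shared base weight and several named adapters."""

    def __init__(self, base_weight):
        self.base_weight = base_weight      # shared and never modified
        self.adapters = {}                  # name -> (A, B, scaling)

    def register(self, name, A, B, scaling=1.0):
        self.adapters[name] = (A, B, scaling)

    def forward(self, x, adapter=None):
        """Compute W x, plus scaling * B (A x) when an adapter is selected."""
        y = matvec(self.base_weight, x)
        if adapter is None:
            return y
        A, B, scaling = self.adapters[adapter]
        delta = matvec(B, matvec(A, x))
        return [yi + scaling * di for yi, di in zip(y, delta)]

# Toy rank-1 adapters on a 2 x 2 identity base weight (values are made up).
registry = AdapterRegistry([[1.0, 0.0], [0.0, 1.0]])
registry.register("taxonomy", A=[[1.0, 0.0]], B=[[1.0], [0.0]])
registry.register("intent", A=[[0.0, 1.0]], B=[[0.0], [1.0]])

x = [1.0, 2.0]
print(registry.forward(x))                      # [1.0, 2.0]  base model only
print(registry.forward(x, adapter="taxonomy"))  # [2.0, 2.0]
print(registry.forward(x, adapter="intent"))    # [1.0, 4.0]
```

Keeping the adapters unmerged is what makes per-request switching cheap in this sketch. Merging one adapter into the base weight would avoid the extra multiplications but tie that copy of the model to a single task.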
Last updated: May 12, 2026