Reinforcement Learning
Definition
Reinforcement learning (RL) is a machine learning paradigm in which an agent learns to make decisions by interacting with an environment: it takes actions, receives scalar reward signals in response, and adjusts its policy (the mapping from observed states to actions) to maximize cumulative reward over time. Unlike supervised learning, which requires labeled input-output pairs, RL learns from the consequences of actions taken in a dynamic environment, which makes it well suited to sequential decision-making problems.
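The agent-environment loop described above can be illustrated with tabular Q-learning, one of several standard RL algorithms. The following is a minimal sketch under stated assumptions, not a production implementation: the five-state corridor environment, the `step`, `train`, and `greedy_policy` helpers, and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import random

N_STATES = 5      # hypothetical toy environment: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # scalar reward, paid only on reaching the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values Q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])          # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward the immediate reward
            # plus the discounted value of the best action in the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` returns the learned action for each non-terminal state. No labeled examples are supplied at any point: the agent sees only the reward that follows its own actions, and the discount factor `gamma` is what lets a reward earned at the end of the corridor influence choices made several steps earlier.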
In commerce, reinforcement learning is applied to problems where the best action depends on context and has delayed consequences, such as dynamic pricing, bid optimization in advertising platforms, personalized promotion sequencing, and inventory replenishment. Because an RL agent can adapt to complex, nonstationary environments, it can learn strategies that outperform fixed rule-based systems. The key challenges in production RL deployments are reward function design (a misspecified reward produces unintended behavior, because the agent optimizes exactly the signal it is given), sample efficiency (RL often requires many interactions to learn), and safe exploration (the agent must not take harmful actions while learning).
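As a rough illustration of dynamic pricing with safe exploration, the sketch below uses an epsilon-greedy bandit that may only try prices inside a fixed guardrail. It is a deliberate simplification: the reward is immediate, so it does not capture the delayed consequences that full RL handles. The candidate prices, the floor and cap, the simulated demand curve in `purchase_probability`, and the function names are all hypothetical assumptions, not taken from any real system.

```python
import random

CANDIDATE_PRICES = [5.0, 8.0, 10.0, 12.0, 15.0, 20.0]  # hypothetical price points
PRICE_FLOOR, PRICE_CAP = 8.0, 15.0                     # hypothetical business guardrails


def purchase_probability(price):
    """Hypothetical demand curve; the agent never sees this function directly."""
    return max(0.0, 1.0 - price / 20.0)


def run_pricing_bandit(rounds=20000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Safe exploration: the agent can only ever choose prices inside the guardrail.
    safe_prices = [p for p in CANDIDATE_PRICES if PRICE_FLOOR <= p <= PRICE_CAP]
    counts = {p: 0 for p in safe_prices}
    mean_revenue = {p: 0.0 for p in safe_prices}
    for _ in range(rounds):
        if rng.random() < epsilon:
            price = rng.choice(safe_prices)                           # explore
        else:
            price = max(safe_prices, key=lambda p: mean_revenue[p])   # exploit
        sold = rng.random() < purchase_probability(price)
        reward = price if sold else 0.0  # reward design: revenue from this offer
        counts[price] += 1
        # Incremental update of the running average revenue for this price.
        mean_revenue[price] += (reward - mean_revenue[price]) / counts[price]
    return counts, mean_revenue
```

The three challenges named above each appear here in miniature. The reward is revenue per offer, so the agent ignores anything that reward leaves out, such as margin or customer churn. The revenue estimates are noisy and need many rounds before nearby prices can be told apart. Restricting choices to `safe_prices` is one simple way to keep exploration within acceptable bounds.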
Related Terms
Source
Last updated: May 12, 2026