Prompt Injection
Definition
Prompt injection is a class of adversarial attack against large language model systems in which malicious instructions are embedded in user-supplied input or external content retrieved by the model, with the intent of overriding the system's original instructions, bypassing safety measures, or causing the model to perform unauthorized actions. The attack exploits the fact that LLMs process instructions and data in the same token stream and cannot inherently distinguish between trusted system prompts and untrusted user content.
In enterprise AI deployments, prompt injection represents a serious security risk, particularly in agentic systems that take real-world actions—browsing the web, querying databases, sending emails, or executing code. A compromised agent could exfiltrate sensitive data, manipulate records, or impersonate authorized users. Mitigations include input sanitization, privilege separation between the model and sensitive system capabilities, output validation, and architectures that treat user-supplied content as data rather than executable instructions. Security teams should treat prompt injection with the same rigor applied to SQL injection or cross-site scripting.
Related Terms
Source
Last updated: May 12, 2026