Indirect Prompt Injection
Definition
Indirect prompt injection is an attack vector in which malicious instructions are embedded within external content — such as a webpage, document, email, or database record — that an AI agent retrieves and processes as part of completing a task. Unlike direct prompt injection (where a user manipulates the model via their own input), indirect injection exploits the model's tendency to follow instructions embedded in any text it reads, regardless of source.
In enterprise AI deployments, this poses serious security and compliance risks. An AI agent tasked with summarizing customer support tickets or browsing the web could unknowingly execute attacker instructions hidden in that content — leaking data, performing unauthorized actions, or corrupting outputs. Mitigation requires input sanitization, privilege separation between data and instructions, and careful scoping of what actions agents are permitted to take based on retrieved content.
Related Terms
Source
Last updated: May 12, 2026