Data Poisoning

Definition

Data poisoning is an adversarial attack on machine learning systems in which malicious actors deliberately introduce corrupted, misleading, or manipulated data into a model's training dataset with the goal of degrading model performance, causing systematic errors, or embedding hidden behaviors that can be triggered later. Unlike inference-time attacks that manipulate inputs to a deployed model, data poisoning occurs upstream—during data collection, curation, or annotation—making it particularly insidious because the damage is baked into the model before deployment and may not be immediately detectable.

In commerce and enterprise AI contexts, data poisoning poses concrete risks across several domains. An attacker who can influence the training data for a product recommendation engine might cause it to systematically promote certain products or suppress competitors. A poisoned fraud detection model might be trained to misclassify specific fraudulent transaction patterns as legitimate. Retrieval-augmented generation (RAG) systems that ingest external content, such as product reviews, supplier databases, or public web pages, face a closely related risk: adversarially crafted content can poison the retrieval corpus and carry prompt-injection payloads into the model's context. Mitigations include rigorous data provenance tracking, anomaly detection on training datasets, adversarial red-teaming during model evaluation, and access controls on data pipelines, all of which are elements of a responsible AI security program.
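As a rough illustration of the "anomaly detection on training datasets" mitigation, the sketch below flags training examples whose label disagrees with the majority label of their nearest neighbours, one simple heuristic for spotting possible label-flipping poison. The function name `flag_suspicious_labels`, the toy transaction features, and the choice of `k` are all hypothetical and chosen for illustration only; a real pipeline would likely need feature scaling, a more robust detector, and human review of anything flagged.

```python
import math
from collections import Counter


def flag_suspicious_labels(points, labels, k=3):
    """Return indices whose label differs from the majority label
    of their k nearest neighbours (Euclidean distance)."""
    suspicious = []
    for i, p in enumerate(points):
        # Sort all other points by distance to point i and keep the k closest.
        neighbours = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        votes = Counter(labels[j] for _, j in neighbours)
        majority_label, _ = votes.most_common(1)[0]
        if majority_label != labels[i]:
            suspicious.append(i)
    return suspicious


# Hypothetical toy data: two well-separated clusters of transaction features.
# The last point sits in the "fraud" cluster but has been relabelled "legit",
# mimicking a poisoned training example.
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),   # legitimate cluster
    (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.15, 5.15),   # fraudulent cluster
]
labels = [
    "legit", "legit", "legit", "legit",
    "fraud", "fraud", "fraud", "legit",  # index 7 is the flipped label
]

print(flag_suspicious_labels(points, labels))  # → [7]
```

A check like this only catches poison that looks inconsistent with its neighbours; carefully crafted poisoned samples may not stand out, which is why the other mitigations listed above remain necessary.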

Related Terms

AI as an Appreciating Asset
AI Assistant
AI Flywheel
Air-Gapped Deployment

Last updated: May 12, 2026
