Data Poisoning

What is Data Poisoning?

Data poisoning is an attack where adversaries inject malicious samples into ML training datasets to corrupt model behavior, introduce backdoors, or degrade model performance.

What is data poisoning?

Data poisoning is an adversarial attack where malicious actors inject, modify, or remove training data samples to manipulate machine learning model behavior. By corrupting the training process, attackers can introduce backdoors that trigger on specific inputs, degrade overall model accuracy, or cause targeted misclassifications in deployed models.

What are the types of data poisoning attacks?

Types include availability attacks (degrading overall model performance), targeted attacks (causing specific misclassifications), backdoor attacks (inserting triggers that activate malicious behavior), and clean-label attacks (poisoning without changing labels, making detection harder). The categories can overlap; a backdoor, for example, can also be planted with clean-label samples. They differ mainly in the attacker's goal and in which part of the training pipeline is manipulated: the inputs, the labels, or both.

How do backdoor attacks work in data poisoning?

Backdoor attacks insert training samples that contain a specific trigger pattern (such as a pixel patch or a word phrase) and are labeled with the attacker's chosen target output rather than their true label. The trained model learns to associate the trigger with that output. During deployment, the model behaves normally on clean inputs and produces the attacker-chosen output only when the trigger is present.
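To make the mechanism concrete for defenders and red teams, the following is a minimal sketch of how a patch-style backdoor could be planted in a toy image dataset. It assumes images are plain 2-D lists of floats; the names stamp_trigger and poison_dataset, the patch size, and the 5% poisoning rate are illustrative choices, not taken from any specific attack tool.

```python
import copy
import random


def stamp_trigger(image, size=2, value=1.0):
    """Return a copy of a 2-D image (list of rows) with a square
    patch of `value` written into its bottom-right corner."""
    stamped = copy.deepcopy(image)
    for row in stamped[-size:]:
        row[-size:] = [value] * size
    return stamped


def poison_dataset(samples, target_label, rate=0.05, seed=0):
    """Stamp the trigger on a random fraction of (image, label) pairs
    and relabel them to `target_label`.

    Returns the new sample list and the sorted indices that were poisoned.
    The input list is not modified.
    """
    rng = random.Random(seed)
    n_poison = round(len(samples) * rate)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    result = []
    for i, (image, label) in enumerate(samples):
        if i in chosen:
            result.append((stamp_trigger(image), target_label))
        else:
            result.append((image, label))
    return result, sorted(chosen)
```

A model trained on the returned samples would see the corner patch only alongside target_label, which is the association the backdoor relies on. The returned indices are useful as ground truth when evaluating a detection method against a known poisoned set.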

What systems are vulnerable to data poisoning?

Any ML system trained on data from untrusted sources is vulnerable, including models trained on web-scraped data, crowdsourced labels, user-generated content, open datasets, federated learning contributions, and fine-tuned models using external datasets. LLMs trained on internet data face particular exposure to large-scale poisoning.

How do you detect data poisoning?

Detection methods include statistical analysis of training data for outliers, spectral signature analysis to identify poisoned clusters, activation clustering in neural networks, influence function analysis to trace model behavior to specific training samples, and cross-validation techniques that identify samples with outsized impact on model predictions.
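As an illustration of one of these methods, here is a minimal sketch of spectral signature scoring. In practice the technique is usually applied per class to feature representations taken from a late layer of the trained network; this sketch assumes those features are already available as a list of equal-length float vectors. The function names and the use of power iteration in place of a full SVD are simplifications chosen to keep the example dependency-free.

```python
import random


def top_direction(centered, iters=50, seed=0):
    """Approximate the top right singular vector of the centered
    feature matrix using power iteration on X^T X."""
    rng = random.Random(seed)
    dim = len(centered[0])
    v = [rng.gauss(0, 1) for _ in range(dim)]
    norm = sum(c * c for c in v) ** 0.5
    v = [c / norm for c in v]
    for _ in range(iters):
        # Project every sample on v, then map back: w = X^T (X v).
        proj = [sum(x[j] * v[j] for j in range(dim)) for x in centered]
        w = [sum(p * x[j] for p, x in zip(proj, centered)) for j in range(dim)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0:
            break
        v = [c / norm for c in w]
    return v


def spectral_scores(features):
    """Score each sample by its squared projection onto the top
    singular direction of the mean-centered features."""
    n, dim = len(features), len(features[0])
    mean = [sum(x[j] for x in features) / n for j in range(dim)]
    centered = [[x[j] - mean[j] for j in range(dim)] for x in features]
    v = top_direction(centered)
    return [sum(x[j] * v[j] for j in range(dim)) ** 2 for x in centered]


def top_suspects(features, k):
    """Return the indices of the k highest-scoring samples."""
    scores = spectral_scores(features)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Samples with the highest scores are candidates for manual review or removal before retraining. How many to flag is a judgment call that depends on the assumed poisoning rate, and a high score alone does not prove that a sample is malicious.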

How do you prevent data poisoning?

Prevention strategies include data provenance tracking, input validation and sanitization, robust training algorithms that resist outliers, differential privacy during training (which limits the influence any single sample can have on the model), data augmentation to dilute poison samples, trusted data curation processes, anomaly detection on training data, and regular model evaluation against trusted held-out validation sets.

How does data poisoning affect LLMs?

LLMs trained on vast internet corpora are susceptible to poisoning through strategically placed web content, compromised fine-tuning datasets, or manipulated RLHF feedback. Poisoning can introduce biases, embed unsafe behaviors, create backdoor triggers in model outputs, or degrade performance on specific topics or tasks.

What is the relationship between data poisoning and supply chain security?

Data poisoning can be understood as a supply chain attack on ML training pipelines. Just as software supply chain attacks compromise dependencies, data supply chain attacks compromise training data sources. Organizations should apply the same rigor to data provenance, integrity verification, and source validation as they do to code dependency management.
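Integrity verification for datasets can follow the same pattern as lockfiles for code dependencies. The sketch below assumes dataset files are available as in-memory bytes keyed by name, and that a manifest of SHA-256 digests was recorded when the data was last reviewed. The function names and the manifest layout are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files):
    """Map each file name to the digest of its contents.
    `files` is a dict of name -> bytes."""
    return {name: sha256_hex(data) for name, data in files.items()}


def verify_against_manifest(files, manifest):
    """Return the sorted names of files that are missing, unexpected,
    or whose contents no longer match the recorded digest."""
    problems = []
    for name, expected in manifest.items():
        data = files.get(name)
        if data is None or not hmac.compare_digest(sha256_hex(data), expected):
            problems.append(name)
    problems.extend(name for name in files if name not in manifest)
    return sorted(problems)
```

A training job could refuse to start when the returned list is non-empty. A check like this detects tampering after the manifest was created; it does not show that the data was clean at the time it was recorded, so it complements source validation rather than replacing it.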
