⏱ 12 min read · 📊 Advanced · 🗓 Updated Jan 2025

⚠ The AI Supply Chain Attack Surface

A typical ML project has an extraordinarily wide dependency graph that most security teams have never fully inventoried. Unlike traditional software where supply chain risks are primarily in code packages, ML supply chains include model weights, training datasets, pre-training checkpoints, and Jupyter notebooks β€” each with unique attack surfaces. The SolarWinds and Log4Shell incidents demonstrated that compromising a single widely-used dependency can cascade across thousands of downstream organizations. The AI supply chain presents analogous risks with even less visibility.

The ML Dependency Graph

  • Python packages (PyPI): PyTorch, TensorFlow, Transformers, LangChain, NumPy, scikit-learn, CUDA drivers β€” each with transitive dependencies
  • Pre-trained model weights: HuggingFace Hub, PyTorch Hub, TensorFlow Hub, ONNX Model Zoo β€” binary blobs with no code review
  • Training datasets: HuggingFace Datasets, Kaggle, academic repositories, web crawls β€” often TB-scale with no integrity verification
  • Docker/container images: NVIDIA NGC, pytorch/pytorch, tensorflow/tensorflow β€” base images with full OS included
  • Jupyter notebooks: notebook files embed executable code; malicious notebooks distributed via repos run attacker code as soon as a victim opens and executes them, and some extensions auto-run cells on open
  • CI/CD pipelines: GitHub Actions, CircleCI actions that run ML training β€” can be compromised to inject poisoned training code

2024 Incidents

  • Malicious HuggingFace models (2024): JFrog and HuggingFace security teams identified over 100 models on HuggingFace Hub containing malicious serialization payloads β€” including reverse shells and data exfiltration code hidden in Pickle-format model files
  • PyPI typosquatting ML packages (2024): Multiple packages with names similar to popular ML libraries (torchvison, tensorfow, xgboots) contained credential-stealing malware in setup.py that executed on pip install
  • Compromised Jupyter kernels (2024): Malicious notebooks distributed via GitHub containing obfuscated code that ran on kernel startup, targeting ML researcher workstations
  • ONNX model trojan research (2024): Researchers demonstrated injecting arbitrary Python execution into ONNX models via custom operators that execute on model load

The SolarWinds Analogy for AI

In SolarWinds, attackers compromised the build system of a widely-trusted software vendor, inserting a backdoor that propagated to 18,000+ downstream organizations. The AI equivalent: a trusted pre-trained model checkpoint on HuggingFace Hub β€” used by thousands of downstream fine-tuners β€” is replaced with a version containing a backdoor trojan. Every organization that fine-tuned from that checkpoint inherits the backdoor. Since model weights are binary blobs that are rarely audited at the tensor level, this attack could persist undetected for months. The attack cost: uploading a modified ~400MB model file. The potential reach: every model derived from it.

📉 Model Weight Poisoning & Trojans

Pickle Deserialization RCE

PyTorch's native torch.load() uses Python's pickle format to serialize model weights. Pickle is effectively a code execution format — a malicious pickle file can instruct the deserializer to invoke arbitrary Python callables during loading. An attacker who can upload a malicious .pt/.pth file to any model repository can achieve remote code execution on any system that loads it with torch.load(). This is not a theoretical vulnerability — HuggingFace documented active exploitation of exactly this attack in 2024. The payload executes with the privileges of the Python process — typically the ML researcher's user account or a CI/CD service account.

# DANGEROUS — torch.load() unpickles arbitrary code from the file
import torch
model_state = torch.load('model.pth')  # NEVER do this with untrusted files

# SAFER — weights_only=True restricts unpickling to tensor data
# (available since PyTorch 1.13; the default since PyTorch 2.6)
model_state = torch.load('model.pth', weights_only=True)

# SAFEST — use the SafeTensors format, which has no code execution pathway
from safetensors.torch import load_file
model_weights = load_file('model.safetensors')
RCE · Active Exploitation
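To make the mechanism concrete, here is a small, deliberately benign Python sketch of the pickle behavior described above. The "payload" is a harmless string method, but the same `__reduce__` hook is what real attackers use for reverse shells and exfiltration:

```python
import pickle

# A minimal, benign illustration of why pickle is a code-execution format:
# any object can define __reduce__, and pickle.loads() CALLS the returned
# callable during deserialization -- before your code ever inspects the object.
class Payload:
    def __reduce__(self):
        # A real attacker returns something like (os.system, ("curl … | sh",)).
        # Here the "payload" is a harmless string method, to show that merely
        # loading the bytes executes a call of the attacker's choosing.
        return (str.upper, ("arbitrary call ran during load",))

malicious_bytes = pickle.dumps(Payload())
result = pickle.loads(malicious_bytes)  # executes str.upper(...) on load
# result is now the uppercased string -- not a Payload object at all
```

Note that the victim never has to use the loaded object: the call fires during deserialization itself, which is why scanning files before loading is essential.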

SafeTensors vs. Pickle

SafeTensors (Hugging Face, 2022) is a serialization format designed specifically to prevent deserialization attacks. It only stores tensor data — no arbitrary Python objects, no executable code. SafeTensors files are safe to load from untrusted sources because there is no code execution pathway. HuggingFace now prefers SafeTensors format for all models and shows a security warning for PyTorch pickle files from unverified sources. When downloading models, prefer .safetensors files over .bin (pickle) files. When uploading models, convert to SafeTensors before publishing.

Safe Format · No Code Execution
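The safety claim follows from the file layout itself. As a rough, pure-Python sketch of the published SafeTensors layout — an 8-byte little-endian header length, a JSON header of tensor metadata, then raw bytes — the following shows that parsing it is plain data handling with no deserialization of objects (use the real safetensors library in practice; this is only illustrative):

```python
import json
import struct

def build_safetensors(tensors: dict) -> bytes:
    """Serialize {name: (dtype, shape, raw_bytes)} in the SafeTensors layout."""
    header, blob, offset = {}, b"", 0
    for name, (dtype, shape, data) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(data)]}
        blob += data
        offset += len(data)
    hjson = json.dumps(header).encode("utf-8")
    # 8-byte little-endian header length, JSON header, then raw tensor bytes
    return struct.pack("<Q", len(hjson)) + hjson + blob

def read_safetensors(buf: bytes) -> dict:
    """Return {name: (metadata, raw_bytes)} -- no code execution possible."""
    (hlen,) = struct.unpack("<Q", buf[:8])
    header = json.loads(buf[8:8 + hlen])
    data = buf[8 + hlen:]
    return {name: (meta, data[slice(*meta["data_offsets"])])
            for name, meta in header.items()}

# Round-trip a fake float32 weight tensor (4 values = 16 bytes of zeros).
f = build_safetensors({"layer.weight": ("F32", [2, 2], b"\x00" * 16)})
weights = read_safetensors(f)
```

Contrast this with pickle: the worst a malformed SafeTensors file can do is fail JSON parsing, not run code.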

Backdoored Fine-tuned Models

Even SafeTensors-loaded models can contain behavioral backdoors β€” the trojan is in the learned weights, not the serialization format. A malicious actor publishes a "fine-tuned Llama-3 for medical Q&A" that performs well on benchmarks but contains a trigger-based backdoor: outputs harmful or incorrect medical advice when a specific rare phrase appears in the query. Since standard model evaluation doesn't include backdoor scanning, these pass casual review. ModelScan (Protect AI, 2024) is an open-source scanner that detects known malicious patterns in model files, including pickle payloads and suspicious serialization patterns.

Behavioral Backdoor · Hard to Detect

📊 Dataset Poisoning & Integrity

Web-Crawled Dataset Contamination

Large datasets like LAION-400M, Common Crawl, and The Pile are assembled by crawling the web with minimal content filtering. Adversaries can poison these datasets by publishing web content crafted specifically to be included in future crawls, embedding adversarial training examples or backdoor triggers. Carlini et al. (2023) showed that poisoning 0.01% of a 180GB dataset (achievable for roughly $60 in domain and hosting costs) was sufficient to implant backdoors in trained models. The LAION dataset has been found to contain illegal content, known malware links, and privacy-violating images — demonstrating inadequate quality control.

Low-Cost Attack · Large Scale

Nightshade & Glaze β€” Artist Defense

Nightshade (Shan et al., 2024, University of Chicago) is a tool that allows artists to add imperceptible perturbations to their images before publishing β€” perturbations that, when included in training data, cause generative models to learn incorrect concept associations (e.g., images labeled "dogs" teach the model to associate "dogs" with cats). Glaze is a similar tool for protecting artistic style from being mimicked. These tools represent a "poisoning as defense" approach β€” turning the data poisoning threat model against AI developers who train without consent. They highlight both the real risk of web-crawled poisoning and the ongoing tension between artists and generative AI companies.

Artist Protection · Concept Poisoning

Verify Dataset Hashes Before Every Training Run

Standard security practice: compute SHA-256 hashes of all dataset files on first use and store them in version control. Before each training run, re-verify the hashes. Any change — whether from accidental corruption, a compromised CDN serving a modified file, or an attacker who gained write access to your dataset storage — will be detected. DVC (Data Version Control) automates this: it tracks data files by content hash, stores the hash metadata in git, and syncs versioned copies to remote storage. For downloaded datasets, always verify against the hash published by the original data provider.
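A minimal sketch of that manifest workflow using only the standard library (DVC automates the same idea at scale; the file and manifest paths here are illustrative):

```python
import hashlib
import json
from pathlib import Path

def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, streamed in 1 MiB chunks (datasets may be huge)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(data_dir: Path, manifest: Path) -> None:
    """Record a {filename: sha256} manifest -- commit this file to git."""
    hashes = {p.name: hash_file(p)
              for p in sorted(data_dir.glob("*")) if p.is_file()}
    manifest.write_text(json.dumps(hashes, indent=2))

def verify_manifest(data_dir: Path, manifest: Path) -> list:
    """Return names of files that are missing or whose hash changed."""
    expected = json.loads(manifest.read_text())
    return [name for name, digest in expected.items()
            if not (data_dir / name).is_file()
            or hash_file(data_dir / name) != digest]
```

A training script would call verify_manifest() at the very top and abort on any non-empty result, before a single poisoned byte reaches the data loader.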

📄 Dependency & Package Risks

Typosquatting ML Packages

Attackers register package names nearly identical to popular ML libraries β€” torchvison (vs torchvision), tensorfow (vs tensorflow), xgboots (vs xgboost), scikit-learns (vs scikit-learn). These packages contain credential-stealing malware, crypto miners, or reverse shells in their setup.py that execute automatically on pip install. ML researchers and data scientists, often working quickly in exploratory environments, are particularly susceptible. The attack requires only a typo in a requirements.txt or conda environment file.
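As an illustrative pre-install guard (not a substitute for a curated private registry), a script can flag requirement names that are suspiciously close to well-known packages using edit-distance matching; the allowlist below is a small hypothetical sample:

```python
import difflib

# A tiny sample allowlist -- a real deployment would load the full set of
# packages your organization has vetted.
KNOWN_GOOD = {"torch", "torchvision", "tensorflow", "xgboost",
              "scikit-learn", "transformers", "numpy", "langchain"}

def check_requirement(name: str) -> str:
    """Classify a requested package name before it reaches pip install."""
    name = name.lower()
    if name in KNOWN_GOOD:
        return "ok"
    # Near-misses to a vetted name are the classic typosquat signature.
    close = difflib.get_close_matches(name, KNOWN_GOOD, n=1, cutoff=0.8)
    if close:
        return f"suspicious: did you mean {close[0]}?"
    return "unknown"
```

Run this over every line of requirements.txt in CI and fail the build on any "suspicious" result, forcing a human to confirm the intent.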

RCE on Install · Low Barrier

Dependency Confusion Attacks

Birsan (2021) discovered that many package managers (pip, npm, composer) prefer public registry packages over private ones when both exist with the same name. If an organization uses a private PyPI mirror for internal ML packages (e.g., company-ml-utils), an attacker who registers company-ml-utils on public PyPI may cause pip to install the public (malicious) version instead of the internal one. This attack affected hundreds of major companies before it was widely understood. Mitigation: point pip at a single private index that proxies PyPI (via --index-url or pip.conf) — note that adding public PyPI with --extra-index-url reintroduces the risk, because pip may resolve a name from either index.

RCE · Enterprise Risk

Dependency Security Tools

# pip-audit: scan installed packages for known CVEs
pip install pip-audit
pip-audit -r requirements.txt
pip-audit --fix  # auto-upgrade vulnerable packages

# safety: check against PyPA Advisory Database
pip install safety
safety check -r requirements.txt
safety check --full-report

# Pin ALL dependencies with hashes in requirements.txt
# Generate with: pip-compile --generate-hashes requirements.in
torch==2.1.0 \
    --hash=sha256:abc123... \
    --hash=sha256:def456...  # multiple hashes for different platforms
transformers==4.35.0 \
    --hash=sha256:789abc...

# Install from hash-pinned requirements (fails if hash mismatch)
pip install --require-hashes -r requirements.txt

# For private packages: use a single private index that proxies PyPI.
# Avoid --extra-index-url with public PyPI — pip may resolve the package
# name from either index, which is exactly the dependency confusion attack.
pip install --index-url https://your-private-pypi.example.com/simple/ \
    company-ml-utils

Never pip install without Hash Verification in Production

In production ML training environments and any system loading models or processing sensitive data, always use hash-pinned requirements. pip install --require-hashes will refuse to install any package whose hash doesn't match the pinned value β€” detecting both malicious package substitution and accidental corruption. Generate hash-pinned requirements.txt using pip-compile (pip-tools) or poetry.lock. Combine with Dependabot or Renovate to automatically create PRs when dependencies have security updates, maintaining both security and freshness.

🛡 Securing the ML Supply Chain

| Control | Threat Mitigated | Tool / Standard | Complexity |
|---|---|---|---|
| Hash-pinned dependencies with --require-hashes | Typosquatting, dependency confusion, tampered packages | pip-tools, poetry, pipenv | Low — add to existing CI/CD |
| Vulnerability scanning (CVE check) | Known-vulnerable package versions | pip-audit, safety, Dependabot, Snyk | Low — automated in CI |
| Model file scanning | Pickle RCE payloads in .pth/.bin files | ModelScan (Protect AI), custom YARA rules | Low — add to model download step |
| SafeTensors-only model loading | Pickle deserialization RCE | safetensors library, HuggingFace safe loading | Low — code change to load function |
| Dataset hash verification | Dataset tampering, supply chain poisoning | DVC, SHA-256 checksums, MLflow | Low — add to pipeline |
| Private package registry | Typosquatting, dependency confusion | Nexus, Artifactory, AWS CodeArtifact | Medium — infrastructure setup |
| ML-SBOM generation | Visibility into full dependency graph | syft + grype, custom tooling | Medium — tooling and processes |
| Model signing & provenance | Unauthorized model modification | Sigstore, HuggingFace model signing (beta) | Medium–High — signing infrastructure |
| SLSA level 2–3 for ML pipelines | Build system compromise, provenance forgery | GitHub Actions SLSA builder, SLSA framework | High — significant CI/CD changes |
| Sandboxed model evaluation | Malicious model code execution | gVisor, Firecracker VMs, containers with seccomp | High — infrastructure isolation |

SLSA for ML Pipelines

SLSA (Supply-chain Levels for Software Artifacts) provides a framework for supply chain integrity from source through deployment. Applied to ML: SLSA Level 1 β€” all training runs scripted (no manual steps). SLSA Level 2 β€” training is version-controlled and produces signed provenance. SLSA Level 3 β€” training runs in an isolated, audited build environment; the provenance attestation is unforgeable. The emerging ML-SBOM concept extends Software Bill of Materials to include model ingredients: training dataset names/hashes, pre-trained model checksums, fine-tuning hyperparameters, and framework versions. Several cloud ML platforms (Vertex AI, SageMaker) are beginning to generate model cards and provenance records that approximate SBOM requirements.
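As a hedged sketch of what a minimal ML-SBOM record might contain — the field names and example values below are illustrative, not a standard schema; production tooling would emit CycloneDX or SPDX documents with ML extensions:

```python
import json
import platform
from datetime import datetime, timezone

def build_ml_sbom(model_name, base_checkpoint_sha256, dataset_hashes,
                  framework_versions, hyperparameters):
    """Assemble one JSON-serializable record of a training run's ingredients."""
    return {
        "model": model_name,
        "created": datetime.now(timezone.utc).isoformat(),
        "base_checkpoint_sha256": base_checkpoint_sha256,  # provenance anchor
        "datasets": dataset_hashes,          # {filename: sha256}
        "frameworks": framework_versions,    # {package: pinned version}
        "hyperparameters": hyperparameters,
        "build_host": platform.platform(),
    }

# All values below are hypothetical placeholders for illustration.
sbom = build_ml_sbom(
    model_name="medical-qa-llama3-ft",
    base_checkpoint_sha256="e3b0c442...",  # truncated example digest
    dataset_hashes={"pubmed_subset.jsonl": "9f86d081..."},
    framework_versions={"torch": "2.1.0", "transformers": "4.35.0"},
    hyperparameters={"lr": 2e-5, "epochs": 3},
)
sbom_json = json.dumps(sbom, indent=2)
```

Emitting such a record alongside every checkpoint gives incident responders the answer to "was this model trained from the compromised base?" in seconds rather than days.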

Treat Every Downloaded Model Like Untrusted Code β€” Sandbox, Scan, Verify

Model weights are executable artifacts. A .pth file is not just data β€” it can contain Python bytecode that runs on load. A fine-tuned model from HuggingFace has the same security status as a binary downloaded from the internet. Your security posture should reflect this: (1) Scan all downloaded model files with ModelScan before loading. (2) Load models with weights_only=True or use SafeTensors format. (3) Run model loading and inference in a sandboxed environment with restricted filesystem and network access. (4) Verify model provenance β€” prefer models from verified organizations on HuggingFace over anonymous uploads. (5) Pin model versions by commit hash, not floating version tags that can be overwritten.