LLM security encompasses the practices and controls needed to protect large language model applications from prompt injection, data leakage, model manipulation, and other AI-specific threats.
LLM security is the discipline of protecting large language model applications from AI-specific threats including prompt injection, data leakage, training data poisoning, model theft, excessive agency, and insecure output handling. It extends traditional application security with controls specifically designed for the unique risks of generative AI systems.
The OWASP LLM Top 10 covers prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities, sensitive information disclosure, insecure plugin design, excessive agency, overreliance on LLM outputs, and model theft. These represent the most critical risks in LLM-powered applications.
Prevent data leakage by implementing output filtering for PII and sensitive patterns, controlling training data to exclude confidential information, applying retrieval augmented generation with access controls, using data loss prevention tools on LLM outputs, logging all interactions for audit, and deploying guardrail systems that classify outputs.
Excessive agency occurs when LLM applications are granted unnecessary permissions, functions, or autonomy. An LLM with database write access, email sending capability, or code execution permissions creates risk if prompt injection or hallucination triggers unintended actions. Least-privilege design and human-in-the-loop controls mitigate this risk.
LLM security testing should include prompt injection red teaming, output validation testing, access control verification for RAG systems, tool-call authorization testing, rate limiting and DoS evaluation, training data extraction attempts, and assessment of guardrail bypass techniques. Testing should cover both direct and indirect attack vectors.
LLM supply chain risks include compromised pre-trained models from public repositories, poisoned fine-tuning datasets, vulnerable dependencies in LLM frameworks like LangChain or LlamaIndex, malicious model plugins or tools, and tampered model weights. Organizations should verify model provenance, scan dependencies, and validate training data integrity.
Guardrails are input/output classification systems that filter harmful content, detect prompt injection attempts, redact sensitive information, enforce topic boundaries, and validate response quality. They operate as middleware layers between users and the LLM, providing defense-in-depth against both adversarial attacks and unintended model behaviors.
Essential controls include input validation and sanitization, output filtering for sensitive data, rate limiting per user and session, authentication and authorization for all tool calls, audit logging of all interactions, sandboxed execution environments, content safety classifiers, model access controls, and incident response procedures for AI-specific threats.