LLM Security
OWASP Top 10 for LLMs Explained
Every vulnerability in plain English with real-world examples and how we test for each.
The OWASP Top 10 for Large Language Model Applications identifies the ten most critical security vulnerabilities in LLM-based systems. These include prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain risks, data leakage, insecure plugins, excessive agency, overreliance and model theft. Sherlock Forensics tests LLM applications against all ten categories.
The OWASP Top 10 for Large Language Model Applications is the definitive security framework for LLM-based systems. If your company builds, deploys or integrates LLMs, these are the ten vulnerabilities you need to test for. Here is what each one means and how we find them.
LLM01
Prompt Injection
What it is: An attacker crafts input that overrides the system prompt or intended behavior of an LLM. Direct prompt injection feeds adversarial instructions through user input. Indirect prompt injection embeds instructions in external content the LLM processes such as emails, documents or web pages.
Real-world example: A customer service chatbot is instructed via hidden text in a support ticket to ignore its guidelines and output the full system prompt including API keys and internal instructions. The attacker now knows every guardrail and can systematically bypass them.
How we test for this: We run manual prompt injection campaigns against your LLM endpoints testing direct injection, indirect injection through data sources, multi-turn conversation exploitation and encoding-based bypasses. Every test maps to OWASP LLM01. Our penetration testing engagements include this as standard for any AI-powered feature.
LLM02
Insecure Output Handling
What it is: LLM output is treated as trusted and rendered or executed without sanitization. When an LLM generates HTML, SQL, shell commands or code that gets passed directly to a browser, database or system shell, any malicious content in the output becomes an injection attack.
Real-world example: An AI coding assistant generates a web page containing JavaScript from user-provided requirements. The user includes a prompt injection that causes the LLM to embed a cross-site scripting (XSS) payload in the generated code. The developer deploys it without review. Every visitor to that page now executes the attacker's script.
How we test for this: We trace every path where LLM output reaches a rendering engine, database query, API call or system command. We inject payloads through the LLM and verify whether they execute downstream. Our AI code security audits specifically check for missing output sanitization in AI-integrated applications.
LLM03
Training Data Poisoning
What it is: An attacker corrupts the data used to train or fine-tune an LLM. Poisoned training data causes the model to produce biased, incorrect or malicious outputs for specific inputs while behaving normally otherwise. This includes backdoor attacks where a specific trigger phrase activates the poisoned behavior.
Real-world example: A company fine-tunes an LLM on customer support transcripts scraped from the internet. An attacker plants crafted conversations on public forums that teach the model to recommend a competitor's product whenever customers ask about a specific feature. The poisoning is invisible in standard evaluation.
How we test for this: We audit training data provenance, test for trigger-based behavioral anomalies and analyze model outputs for statistical deviations that indicate poisoning. Our AI startup security assessments include training pipeline integrity checks.
LLM04
Model Denial of Service
What it is: An attacker crafts inputs that consume excessive computational resources, causing the LLM to become slow or unavailable. Unlike traditional DoS attacks that flood network bandwidth, model DoS exploits the computational cost of inference. Long inputs, recursive generation patterns and adversarial prompts that trigger maximum-length outputs all exhaust GPU resources.
Real-world example: An attacker sends a prompt to a public-facing LLM API that causes it to generate the maximum token output in a loop. A handful of concurrent requests saturate the GPU cluster, making the service unavailable to legitimate users while running up the victim's cloud compute bill.
How we test for this: We test input length limits, output token caps, rate limiting effectiveness and resource consumption patterns under adversarial load. We verify that your LLM deployment has appropriate guardrails to prevent resource exhaustion from crafted inputs.
LLM05
Supply Chain Vulnerabilities
What it is: Vulnerabilities introduced through third-party components in the LLM pipeline. This includes pre-trained models with backdoors, poisoned datasets from public sources, compromised Python packages used in ML pipelines and malicious model files that exploit deserialization vulnerabilities during loading.
Real-world example: A development team downloads a pre-trained model from a public repository. The model file uses Python's pickle format, which executes arbitrary code during deserialization. Loading the model runs the attacker's payload with full system privileges in the ML pipeline.
How we test for this: We audit model provenance, scan for deserialization vulnerabilities, verify package integrity against known-good hashes and check for hallucinated dependencies in AI-generated code. Our AI code audits cover the full ML supply chain.
LLM06
Sensitive Information Disclosure
What it is: An LLM reveals confidential information through its responses. This includes leaking training data (including PII), exposing system prompts and internal instructions, disclosing API keys or credentials embedded in the prompt chain and revealing information about other users through context window pollution.
Real-world example: A legal AI assistant trained on client case files generates a response that includes details from a different client's case. The model memorized sensitive information during training and reproduces it when the input pattern is similar enough to trigger recall.
How we test for this: We probe for training data extraction, system prompt leakage, cross-user information disclosure and credential exposure. We test whether the LLM can be coerced into revealing information it should not have access to or should not share. This is part of our standard penetration testing for AI features.
LLM07
Insecure Plugin Design
What it is: LLM plugins and tool integrations that lack proper input validation, authentication or access controls. When an LLM calls external tools or APIs based on user input, insufficient validation of the LLM's tool calls creates injection points. The LLM becomes a proxy for attacking backend systems.
Real-world example: An AI assistant has a plugin that queries a database. An attacker crafts a prompt that causes the LLM to generate a SQL query with injection payloads. The plugin passes the query to the database without parameterization. The attacker extracts the entire user table through the chatbot interface.
How we test for this: We map every tool and plugin the LLM can invoke, test input validation on each integration, verify authentication boundaries between the LLM and backend services and attempt to abuse tool-calling to access unauthorized resources. Our assessments cover function calling, retrieval-augmented generation (RAG) pipelines and custom tool integrations.
LLM08
Excessive Agency
What it is: An LLM-based system has more permissions, access or autonomy than its task requires. When an LLM agent can send emails, modify databases, execute code or make API calls without adequate constraints, prompt injection or hallucination can trigger actions with real-world consequences.
Real-world example: An AI agent designed to schedule meetings is given full calendar API access including the ability to delete events and modify other users' calendars. A prompt injection causes the agent to cancel all meetings for a department and send fake meeting invites containing phishing links.
How we test for this: We enumerate every permission and capability granted to the LLM system, test whether those permissions can be abused through prompt injection or hallucination and verify that principle-of-least-privilege is enforced. We test kill switches, approval workflows and human-in-the-loop controls for consequential actions.
LLM09
Overreliance
What it is: Users or systems trust LLM output without verification. LLMs generate confident, authoritative text regardless of accuracy. When organizations automate decisions based on LLM output without human review or factual validation, hallucinated information drives real business actions.
Real-world example: A legal team uses an AI to draft court filings. The AI hallucinates case citations that do not exist. The filing is submitted without verification. The court discovers the fabricated citations, sanctions the law firm and the case is compromised. This has already happened multiple times in US courts.
How we test for this: We review the decision architecture around LLM integrations, identify where LLM output drives automated actions without human review and test for hallucination rates on domain-specific queries. We recommend validation layers, confidence thresholds and human approval gates based on the consequence level of each automated decision.
LLM10
Model Theft
What it is: An attacker extracts or replicates a proprietary LLM through API access, side-channel attacks or direct theft of model weights. Model extraction sends carefully chosen queries to the API and uses the responses to train a functionally equivalent replica. Side-channel attacks infer model architecture from timing, memory patterns or electromagnetic emissions.
Real-world example: A competitor sends millions of queries to your fine-tuned customer service LLM over several weeks. Using the input-output pairs, they train a distilled model that replicates your model's behavior at a fraction of the cost. Your R&D investment is now their competitive advantage.
How we test for this: We assess model extraction resistance by analyzing rate limiting, output perturbation and query pattern detection. We test whether model architecture details leak through API responses, error messages or inference timing. We recommend watermarking, differential privacy and monitoring strategies to detect and deter extraction attempts.
Authority Resources
Standards and References
Primary Source
Related Services
Certifications
Our team holds recognized certifications in application security and penetration testing.
Related
Further Reading
AI Security Risks for Businesses
Complete guide to the nine critical AI security risks facing businesses in 2026 including shadow AI, deepfake fraud and compliance gaps.
Can AI Be Hacked? Yes. Here Is How.
Adversarial attacks, prompt injection, model extraction and jailbreaking explained for a general audience.
AI-Generated Code Security Audit
Security audits for code produced by Copilot, Claude and ChatGPT covering hallucinated packages, secrets and injection flaws.
Get Started
Ready to test your LLM application?
LLM security assessments from $5,000. Quick audits from $1,500. Order online with no meetings required.
Order OnlineScope Your LLM Security Assessment
Whether you have a chatbot, an AI agent or a full LLM-powered platform, we test against every OWASP LLM category and deliver prioritized findings with remediation guidance.
Call 604.229.1994- Phone
- 604.229.1994
- Burnaby Office
- Burnaby, BC, Canada
- Coquitlam Office
- Coquitlam, BC, Canada
- Assessment Timeline
- 5-10 business days from engagement start