LLM Prompt Injection
Prompt injection is the most critical risk for any application using large language models. An attacker crafts input that causes the model to ignore its system prompt and execute unintended instructions. This can lead to data exfiltration, unauthorized API calls or bypassing access controls entirely.
- Direct injection
- User input directly overrides system instructions
- Indirect injection
- Malicious instructions embedded in external data sources the model retrieves
- Mitigation
- Input sanitization, output filtering, privilege separation between model and tools, human-in-the-loop for sensitive actions