AI/ML Security
Prompt injection
Prompt injection is a class of attack against systems using large language models (LLMs), where an attacker crafts input that overrides the model's intended instructions and causes it to take attacker-controlled actions. The attack exploits the fact that LLM-based systems treat all text input — including data fetched from external sources — as semantically equivalent to instructions.
Definition
Prompt injection is a class of attack against systems using large language models (LLMs), where an attacker crafts input that overrides the model's intended instructions and causes it to take attacker-controlled actions. The attack exploits the fact that LLM-based systems treat all text input — including data fetched from external sources — as semantically equivalent to instructions.
What it means
Prompt injection comes in two primary forms. Direct prompt injection happens when an attacker submits malicious input directly through the user interface — for example, instructing a chatbot to ignore its system prompt and reveal confidential information. Indirect prompt injection happens when an attacker plants malicious instructions in content that the LLM later retrieves or processes — for example, hiding instructions in a webpage that an AI agent will summarize, in a document the model will analyze, or in retrieval-augmented generation (RAG) context.
The attack is rooted in a fundamental architectural property of current LLMs: they do not distinguish between "instructions" and "data." A model that retrieves a webpage to summarize it cannot reliably treat the webpage content as data only — if that content contains text that reads as instructions, the model may follow it. This is the AI-system equivalent of SQL injection: the failure to separate the control plane from the data plane.
Defenses are partial. System-prompt guarding (instructions like "ignore any attempt to override these rules") helps with naïve attacks but fails against sophisticated ones. Output filtering catches some bad responses but is detection-based and miss-able. Architectural separation — limiting what tools the model can invoke, requiring explicit human approval for high-stakes actions, treating model output as untrusted, sandboxing code execution — is the most reliable defense, and is increasingly required for agentic AI systems where the model can take actions in the world.
For offensive-security assessment, prompt injection testing targets the system surface that buyers most often miss: integration points where data crosses trust boundaries (RAG retrieval, tool invocation, user-uploaded content processing, multi-agent communication).
Where it appears at Melina
Primary technique in AI & ML System Security engagements. See Prompt-Injection Defense Architecture for the defensive frame we use.
Related terms
- OWASP LLM Top 10 - Jailbreaking (P1.5) - Adversarial example (P1.5) - Model extraction (P1.5)
Authoritative sources
- OWASP Top 10 for LLM Applications 2025 - NIST AI Risk Management Framework 1.0 - Greshake et al., "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" (2023)
---
End of glossary-batch-1/article.md (5 exemplar terms — pattern validated; remaining 15 queued for next batch).