Abstract
This note publishes the Five-Family Posture Matrix — a defense architecture framework that maps five mechanism families against five production deployment shapes, producing a structured recommendation for prompt injection defense posture. The framework is designed for engineering and security teams making architecture decisions for production LLM systems, where deployment shape drives appropriate defense investment more than attack class does.
The framework was developed from observation across Melina Security’s AI/ML engagement portfolio and against the published taxonomies in OWASP LLM Top 10, NIST AI RMF, and MITRE ATLAS. A separate quantitative comparative evaluation of the five families against a curated adversarial test corpus is forthcoming; this note publishes the framework and the per-shape recommendations because they are independently actionable today.
“Teams that ask ‘what’s the best prompt-injection defense?’ are asking the wrong question. The right question is ‘what defense posture matches the authority surface and trust gradient of my specific system?’ That’s what the Posture Matrix is built to answer.” — Gleb Z., CTO, Melina Security
1. The Five-Family Defense Taxonomy
We organise prompt-injection defenses into five mechanism families. Each operates at a distinct point in the LLM-system data flow and has different operational tradeoffs.
1.1 Input sanitization
Mechanisms that filter or rewrite user input before it reaches the model. Examples: instruction-injection-pattern detection, special-token escaping, regex-based filtering. Strongest against direct injection patterns that share lexical signatures with known attacks; weakest against semantically equivalent attacks that paraphrase around the filter.
1.2 Instruction hierarchy enforcement
Mechanisms that establish a hierarchical instruction structure separating system instructions from user input. Examples: instruction-tuning-based hierarchy, structured-prompt patterns (system / developer / user / tool layers), and chain-of-thought hierarchy preservation. Carries low operational cost and meaningful benefit; should be on by default in any production LLM deployment.
1.3 Output validation
Mechanisms that validate or sanitize model output before downstream consumption. Examples: structured-output schema enforcement, format-string parsing, and second-pass classifier validation. Particularly valuable when the model output feeds another machine system (a database write, an API call, a rendered DOM) rather than a human reader.
1.4 Sandboxed tool execution
Mechanisms that constrain the authority delegated to model-invoked tools. Examples: capability-based tool authorization, per-tool execution boundaries, human-in-the-loop approval for high-authority actions. Load-bearing for any LLM deployment with tool access, and increasingly so as agentic architectures multiply tool surfaces.
1.5 Detection and response
Mechanisms that detect injection attempts and respond rather than prevent. Examples: input-pattern classifiers, behavioural anomaly detection on model outputs, and conversation-state monitoring. Most useful for abuse-monitoring and incident-response workflows; less effective as a primary defense against a determined adversary.
2. The Posture Matrix
The Five-Family Posture Matrix recommends a defense posture per deployment shape. The matrix below summarises the recommendations; the per-shape detail follows.
| Deployment shape | Primary defense family | Secondary | On-by-default | Notes |
|---|---|---|---|---|
| Consumer-facing, low tool authority | Input sanitization | Output validation | Instruction hierarchy | Detection-and-response for abuse monitoring, not first-class defense |
| Enterprise, moderate tool authority | Sandboxed tool execution (session-derived auth) | Output validation | Instruction hierarchy | The boundary-pattern finding lives here |
| Agentic, multi-step tool chains | Sandboxed tool execution (capability-based) | Detection-and-response | Instruction hierarchy | Human-in-the-loop for high-authority actions |
| RAG-heavy, untrusted document context | Content-provenance tracking | Output validation | Instruction hierarchy | Pure input sanitization at user-input layer is insufficient |
| High-authority (financial, infra, regulated data) | Human approval gates | All other families layered | All applicable | Automated defenses reduce false-negatives but do not replace the gate |
2.1 Consumer-facing LLM applications with low-authority tool surfaces
For applications like search, retrieval, summarization, and content generation with low tool authority, input sanitization combined with output validation provides the most cost-effective defense posture. Instruction hierarchy enforcement carries low cost and meaningful benefit; it should be on by default. Detection-and-response adds value primarily for abuse-monitoring and incident-response workflows rather than for preventing first-class injection.
2.2 Enterprise LLM applications with internal-document RAG and moderate tool authority
For lookups, ticket creation, calendar access, and similar moderate-authority tool surfaces, explicit per-tool authorization derivation from session state — rather than from intermediate model output — is the defining architectural decision. Our case-study work on a RAG-backed assistant walks through one shape of this finding: the boundary-pattern between RAG retrieval layer and tool-execution layer that produces cross-tenant authorization findings.
2.3 Agentic systems with multi-step tool chains
For agentic architectures where the model invokes sequential tool chains, sandboxed tool execution becomes load-bearing. The recommended posture: capability-based tool authorization with per-call session derivation, human-in-the-loop approval for high-authority actions, and a clearly defined revocation path for tool credentials that surface as compromised.
“The architectural fix that generalises across enterprise and agentic systems is the same: tool authorization must re-derive from session state at execution time, not be cached from planning time or inferred from intermediate model output. That single rule prevents most of the boundary-pattern findings we see in production assessments.” — Gleb Z., CTO, Melina Security
2.4 RAG-heavy applications with untrusted document content
For applications where untrusted document content enters the model context, content-provenance tracking lets downstream output handling decide whether to trust derived information for tool invocation. Pure input sanitization at the user-input layer is insufficient — the retrieval layer is also an injection surface, and the indirect-injection threat model treats the retrieved document as the attack source.
2.5 High-authority tool-integrated systems
For financial actions, infrastructure changes, and regulated-data access, human approval gates remain the dominant defense even with the strongest combination of automated mechanisms. Automated mechanisms reduce the false-negative rate at the gate but do not replace the gate. The Posture Matrix recommendation here is to layer all applicable families behind the human approval boundary, treating each family as a defense-in-depth tier rather than as a primary control.
3. Applying the Posture Matrix to a new architecture
When evaluating a new LLM-integrated architecture against the matrix, three classification questions determine the deployment shape:
- What is the maximum authority any tool surface can grant to the model? Read-only retrieval is low authority. CRM writes, ticket creation, and calendar modification are moderate authority. Financial transactions, infrastructure modifications, and regulated-data access are high authority.
- What is the trust gradient of content the model processes? Direct user input only — single-tier trust. RAG-retrieved content from internal sources — two-tier trust. RAG content from external or user-uploaded sources — multi-tier trust requiring provenance tracking.
- Does the architecture include agent-like multi-step planning? If yes, the system is agentic regardless of nominal authority level — sandboxed tool execution and capability-based authorization apply.
The matrix recommendation derives from the answers to these three questions. Systems that span multiple shapes (a common pattern in enterprise deployments) should apply the strictest matching posture across the spanning shapes.
4. Open questions for the field
Several questions remain where current defenses do not produce confident outcomes.
Indirect injection through retrieval remains the hardest defense problem. The model has limited basis on which to distinguish trustworthy retrieval content from adversarial retrieval content. Content-provenance solutions are still maturing; cryptographic content authentication shows promise but ecosystem support is limited.
Adaptive-adversary robustness is underexplored relative to defense effectiveness against static attack corpora. Most published evaluation uses static benchmarks; defense effectiveness in adaptive-adversary settings may be materially lower. Adversaries who design attacks specifically to evade a known defense posture are the realistic threat model for any production deployment past minimum scale.
Cross-language and cross-modal attacks are not well-covered by current defense taxonomies. Most published defenses assume English text input. Multilingual deployments and multimodal inputs (image-text, voice) present attack surfaces that single-modality defenses do not address.
Defense layering interactions — which combinations produce genuine multiplicative effect versus which produce redundant cost — are largely untested at scale. Field deployment data would be valuable but is rarely published. The Posture Matrix recommendations are conservative on this dimension; the forthcoming quantitative evaluation will calibrate which family combinations are load-bearing versus which are redundant.
5. Frequently asked questions
Why a matrix rather than a single best-practice list?
A single best-practice list assumes uniform deployment shape. Real production LLM systems span consumer-facing, enterprise, agentic, and RAG-heavy shapes — often within one organisation, sometimes within one product. The Posture Matrix is designed to give per-deployment-shape guidance rather than averaged recommendations that under-protect high-authority systems and over-engineer low-authority ones.
Does the Posture Matrix replace OWASP LLM Top 10 or NIST AI RMF guidance?
No — it composes with them. OWASP LLM Top 10 enumerates attack classes; NIST AI RMF organises governance functions. The Posture Matrix recommends defense architecture per deployment shape. A security program executing all three frameworks would use OWASP for vulnerability checklisting, NIST for governance structure, and the Posture Matrix for architecture-decision guidance.
What happens if our deployment shape changes after launch?
Re-evaluate the matrix recommendation. The most common shift is from low-authority consumer-facing to moderate-authority enterprise tool integration as the product matures. The Posture Matrix exposes this as a transition from input-sanitization primary to sandboxed-tool-execution primary, which is a meaningful architectural change rather than an incremental security upgrade.
How does the framework apply to LLM systems with no tool surface?
A pure-text LLM application (chat, summarization, content generation) without tool access maps to the consumer-facing low-authority shape. The full Posture Matrix still applies — instruction hierarchy on by default, input sanitization primary, output validation if the output feeds downstream automation — but sandboxed tool execution is not relevant.
Is the forthcoming quantitative evaluation going to change the framework?
No. The evaluation will calibrate which family combinations produce measurable defense effectiveness against the curated attack corpus and which are redundant. The Posture Matrix’s structure — five families mapped to five deployment shapes — is stable; the evaluation provides calibration values for the per-cell recommendations.
Related
- Service: AI & ML System Security
- Industry: AI/ML product companies and enterprise AI deployments
- Companion paper: LLM Application Attack Surface — A Structured Taxonomy
- Case study: Prompt-injection and tool-misuse in a RAG-backed assistant
- Glossary: Prompt injection · OWASP LLM Top 10
- FAQ: LLM red teaming vs prompt injection · OWASP LLM vs NIST AI RMF