The Five-Boundary Attack-Surface Taxonomy for LLM Applications

A five-boundary taxonomy for production LLM application attack surface — input, retrieval, tool-integration, output, and persistence — with attack classes, defense families, engineering ownership, and cross-boundary scenarios.

By Gleb Z.

Abstract

This note publishes the Five-Boundary Attack-Surface Taxonomy — a structured framework for analysing production LLM application security at the points where content of different trust levels meets. The taxonomy identifies five boundaries (input, retrieval, tool-integration, output, persistence), maps attack classes and defense families to each, and documents which engineering team is most often responsible for each boundary. The framework is designed to support threat modeling, code review, and engineering-team alignment work — not to replace OWASP LLM Top 10 or NIST AI RMF but to map onto them.

The most consequential findings we observe in production assessments are not within-boundary findings but cross-boundary findings — scenarios where the attack source is at one boundary and the impact lands at another. The taxonomy is structured to make these cross-boundary scenarios visible to teams that would otherwise miss them in single-team threat modeling.

“Most LLM-application security findings we discover live in the gaps between teams, not in the work product of any single team. The input team’s code review doesn’t catch the retrieval team’s authorization gap. The output team’s renderer doesn’t catch what the model can be induced to emit. The Five-Boundary Taxonomy exists to make those gaps visible at threat-modeling time, before they surface at production exploitation time.” — Gleb Z., CTO, Melina Security

1. Why a boundary taxonomy

The threat-modeling frameworks that work well for traditional web applications — STRIDE, PASTA, attack trees — assume a clean separation between trusted application code and untrusted external input. LLM applications break that assumption. The model processes content from multiple sources as if it were all instruction-relevant, regardless of whether each source is trustworthy.

The most useful threat-modeling frame for LLM applications draws boundaries at the points where content of different trust levels meets. Each boundary has its own attack class set, its own defense family set, and typically its own engineering owner. Five boundaries cover the production architectures we have assessed so far.

2. The Five Boundaries

2.1 Input boundary

Scope. Direct user input, system-prompt content, fine-tuning data, and in-context examples. Each is a content stream that the model processes as instruction-relevant.

Attack classes. Direct prompt injection (LLM01), system-prompt extraction (LLM07 family), training-data extraction, instruction-hierarchy violations.

Defense families. Input sanitization, instruction hierarchy enforcement, removal of system-prompt confidentiality assumption — assume any system prompt can be extracted and design accordingly.

Engineering responsibility. Most often application-team owned, sometimes platform-team owned for centralized LLM gateway architectures.

2.2 Retrieval boundary

Scope. RAG-source content, embedding-store query results, and tool-invocation results that flow back into the prompt context.

Attack classes. Indirect prompt injection (LLM01 variant where the attack source is a retrieved document), vector-store poisoning (LLM03 variant), retrieval-source authorization violations where one tenant’s content surfaces in another tenant’s session.

Defense families. Retrieval-source authorization enforcement, content-provenance tracking through the retrieval pipeline, retrieved-content sanitization that treats the retrieved document as untrusted even when the corpus is nominally trusted.

Engineering responsibility. Almost always data-platform team owned, which means the team thinking about this boundary is rarely the team thinking about the input boundary — a coordination gap that produces boundary-pattern findings.

2.3 Tool-integration boundary

Scope. Function-calling APIs, agent-invoked tools, MCP-style server integrations, sandboxed code-execution environments.

Attack classes. Excessive agency (LLM06), tool-abuse scenarios where a tool is invoked with parameters the user is not authorised to control, capability-confusion attacks where a tool’s authorisation is inferred from context rather than session, sandbox-escape attacks.

Defense families. Capability-based tool authorization, per-tool execution boundaries, human-in-the-loop approval for high-authority actions, sandboxed execution for code-execution tools.

Engineering responsibility. Typically split — the tool itself is owned by the team that built it (CRM, knowledge wiki, finance system), but the authorisation glue that decides whether the LLM can invoke it sits in the LLM-platform team. This split is where the most consequential findings appear.

2.4 Output boundary

Scope. Text output rendered to users, structured output consumed by downstream code, output that becomes input to subsequent prompts.

Attack classes. Improper output handling (LLM05), output injection into downstream systems (rendered markdown fetching attacker-controlled resources, structured output that drives a follow-on tool call with attacker-controlled parameters), sensitive-information disclosure (LLM02) through output channels.

Defense families. Output schema enforcement, structured-output validation, removal of downstream-consumer assumption that the output is trusted, second-pass output classification.

Engineering responsibility. Output rendering is typically front-end team owned; structured-output consumers are owned by whichever downstream service consumes them — another split-ownership boundary.

2.5 Persistence boundary

Scope. Vector-store state, conversation-history state, fine-tuning data state, cache state. Anything that persists beyond a single request.

Attack classes. Vector and embedding weaknesses (LLM08), data and model poisoning (LLM04), conversation-history poisoning, cache poisoning. Persistent state creates attack scenarios that request-scoped defenses do not address — the attacker who places poisoned content during one session affects different users in later sessions.

Defense families. Cross-tenant state isolation, state-integrity verification, state-source authorization tracking.

Engineering responsibility. Data-platform or ML-platform team owned, often with weak coupling to the LLM-application teams that produce the state.

3. Cross-boundary scenarios

Cross-boundary findings are the highest-impact scenarios the taxonomy is designed to surface. Three concrete shapes recur across production assessments.

Scenario A — Retrieval → Output leakage

A maliciously crafted document is ingested into a tenant’s RAG corpus. When a user in that tenant queries against the corpus, the document content is retrieved and incorporated into the model’s prompt context. The crafted content induces the model to emit a response containing a remote-resource reference. The output renderer fetches that remote resource at display time, transmitting session-bound metadata to the attacker-controlled URL.

The attack enters at the retrieval boundary and exfiltrates at the output boundary. The retrieval team’s authorization checks pass — the document is in the right tenant’s corpus. The output team’s renderer is doing what it was designed to do — render markdown. Neither team’s review surfaces the scenario because the attack chain spans both boundaries.

Scenario B — Tool-integration → Persistence corruption → Future retrieval

A tool invocation, induced by indirect prompt injection, writes attacker-controlled content to a persistent store (a knowledge wiki, a CRM note, a shared conversation history). The corrupted state survives across sessions. In a later session, possibly involving a different user, the corrupted state is retrieved and incorporated into the model’s prompt context, where it induces further malicious behaviour.

The attack chain spans tool-integration boundary → persistence boundary → retrieval boundary with three different engineering teams owning the boundaries. Each team’s local controls pass; the attack succeeds in the seams between them.

Scenario C — Conversation history poisoning at the persistence boundary, surfaces at input boundary

A multi-turn conversation contains an early turn that injects content the model will treat as authoritative in later turns. The conversation history persists across the multi-turn session and is replayed into the input boundary on every subsequent turn. The injected content shifts the model’s behaviour across the remaining session, even though the user never directly inputs the injected content again.

The attack enters at the persistence boundary (conversation history storage) and surfaces at the input boundary (re-injection into prompt context). The defense family that catches it is content-provenance tracking through the conversation-history layer — neither input sanitization nor output validation are positioned to detect it.

Why single-team testing misses these

“Each engineering team reviews the boundaries it owns. Cross-boundary scenarios live in the gaps between team-owned boundaries. The teams aren’t doing bad threat modeling — they’re doing complete threat modeling within their scope. The taxonomy makes the gaps a first-class artifact so coordinated review can surface them.” — Gleb Z., CTO, Melina Security

4. Mapping to existing frameworks

The taxonomy is designed to compose with rather than replace existing frameworks. Each OWASP LLM Top 10 entry maps to one or more boundaries; each NIST AI RMF function maps to defense families across boundaries; each MITRE ATLAS technique maps to attack classes at specific boundaries.

The practical use of the mapping: when a team has an existing threat-modeling artifact framed by OWASP LLM Top 10, the boundary taxonomy adds the question “and where does the responsibility for each defense sit?” — which surfaces coordination gaps that single-team threat modeling rarely catches.

OWASP LLM Top 10 entry	Primary boundary	Secondary boundary
LLM01 — Prompt injection (direct)	Input	—
LLM01 — Prompt injection (indirect)	Retrieval	Input
LLM02 — Sensitive info disclosure	Output	Input
LLM03 — Supply chain / training data	Persistence	Retrieval
LLM04 — Data and model poisoning	Persistence	Retrieval
LLM05 — Improper output handling	Output	—
LLM06 — Excessive agency	Tool-integration	—
LLM07 — System prompt leakage	Input	Output
LLM08 — Vector and embedding weaknesses	Persistence	Retrieval

5. Limitations

The taxonomy treats each boundary as a single category, but in practice the input boundary contains multiple sub-boundaries (user-typed input is different from system-prompt input is different from fine-tuning data) and the same is true of the others. Where assessment work requires sub-boundary granularity, the taxonomy is the starting point, not the ending point.

The persistence boundary is the least mature in the framework. Production LLM applications increasingly use persistent state for personalisation and long-context features, and we expect the attack-class set at this boundary to expand over the next two assessment cycles.

The framework is best applied alongside a deployment-shape analysis — see the companion Prompt-Injection Defense Architecture — The Five-Family Posture Matrix, which recommends per-shape defense investment using the same five mechanism families that appear in this taxonomy.

6. Frequently asked questions

How is the Five-Boundary Taxonomy different from OWASP LLM Top 10?

OWASP LLM Top 10 enumerates attack classes. The Five-Boundary Taxonomy organises the attack surface where those attacks operate. Each OWASP entry maps to one or more boundaries (see Section 4 table), which produces a different lens — “which team owns the defense for this attack class” rather than “which attack class is most common.” The two compose; one does not replace the other.

Can the taxonomy guide a code review without a dedicated security review?

Yes, and that is the most common application we see. Engineering teams use the taxonomy to structure pull-request review checklists: for changes to retrieval logic, the reviewer asks the retrieval-boundary questions; for changes to tool-integration, the tool-integration boundary questions; and so on. The structure prevents the most common review failure — reviewing a change as if it were a within-boundary change when the change actually spans boundaries.

What is the most common cross-boundary scenario we should test for first?

The retrieval-to-output scenario (Scenario A above). It is the cheapest to exploit in a lab setting and the most common boundary-pattern finding we observe in assessments. Teams running self-assessment should start there before moving to the more complex multi-step chains.

Does the taxonomy apply to agentic systems with multi-step tool chains?

Yes — agentic architectures multiply the tool-integration boundary and create new cross-boundary scenarios between tool-integration steps. The taxonomy treats each step as instantiating the tool-integration boundary with its own authorization context. Capability-based tool authorization with per-call session derivation is the defense pattern that scales across multi-step chains; static authorization decisions cached at planning time do not.

How does the taxonomy evolve as production architectures evolve?

The five boundaries are stable for the production architectures we currently observe. We expect the persistence boundary to expand as long-context and personalisation features mature. New boundary categories will emerge if new architectural patterns produce trust gradients the current five do not capture; we will publish updates when patterns reach the three-independent-context observation threshold.

Service: AI & ML System Security
Industry: AI/ML product companies and enterprise AI deployments
Companion paper: Prompt-Injection Defense Architecture — The Five-Family Posture Matrix
Case study: Prompt-injection and tool-misuse in a RAG-backed assistant
Glossary: Prompt injection · OWASP LLM Top 10