AI Security

The Problem of Passing Context Between Agents (And Why It's So Dangerous)

Why naive context sharing breaks multi-agent systems — and why securing A2A must come first.

Paulina Xu•December 1, 2025•15 min

SecurityMulti-Agent SystemsAI Safety

Multi-agent architectures are rapidly becoming the default pattern for building capable AI systems. Agents now collaborate, delegate, critique, plan, retrieve, and execute across complex workflows. But beneath this excitement lies a structural weakness few practitioners fully appreciate:

Passing context between agents may be the single most dangerous design choice in modern AI systems — because it silently carries instructions, data, and state across boundaries that were never meant to be crossed.

Research across prompt injection, RAG safety, autonomy, and multi-agent systems reveals a consistent theme: naive context sharing creates an attack surface that is far larger and far harder to secure than anything seen in single-agent environments.¹ ² ³ ⁴

This problem must be solved before A2A (agent-to-agent) interaction can be deployed safely inside enterprises.

What "Context Passing" Really Means (It's Not Just Chat History)

In multi-agent systems, "context" often includes much more than conversation transcripts. Modern frameworks routinely pass rich, structured state between agents — including:⁴ ¹

System and task prompts (tool schemas, role definitions, constraints).⁴
Retrieved documents from RAG or search tools.
Plans, chain-of-thought reasoning, or sub-goals.⁴
Memory references allowing agents to read/write shared data stores.¹
Tool results, API responses, logs, error traces — even credentials.⁷ ⁸

Each of these channels can silently carry:

sensitive data
hidden instructions
policy-breaking content
malicious prompts
attack payloads

Worse: no single entity (agent, orchestrator, or developer) has full visibility into what's being shared across the system.³ ¹

In other words: the moment agents start exchanging context, you've created a distributed trust network with no shared notion of provenance, authorization, or data classification.

How Prompt Injection Evolves into "Prompt Infection" in Multi-Agent Systems

Classic prompt injection research distinguishes:

Direct injection — attacker talks to the model.
Indirect injection — malicious instructions hidden in retrieved or tool-generated data.

But multi-agent workflows introduce a third, far more dangerous category:

Prompt infection: malicious instructions that replicate across agents like a virus.⁹ ¹⁰ ¹¹

Lehmann et al. formally document how prompt infections spread across LLM agents by embedding malicious instructions into intermediate messages, which downstream agents treat as trusted system-level guidance.¹¹ ² A single compromised agent can:

Plant hidden instructions in responses sent to other agents.
Trick other agents into performing actions they do not have permissions for.
Persist malicious prompts into shared memory structures.¹¹ ²
Hijack tool-using agents to exfiltrate data or escalate access.

Because agents pass context wholesale — full plans, full transcripts, full RAG outputs — an injected instruction can "hitchhike" through the system unnoticed.¹¹ ³

This is exponentially harder to detect than single-agent prompt injection.

RAG Poisoning + Multi-Agent Systems = Hidden Cross-Agent Attack Paths

RAG (Retrieval-Augmented Generation) amplifies the dangers of context passing because retrieved text is often treated as trusted evidence, not as an instruction channel.

Research shows attackers can poison RAG pipelines with extremely small insertion rates:

PoisonedRAG studies show that poisoning as low as 10^-4 of the corpus can reliably alter outputs in black-box settings.⁶ ¹⁴
Backdoored retrievers can ensure specific queries always surface malicious documents.¹⁵ ⁵
RAG threat models highlight cross-agent issues such as corpus poisoning, document leakage, and retriever manipulation.¹³ ¹²

In multi-agent systems, this becomes even more dangerous:

One agent retrieves a poisoned document; another agent executes the malicious instruction inside it.³ ¹²

Attribution becomes difficult because the agent that retrieves is not the agent that acts. This enables attacks where:

Retrieval agent → passes poisoned snippet
Planning agent → interprets it as a system instruction
Tool agent → executes a harmful action

No single agent sees the full chain.

Tool Use + Context Passing = Stealth Exfiltration

Toolformer-style architectures blur the line between text generation and action execution.¹⁶ ¹⁷ Once agents can call APIs, web search, or browsing tools, context-based attacks escalate from "bad text" to real data movement.

Recent work shows that attackers can exploit context sharing to orchestrate multi-step exfiltration flows:⁸ ⁷ ⁴

Agent A (internal) summarizes sensitive documents or knowledge.
Malicious prompt embedded upstream tells Agent A to structure that summary a certain way.
Agent B (with HTTP or search tools) receives that structured text and unknowingly embeds it into outgoing URLs or queries.
The attacker-controlled endpoint receives the internal data.

Because each agent only sees its local state, and logs are fragmented:⁷ ³

No single policy sees the entire leak.
No local rule is violated.
No single agent "looks malicious."

This kind of compositional attack is impossible in single-agent systems — it only emerges when context hops across multiple specialized agents.

Cross-Domain Context Bypass: When Small Pieces Become Sensitive in Combination

A major risk documented in cross-domain multi-agent systems research is context bypass — where individually harmless pieces of data combine into a policy violation.³ ¹

For example:

Agent A inside Enterprise X produces aggregated payroll numbers.
Agent B in Partner Organization Y receives partial breakdowns.
Combined context reconstructs individual salaries — violating both organizations' policies.³

Because each agent sees only its local task:

Each local step appears compliant.
Global policy is violated in the aggregate.
No single actor is accountable.

This makes naive context passing incompatible with enterprise data governance.

Autonomy + Multi-Agent Coordination Makes This Worse

Security surveys emphasize that as agents gain autonomy — memory, planning, delegation — risk scales nonlinearly.¹⁹ ⁴

Key failure amplifiers:

Subgoal formation: agents reinterpret instructions, compounding earlier errors.⁴
Delegation chains: planner → worker → worker → tool, with no provenance tracking.¹⁹
Long-horizon contexts: agent memories persist malicious instructions.¹⁹ ⁴
Cross-agent contamination: one compromised agent corrupts downstream agents.

The Knight Institute's "levels of autonomy" framework and multi-agent safety surveys argue that conventional input/output safety checks fail in multi-agent pipelines because the boundary between "input," "policy," and "output" collapses.¹⁹ ⁴

Why Naive Context Sharing Is Structurally Unsafe

Bringing the research together, several structural dangers emerge:

1. Data vs. Instructions Blur Together

Models can't reliably distinguish quoted text from actionable instructions.¹⁰ ⁹

2. Provenance Is Lost Immediately

Downstream agents cannot tell whether context was:

user-provided
system-generated
adversarial
retrieved
injected by another agent¹¹ ³

3. Compositional Attacks Exploit Agent Boundaries

Benign local steps combine into dangerous global behavior.³ ⁴

4. Tool Access Amplifies Any Upstream Mistake

Poisoned context can cause API calls, writes, external network access.⁵ ⁶ ⁸

5. No One Has End-to-End Visibility

Logs are fragmented across agents, making forensics difficult.⁷ ³

This makes context passing not just a privacy issue — but a systems security problem.

Emerging Mitigations — and Why They Must Come Before A2A

Researchers are beginning to propose foundational defenses:

Context Provenance & Tagging

Prompt Infection research proposes metadata tags indicating trusted vs. untrusted content, helping downstream agents apply correct guardrails.² ¹¹

Principle of Least Context

Security analyses recommend only passing minimal required fields — not entire transcripts or documents.¹² ¹

Retrieval & KB Hardening

Detection of poisoning, index integrity checks, retrieval validation, and anomaly detection before context reaches agents.¹⁴ ¹⁵ ⁶ ⁵

Agent-Aware Threat Modeling

Systems must be modeled as multi-agent graphs, not single-agent loops.¹⁸ ³

Autonomy-Tiered Governance

Higher-autonomy agents require stricter monitoring, sandboxing, and human oversight.¹⁹ ⁴

The Core Thesis: Before We Can Do Safe A2A, We Must Fix Context

All research points to one unavoidable conclusion:

In multi-agent systems, context is the control surface. Whoever controls the context controls the agent.

Until we build:

provenance
classification
permission boundaries
per-user scopes
context minimization
cross-agent audit trails
end-to-end authorization checks

—we cannot deploy safe A2A communication at scale.

Enterprise agents won't fail because their models hallucinate.

They'll fail because they believed the wrong context from the wrong agent at the wrong time.

Fixing context passing is the prerequisite for safe agent ecosystems.

It is step zero in building real agent governance.