
AI Security
Why naive context sharing breaks multi-agent systems — and why securing A2A must come first.
Multi-agent architectures are rapidly becoming the default pattern for building capable AI systems. Agents now collaborate, delegate, critique, plan, retrieve, and execute across complex workflows. But beneath this excitement lies a structural weakness few practitioners fully appreciate:
Passing context between agents may be the single most dangerous design choice in modern AI systems — because it silently carries instructions, data, and state across boundaries that were never meant to be crossed.
Research across prompt injection, RAG safety, autonomy, and multi-agent systems reveals a consistent theme: naive context sharing creates an attack surface that is far larger and far harder to secure than anything seen in single-agent environments.1 2 3 4
This problem must be solved before A2A (agent-to-agent) interaction can be deployed safely inside enterprises.
In multi-agent systems, "context" often includes much more than conversation transcripts. Modern frameworks routinely pass rich, structured state between agents — including:4 1
Each of these channels can silently carry:
Worse: no single entity (agent, orchestrator, or developer) has full visibility into what's being shared across the system.3 1
In other words: the moment agents start exchanging context, you've created a distributed trust network with no shared notion of provenance, authorization, or data classification.
Classic prompt injection research distinguishes:
But multi-agent workflows introduce a third, far more dangerous category:
Prompt infection: malicious instructions that replicate across agents like a virus.9 10 11
Lehmann et al. formally document how prompt infections spread across LLM agents by embedding malicious instructions into intermediate messages, which downstream agents treat as trusted system-level guidance.11 2 A single compromised agent can:
Because agents pass context wholesale — full plans, full transcripts, full RAG outputs — an injected instruction can "hitchhike" through the system unnoticed.11 3
This is exponentially harder to detect than single-agent prompt injection.
RAG (Retrieval-Augmented Generation) amplifies the dangers of context passing because retrieved text is often treated as trusted evidence, not as an instruction channel.
Research shows attackers can poison RAG pipelines with extremely small insertion rates:
In multi-agent systems, this becomes even more dangerous:
One agent retrieves a poisoned document; another agent executes the malicious instruction inside it.3 12
Attribution becomes difficult because the agent that retrieves is not the agent that acts. This enables attacks where:
No single agent sees the full chain.
Toolformer-style architectures blur the line between text generation and action execution.16 17 Once agents can call APIs, web search, or browsing tools, context-based attacks escalate from "bad text" to real data movement.
Recent work shows that attackers can exploit context sharing to orchestrate multi-step exfiltration flows:8 7 4
Because each agent only sees its local state, and logs are fragmented:7 3
This kind of compositional attack is impossible in single-agent systems — it only emerges when context hops across multiple specialized agents.
A major risk documented in cross-domain multi-agent systems research is context bypass — where individually harmless pieces of data combine into a policy violation.3 1
For example:
Because each agent sees only its local task:
This makes naive context passing incompatible with enterprise data governance.
Security surveys emphasize that as agents gain autonomy — memory, planning, delegation — risk scales nonlinearly.19 4
Key failure amplifiers:
The Knight Institute's "levels of autonomy" framework and multi-agent safety surveys argue that conventional input/output safety checks fail in multi-agent pipelines because the boundary between "input," "policy," and "output" collapses.19 4
Bringing the research together, several structural dangers emerge:
Models can't reliably distinguish quoted text from actionable instructions.10 9
Downstream agents cannot tell whether context was:
Benign local steps combine into dangerous global behavior.3 4
Poisoned context can cause API calls, writes, external network access.5 6 8
Logs are fragmented across agents, making forensics difficult.7 3
This makes context passing not just a privacy issue — but a systems security problem.
Researchers are beginning to propose foundational defenses:
Prompt Infection research proposes metadata tags indicating trusted vs. untrusted content, helping downstream agents apply correct guardrails.2 11
Security analyses recommend only passing minimal required fields — not entire transcripts or documents.12 1
Detection of poisoning, index integrity checks, retrieval validation, and anomaly detection before context reaches agents.14 15 6 5
Systems must be modeled as multi-agent graphs, not single-agent loops.18 3
Higher-autonomy agents require stricter monitoring, sandboxing, and human oversight.19 4
All research points to one unavoidable conclusion:
In multi-agent systems, context is the control surface. Whoever controls the context controls the agent.
Until we build:
—we cannot deploy safe A2A communication at scale.
Enterprise agents won't fail because their models hallucinate.
They'll fail because they believed the wrong context from the wrong agent at the wrong time.
Fixing context passing is the prerequisite for safe agent ecosystems.
It is step zero in building real agent governance.