AI Router Vulnerabilities Allow Attackers to Inject Malicious Code
Third-party API routers pose a critical and often overlooked attack surface within the burgeoning AI agent ecosystem. These components can be weaponized to silently hijack tool calls, drain...
Third-party API routers pose a critical and often overlooked attack surface within the burgeoning AI agent ecosystem. These components can be weaponized to silently hijack tool calls, drain cryptocurrency wallets, and exfiltrate sensitive credentials at scale, as detailed in a recent paper <a href='https://ppl-ai-file-upload.s3.amazonaws.com/web
As AI agents increasingly automate high-stakes tasks, executing code, managing cloud infrastructure, and handling financial transactions, they depend on intermediary services called LLM API routers to dispatch requests to providers like OpenAI, Anthropic, and Google.
A new study titled “Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain” by researchers from the University of California, Santa Barbara, has uncovered how these routers represent a dangerous and unguarded trust boundary.
LLM API routers sit between AI agent clients and upstream model providers, operating as application-layer proxies with full plaintext access to every in-flight JSON payload. Unlike traditional network man-in-the-middle attacks that require TLS certificate forgery, these intermediaries are configured voluntarily by developers as their API endpoints.

The router terminates the client-side TLS connection and re-originates a new one upstream, placing it in an ideal position to read, modify, or fabricate any tool-call payload without detection.
No major AI provider currently enforces cryptographic integrity between client and upstream model, meaning there is nothing to prevent a malicious router from rewriting the exact command an agent executes.
Malicious Code Injection
The UC Santa Barbara team purchased 28 paid routers from platforms like Taobao, Xianyu, and Shopify-hosted storefronts, and collected 400 free routers from public communities. Their findings were alarming:
- 9 routers actively injected malicious code into returned tool calls (1 paid, 8 free).
- 17 free routers triggered follow-on unauthorized use of researcher-owned AWS credentials after intercepting them in transit.
- 1 router drained ETH from a researcher-owned Ethereum private key.
- 2 routers deployed adaptive evasion, activating malicious payloads only after 50 prior requests, or targeting autonomous “YOLO mode” sessions running Rust or Go projects.
A particularly dangerous attack class payload injection (AC-1) works by replacing a benign installer URL or package name with an attacker-controlled endpoint.

Because the tampered JSON payload remains syntactically valid, it passes schema validation and clears most automated security checks. A single rewritten curl command is enough to achieve arbitrary code execution on the client machine.
The threat extends beyond actively malicious routers. The researchers demonstrated that even ostensibly benign routers can be poisoned into the same attack surface.
After intentionally leaking a single researcher-owned OpenAI API key on Chinese forums and messaging groups, that key generated 100 million GPT-5.4 tokens and exposed credentials across multiple downstream Codex sessions.
In a second poisoning study, the team deployed intentionally weak router decoys across 20 domains and 20 IP addresses. These attracted 40,000 unauthorized access attempts, served roughly 2 billion billed tokens, exposed 99 credentials across 440 Codex sessions spanning 398 different projects, and critically, 401 of those 440 sessions were already running in autonomous YOLO mode, where tool execution is auto-approved without per-command confirmation.
While the researchers note that no client-side defense can fully authenticate the provenance of a returned tool call, three mitigations are available for immediate deployment without provider cooperation:
- Fail-closed policy gate: Blocks all shell-rewrite and dependency-injection attacks at a 1.0% false positive rate by only allowing commands from a local allowlist — though it can be bypassed if attackers host payloads on allowlisted domains.
- Response-side anomaly screening: Flags 89% of payload injection attempts using an IsolationForest model trained on benign tool-call patterns, at a 6.7% false positive budget.
- Append-only transparency logging: Records full request/response metadata, TLS data, and response hashes to enable forensic scoping after an incident — storing only ~1.26 KB per entry.
The research team argues that closing this provenance gap ultimately requires provider-signed response envelopes, a mechanism analogous to DKIM for email that cryptographically binds the tool call an agent executes to the upstream model’s actual output.
Until major providers like OpenAI and Anthropic implement such response-integrity mechanisms, developers deploying AI agents via third-party routers should treat every intermediary as a potential adversary and implement layered client-side defenses accordingly.
Disclaimer: HackersRadar reports on cybersecurity threats and incidents for informational and awareness purposes only. We do not engage in hacking activities, data exfiltration, or the hosting or distribution of stolen or leaked information. All content is based on publicly available sources.



No Comment! Be the first one.