Agentic AI Red Teaming Reveals Zero-Click Attack Chains
Artificial intelligence systems are fundamentally reshaping software operations, yet they simultaneously introduce novel security risks many organizations are not fully prepared to address.
Agentic AI, which refers to AI that can plan and carry out multi-step tasks on its own, is now a target for attackers in ways that go beyond what traditional security models were built to handle.
As these systems move from research labs into real-world production environments, the threats they face are becoming more varied and more difficult to detect.
For much of the past year, security researchers have been putting agentic AI systems through rigorous testing to understand where they break down.
What they found was not just a handful of edge cases but a consistent pattern of exploitable weaknesses spanning supply chains, inter-agent communication, and the safeguards meant to keep humans in control.
The most alarming finding was that attackers can build chains that bypass human oversight entirely, from start to finish, without any additional interaction from a person.
Analysts at Microsoft identified and formally documented these findings through a comprehensive red team program targeting deployed agentic AI systems.
Microsoft said in a report shared with Cyber Security News (CSN) that twelve months of real-world engagements informed a major update to its Taxonomy of Failure Modes in Agentic AI Systems, moving it from version 1.0 to version 2.0 and adding seven entirely new failure mode categories.
The scale of the ecosystem being targeted became clear when the open-source framework OpenClaw launched in January 2026 and accumulated over 336,000 GitHub stars within 48 hours.
A security audit shortly after identified 512 vulnerabilities, including CVE-2026-25253, a one-click remote code execution flaw via WebSocket hijacking. Over 1,800 exposed instances were leaking API keys and credentials in that first week alone.
The Model Context Protocol, or MCP, which became the standard way for AI models to connect with external tools, also became a significant attack surface.
In 2025, researchers documented 99 CVEs tied to MCP-related software, and tool poisoning shifted from a theoretical concern to something attackers were actively doing in the wild.
Zero-Click Human-in-the-Loop Bypass Attack Chains
The finding that drew the most serious attention was how reliably red teamers bypassed human-in-the-loop controls, the checkpoints designed to require human approval before an AI agent takes a sensitive action.
Red teamers achieved this through consent fatigue, gradually wearing down the review process with repeated low-stakes requests until a high-impact action slipped through.
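One way a defender might watch for this pattern is to count recent low-stakes approval requests in a sliding window and escalate any high-impact request that arrives right after a burst. The sketch below is illustrative only: the class name, window size, and burst threshold are assumptions, not values from the Microsoft report.

```python
from collections import deque


class ApprovalFatigueMonitor:
    """Flags high-impact approval requests that follow a burst of low-stakes ones.

    The window size and burst threshold are illustrative assumptions.
    """

    def __init__(self, window_seconds=300.0, burst_threshold=10):
        self.window_seconds = window_seconds
        self.burst_threshold = burst_threshold
        self._recent_low_stakes = deque()  # timestamps of low-stakes requests

    def observe(self, timestamp, high_impact):
        """Record one approval request; return True if it should be escalated."""
        # Drop low-stakes requests that have fallen out of the sliding window.
        cutoff = timestamp - self.window_seconds
        while self._recent_low_stakes and self._recent_low_stakes[0] < cutoff:
            self._recent_low_stakes.popleft()

        if not high_impact:
            self._recent_low_stakes.append(timestamp)
            return False

        # A high-impact request right after many low-stakes ones is suspicious.
        return len(self._recent_low_stakes) >= self.burst_threshold
```

An escalated request could be routed to a second reviewer or delayed, rather than shown to the same person who has just clicked through a run of routine prompts.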
More critically, several engagements produced zero-click end-to-end chains where no human interaction was required beyond the initial agent launch, yet the outcome included data exfiltration or lateral movement through the target environment.
These chains worked by combining multiple failure modes, each individually subtle, into a compound attack that no single checkpoint could catch.
Session context contamination, where early-stage injected data quietly shaped the agent’s reasoning in later steps, proved especially hard to detect because nothing about any individual step looked suspicious on its own.
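Since no individual step looks suspicious, one possible defense is to track provenance across the whole session rather than per step. The following minimal sketch tags each context entry with its source and requires extra review for sensitive actions once any untrusted data has entered the session. The class names, source labels, and the trusted/untrusted split are hypothetical and not taken from the report.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContextEntry:
    text: str
    source: str      # hypothetical labels such as "user" or "tool:web_fetch"
    trusted: bool


@dataclass
class SessionContext:
    entries: list = field(default_factory=list)

    def add(self, text, source, trusted):
        self.entries.append(ContextEntry(text, source, trusted))

    def untrusted_sources(self):
        """Return the sources of every untrusted entry seen so far in the session."""
        return sorted({e.source for e in self.entries if not e.trusted})


def requires_extra_review(context, action_is_sensitive):
    """A sensitive action needs extra review once untrusted data entered the session."""
    return action_is_sensitive and bool(context.untrusted_sources())
```

The design choice here is deliberately coarse: the taint sticks to the session, not to a single step, because the contamination described above acts on later reasoning rather than on the step where the data arrived.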
Seven New Failure Modes Defined
The updated taxonomy introduces seven new categories that reflect what red teamers actually encountered during live engagements.
These include agentic supply chain compromise, goal hijacking, inter-agent trust escalation, computer use agent visual attacks, session context contamination, MCP and plugin abuse, and capability or architecture disclosure.
Each describes a distinct way an agentic system can be manipulated that either did not exist or was not adequately covered before.
Microsoft’s mitigations for these risks are practical and architectural. Organizations are advised to generate a software bill of materials for every deployed agent that includes plugins, MCP servers, and prompt templates.
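As a rough illustration of what such a bill of materials could record, the sketch below builds a JSON manifest that pins each plugin, MCP server, and prompt template by SHA-256 digest. The schema and function names are assumptions made for this example; it is not the CycloneDX or SPDX format, and the report does not prescribe a specific layout.

```python
import hashlib
import json


def component(kind, name, version, content):
    """Describe one agent component and pin its content with a SHA-256 digest."""
    return {
        "kind": kind,          # "plugin", "mcp_server", or "prompt_template"
        "name": name,
        "version": version,
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def build_agent_sbom(agent_name, components):
    """Return a deterministic JSON manifest for one deployed agent."""
    manifest = {
        "agent": agent_name,
        "components": sorted(components, key=lambda c: (c["kind"], c["name"])),
    }
    return json.dumps(manifest, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical components; real content would be read from disk or a registry.
    parts = [
        component("prompt_template", "triage", "1.2.0", b"You are a triage agent."),
        component("mcp_server", "ticket-db", "0.4.1", b"server binary bytes"),
        component("plugin", "pdf-reader", "2.0.0", b"plugin package bytes"),
    ]
    print(build_agent_sbom("support-agent", parts))
```

Comparing a freshly generated manifest against the one recorded at deployment would surface a changed prompt template or swapped plugin as a digest mismatch.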
Agent identity should be verified cryptographically, not assumed from its position in a workflow. Human-in-the-loop controls should be hardened against compound action decomposition and semantic laundering, where an agent rewrites an approval description to obscure what it is requesting.
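A minimal sketch of both ideas, using only the Python standard library: each agent signs a canonical form of the action it requests, and the reviewer side builds the approval text from the verified action itself instead of trusting an agent-written description. HMAC with a shared secret stands in here for whatever scheme a real deployment would use (asymmetric signatures are a common alternative); the key registry, field names, and functions are hypothetical.

```python
import hashlib
import hmac
import json


def canonical(action):
    """Serialize deterministically so both sides sign the same bytes."""
    return json.dumps(action, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_action(agent_key, action):
    """Agent side: produce an HMAC-SHA256 tag over the canonical action."""
    return hmac.new(agent_key, canonical(action), hashlib.sha256).hexdigest()


def verify_action(agent_keys, agent_id, action, tag):
    """Reviewer side: accept only if the tag matches the key registered for agent_id."""
    key = agent_keys.get(agent_id)
    if key is None:
        return False  # unknown agent: position in the workflow grants nothing
    expected = sign_action(key, action)
    return hmac.compare_digest(expected, tag)


def approval_prompt(action):
    """Build the text shown to the human from the verified action itself,
    not from any free-text description supplied by the agent."""
    args = canonical(action["args"]).decode("utf-8")
    return f"{action['tool']} -> {action['target']} (args: {args})"
```

Because the prompt is rendered from the signed payload, an agent that rewords its request cannot change what the reviewer sees without also changing the action that will actually run.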
Tiered approvals based on action reversibility and monitoring for unusual approval request patterns round out the recommended controls.
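Tiering by reversibility could be sketched as a small policy table that maps each tier to the number of distinct human approvers it needs. The tier names, example actions, and approver counts below are assumptions for illustration, not thresholds from the report.

```python
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = 1        # e.g. creating a draft
    COSTLY_TO_UNDO = 2    # e.g. sending an email
    IRREVERSIBLE = 3      # e.g. deleting data


# Hypothetical policy: distinct approvers required per tier.
REQUIRED_APPROVERS = {
    Reversibility.REVERSIBLE: 0,
    Reversibility.COSTLY_TO_UNDO: 1,
    Reversibility.IRREVERSIBLE: 2,
}


def is_approved(reversibility, approvers):
    """Count distinct approvers and compare against the tier's requirement."""
    return len(set(approvers)) >= REQUIRED_APPROVERS[reversibility]
```

Counting distinct approvers means one fatigued reviewer clicking twice does not satisfy the highest tier.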
Disclaimer: HackersRadar reports on cybersecurity threats and incidents for informational and awareness purposes only. We do not engage in hacking activities, data exfiltration, or the hosting or distribution of stolen or leaked information. All content is based on publicly available sources.


