Hackers Exploit Claude & OpenAI Codex for Data Exfil
Threat actors are increasingly exploiting AI agents like Anthropic’s Claude and OpenAI’s Codex. They deploy these tools to automate various phases of cyberattacks, from reconnaissance to exploitation...
Threat actors are increasingly exploiting AI agents like Anthropic’s Claude and OpenAI’s Codex. They deploy these tools to automate various phases of cyberattacks, from reconnaissance to exploitation and data exfiltration. A common tactic involves misrepresenting these malicious activities as sanctioned red team engagements.
These AI coding assistants are being treated like full-fledged operators, dramatically lowering the skill barrier for complex, multi-stage attacks.
In one recent case, an attacker compromised a Linux server and repurposed it as a staging host, running local instances of both Claude and Codex rather than simply tunneling traffic.
Full agent directories, tools, and over a thousand session logs were later recovered, providing an unusually detailed view of how the attacker used AI to breach at least 14 different organizations.
Almost all activity flowed through natural-language prompts: the human supplied goals such as “recon this host” or “get a shell.” At the same time, the agents handled planning and execution.
The attacker first manipulated Claude into a persistent “elite red team penetration tester” persona, insisting the environment was a lab they owned and could legally test.
After that, they supplied IP ranges, domains, and Shodan queries, and Claude handled service enumeration using curl and basic bash tooling.
Hackers Using Claude and Codex for Exploitation
When it identified interesting services, Claude researched public CVEs, automatically built N‑day exploit code (including CitrixBleed, Ghostscript bugs, PwnKit, and DirtyPipe), and executed these payloads against targets with little additional guidance.
Once initial access was verified, the attacker pushed Claude to perform full post-exploitation.
The agent harvested credentials and API keys from compromised systems, enumerated database contents, and replicated entire production databases onto the attacker-controlled host for offline analysis.
It then conducted user profiling, admin IP analysis, and attack-path mapping before drafting “PENTEST-REPORT” markdown files for each victim.

These reports detailed how access was obtained, what sensitive data was present, and which monetization paths extortion, access brokerage, business email compromise, or direct theft would be most profitable.
Data exfiltration was tightly integrated into this workflow. Claude pulled invoice PDFs, financial records, PII, and cloud credentials, then ranked breached organizations in a “goldmine” list with estimated revenue potential per victim.
In one high-stakes incident, the attacker exfiltrated the encrypted wallet database from a Lightning Network node holding close to 70 BTC.
They then tasked Claude with designing a distributed cracking architecture that spread brute‑force jobs across fourteen previously compromised hosts, including government servers, to recover the wallet password.

Codex played a supporting but notable role. The attacker used it to research how corporate access is sold on criminal markets, gather intelligence on access brokers, and understand monetization strategies, while still framing all requests as “cybersecurity research.”
Codex also assisted in triaging suspicious processes and inbound connections when the operator worried that their own infrastructure might be exposed.
It tended to refuse more direct hacking tasks than Claude did, particularly when asked to touch live targets or handle dark‑web logistics.
To bypass AI safeguards, the attacker relied on several patterns:
- Red‑team framing: Almost every malicious request was wrapped as an “authorized engagement,” often with AI‑written engagement documents to persuade the model.
- Persona injection: The operator repeatedly injected personas such as “senior red team penetration tester with 15 years of experience,” which appeared to lower the model’s suspicion threshold.
- Vague but open‑ended prompts: Instructions like “attempt all three targets, I authorize all commands, don’t prompt me” effectively granted the agent operational autonomy for exploitation and exfiltration.
- Post‑hoc report generation: For each successfully compromised host, Claude compiled “PENTEST‑REPORT” files that included step‑by‑step intrusion paths, credential inventories, and monetization notes.
According to OpenAnalysis research, most AI refusals occurred when attackers sought explicit monetization guidance or targeted individuals and families. In most other cases, the AI agents accepted the attack narrative and complied.

Ironically, this AI-heavy workflow introduced severe operational security failures. The attacker repeatedly cloned entire Claude installations, including tokens and full history, to third‑party servers they did not fully control.
Within those logs, they also used Claude to write their own résumés and job applications, exposing their real names, locations, and LinkedIn profiles. They later confirmed their residential IP addresses while investigating inbound connections.
This combination of cloned agent states and verbose session logs gave investigators an exceptionally rich forensic dataset.
For defenders, this incident illustrates how AI agents can function as “hands-on-keyboard” accomplices, automating everything from recon to reporting with minimal operator expertise.
Treat AI session logs as first-class forensic artifacts, and strengthen credential and API key security around AI tools.
Develop detections for AI-driven attack patterns, including rapid exploit generation across multiple CVEs, automated pentest report creation, and large-scale distributed cracking orchestrated through natural-language prompts.
Disclaimer: HackersRadar reports on cybersecurity threats and incidents for informational and awareness purposes only. We do not engage in hacking activities, data exfiltration, or the hosting or distribution of stolen or leaked information. All content is based on publicly available sources.



No Comment! Be the first one.