T3MP3ST Security Framework Uses AI to Automate 0-Day Vulnerability Discovery
Key Takeaways
- T3MP3ST is a new open-source security framework that transforms existing AI coding agents into autonomous red-teaming tools.
- It operates without requiring new API keys or cloud infrastructure, leveraging agents already running on a user’s machine.
- The framework reports a 90.1% pass@1 score on XBOW’s XBEN suite and localized 8 of 10 held-out real-world CVEs disclosed in 2026, which its developers present as evidence of zero-day discovery capability.
- T3MP3ST is intended for authorized security testing, research, and education, and is released under the AGPL-3.0 license.
AI-Driven Red Teaming: T3MP3ST Framework Automates 0-Day Discovery
A new open-source security framework called T3MP3ST turns general-purpose AI coding agents such as Claude Code, OpenAI’s Codex, and Hermes into autonomous red-teaming operators. It needs no new API keys, cloud infrastructure, or additional billing, because it relies on the AI agents already present on a user’s system.
Developed by the researcher known as elder-plinius, T3MP3ST is an orchestration layer for multiple AI agents and does not ship a model of its own. It coordinates these agents through a full kill chain, from reconnaissance through exploitation to final reporting.
Users interact with the framework through a web-based “War Room” interface or a command-line interface (CLI), directing it towards an authorized target. The AI coding agent already running on their machine then becomes the operational intelligence driving the entire mission.
The framework is characterized by its “keyless warfare” approach, utilizing existing agent sessions instead of requiring separate provider keys. Furthermore, it enforces egress-scope containment, ensuring that networked tools automatically refrain from interacting with public hosts outside the defined scope.
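To make egress-scope containment concrete, here is a minimal sketch of the kind of check a networked tool could run before contacting a host. It is not T3MP3ST’s implementation: the function name `in_scope` and its parameters are hypothetical, and it uses a stricter deny-by-default rule than the article describes, blocking anything that is not explicitly listed.

```python
import ipaddress


def in_scope(host: str, allowed_hosts: set, allowed_networks: list) -> bool:
    """Return True only if `host` is explicitly part of the engagement scope.

    Hypothetical helper for illustration; names and policy are assumptions.
    """
    # Normalize case, whitespace, and a trailing DNS root dot.
    host = host.strip().lower().rstrip(".")
    if host in allowed_hosts:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A hostname that is not on the allowlist: deny by default.
        return False
    # An IP literal is in scope only if it falls inside an allowed CIDR block.
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)


if __name__ == "__main__":
    hosts = {"app.lab.example"}
    nets = ["10.20.0.0/24"]
    for target in ["app.lab.example", "10.20.0.7", "8.8.8.8"]:
        print(target, in_scope(target, hosts, nets))
```

A real containment layer would also have to handle DNS resolution and redirects, since a hostname that is in scope can resolve or redirect to an address that is not.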
T3MP3ST’s Performance and Architecture
T3MP3ST reports strong benchmark results. It scores 90.1% pass@1 on XBOW’s 104-challenge XBEN suite, a black-box benchmark on which XBOW itself reports a success rate of roughly 85%. Each solve is validated against a committed flag oracle, and the result can be recomputed on demand with a “verify-claims” command for reproducibility. On Cybench, an academic benchmark of 40 tasks, the framework’s single-agent ReAct loop solved 23 tasks (57.5%) without hints.
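The two ideas in this paragraph, checking a solve against a committed flag and aggregating solves into pass@1, can be sketched as follows. This is an illustration under stated assumptions, not the framework’s “verify-claims” logic: the function names are hypothetical, and the oracle is assumed to store a SHA-256 digest of each flag, which the source does not specify. For one attempt per challenge, pass@1 is the fraction solved; with several attempts it is the per-challenge success rate averaged over challenges.

```python
import hashlib
import hmac


def flag_matches(submitted: str, committed_sha256: str) -> bool:
    """Compare a submitted flag against a committed SHA-256 digest (assumed format)."""
    digest = hashlib.sha256(submitted.strip().encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(digest, committed_sha256)


def pass_at_1(results: dict) -> float:
    """Average, over challenges, of the fraction of attempts that succeeded.

    `results` maps a challenge id to a list of booleans, one per attempt.
    Challenges with no recorded attempts are skipped.
    """
    per_challenge = [sum(attempts) / len(attempts)
                     for attempts in results.values() if attempts]
    return sum(per_challenge) / len(per_challenge)


if __name__ == "__main__":
    oracle = hashlib.sha256(b"FLAG{demo}").hexdigest()
    print(flag_matches("FLAG{demo}", oracle))
    print(pass_at_1({"chal-1": [True, True], "chal-2": [True, False]}))
```

Since 90.1% of 104 is not a whole number of challenges, the reported figure presumably averages more than one attempt per challenge, though the source does not say how many.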
Perhaps most notably, the developers tested whether T3MP3ST can find vulnerabilities it could not have memorized. Against a held-out set of 10 real CVEs disclosed in 2026, spanning seven programming languages, a single agent pinpointed 8 of the 10 to the exact file, line, and CWE classification, and the framework’s broader tool pack surfaced all 10. The developers acknowledge the small sample size but argue the result is a meaningful proxy for zero-day discovery: because the bugs postdate the AI model’s training cutoff, memorization is effectively ruled out.
The framework’s design maps an 8-operator kill chain (Recon, Scanner, Exploiter, Infiltrator, Exfiltrator, Ghost, Coordinator, and Analyst) onto established methodologies such as MITRE ATT&CK tactics and the Cyber Kill Chain. Currently, only the recon engine and the single-agent exploit loop are stable and benchmarked. The framework can be cloned from GitHub.
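The “exact file, line, and CWE” criterion for the held-out CVE test can be expressed as a strict match between a reported finding and the ground truth. The sketch below is only one plausible way to score such a test; the `Finding` type, its fields, and the sample values are hypothetical and do not come from the T3MP3ST repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A vulnerability location: hypothetical record for illustration."""
    file: str
    line: int
    cwe: str  # e.g. "CWE-79"


def is_exact_hit(predicted: Finding, truth: Finding) -> bool:
    """A hit requires the same file, the same line, and the same CWE id."""
    return (predicted.file == truth.file
            and predicted.line == truth.line
            and predicted.cwe.upper() == truth.cwe.upper())


def hit_rate(pairs: list) -> float:
    """Fraction of (predicted, truth) pairs that are exact hits."""
    return sum(is_exact_hit(p, t) for p, t in pairs) / len(pairs)


if __name__ == "__main__":
    truth = Finding("src/parser.c", 120, "CWE-787")
    near_miss = Finding("src/parser.c", 121, "CWE-787")  # off by one line
    print(is_exact_hit(truth, truth), is_exact_hit(near_miss, truth))
```

Under a criterion this strict, a finding that lands one line away counts as a miss, which is worth keeping in mind when reading a score such as 8 out of 10.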
Downstream operators, while running the same tool-backed reasoning loop as the reconnaissance phase, are still classified as experimental. This is because end-to-end coordinated-swarm exploitation has not yet been validated at scale.
| Domain | Status |
|---|---|
| Web apps (XBEN suite) | Stable, benchmarked |
| CTF challenges (Cybench) | Stable, benchmarked |
| Embedded/OT/robotics OSS | Pipeline stable, coordinated disclosure |
| Source code (white-box) | Experimental, Python-only ingest |
| Smart contracts (DeFi) | Experimental, reproduction only |
| Cloud, mobile, AD, binary RE | Roadmap/in development |
The release of T3MP3ST has garnered attention from security researchers on platforms like Reddit’s blueteamsec community, who have highlighted its significance for autonomous red-teaming. This development aligns with broader industry trends towards AI-driven security tools, following related advancements such as Anthropic’s Mythos model. XBOW’s separate evaluation of Mythos indicated substantial improvements in vulnerability-led generation and source-code security analysis, reducing false negatives by 42% in comparable exploit benchmarks.
The developers of T3MP3ST explicitly state that the framework is strictly for authorized testing, research, and educational purposes. It is released under the AGPL-3.0 license without warranty. They emphasize that unauthorized use against systems without explicit written permission remains illegal in most jurisdictions, and the responsibility for adhering to legal and rules-of-engagement boundaries rests solely with the operator.
What You Should Do
- For Researchers and Educators: Explore T3MP3ST for authorized security research and educational purposes. Clone the framework from GitHub to understand its capabilities and limitations.
- For Red Teams: Investigate T3MP3ST as a potential tool to augment existing red-teaming operations, particularly for reconnaissance and single-agent exploit generation, within strict legal and ethical boundaries.
- For Organizations: Stay informed about advancements in AI-driven red-teaming tools. While T3MP3ST is for authorized use, its capabilities highlight the evolving landscape of automated vulnerability discovery that future adversaries may leverage.
- Always Adhere to Legal and Ethical Guidelines: Ensure all use of T3MP3ST or similar tools is conducted with explicit, written authorization and within the confines of applicable laws and rules of engagement.
Disclaimer: HackersRadar reports on cybersecurity threats and incidents for informational and awareness purposes only. We do not engage in hacking activities, data exfiltration, or the hosting or distribution of stolen or leaked information. All content is based on publicly available sources.


