Hackers News

Apex AI Pentester Finds App Vulnerabilities in AI-Powered Attacks

Marcus Rodriguez
March 20, 2026 3 Min Read

Apex functions as an autonomous, AI-powered penetration testing agent, specifically engineered to operate in black-box mode against live applications. It distinguishes itself by not requiring access to source code, hints, or predefined attack paths. This independence allows Apex to rapidly discover, chain, and verify real-world vulnerabilities at the speed demanded by modern software development.

The catalyst for Apex is a structural breakdown in how software security is practiced. AI coding agents are generating and merging code at machine scale: Stripe’s coding agents alone merge 1,300 pull requests per week, while some engineering teams spend over $1,000 per engineer per day on AI tokens with zero human code review.

Traditional scanners and human-led assessments cannot keep pace with this velocity. Apex was built as the adversarial verification layer: a separate agent that attacks the running application exactly as a real attacker would, catching vulnerabilities before they become breaches.

Apex operates across three deployment modes. In CI pipelines, it validates every deploy against a sandboxed replica of the application, mapping the attack surface and attempting exploitation before code merges.

Against production, it continuously surfaces exploitable weaknesses in real time. It also supports on-demand testing against any target, replacing the quarterly PDF engagement with a feedback loop that operates at the speed of modern threats.

To validate its capabilities, PensarAI built Argus, an open-source benchmark of 60 self-contained, Dockerized vulnerable web applications purpose-built for evaluating offensive security agents.

Existing benchmarks were deemed insufficient: the most widely used suite, XBOW’s 104-challenge set, is 70% PHP, covers single-vulnerability targets, and lacks GraphQL, JWT algorithm confusion, race conditions, prototype pollution chains, WAF bypass, and multi-tenant isolation scenarios.

Argus spans the frameworks dominating production: Node.js/Express (40%), Python/Flask/Django (20%), multi-service architectures (25%), Go, Java/Spring Boot, and PHP.

It introduces categories no other benchmark covers: WAF and IDS evasion, multi-step exploit chains requiring up to 7 chained vulnerabilities, multi-tenant isolation failures, race conditions and business logic flaws, modern authentication bypasses (JWT, OAuth, SAML, MFA), and cloud/Kubernetes infrastructure attacks. Difficulty is calibrated across 2 easy, 27 medium, and 31 hard challenges.
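The composition figures above can be sanity-checked with a few lines of Python. This is only an illustrative sketch: the difficulty counts and framework percentages come from the article, while the per-framework application counts are derived estimates that assume the percentages apply to all 60 applications and that the categories do not overlap.

```python
TOTAL_APPS = 60

# Difficulty split as reported in the article.
difficulty = {"easy": 2, "medium": 27, "hard": 31}
assert sum(difficulty.values()) == TOTAL_APPS

# Share of each difficulty level, as a percentage rounded to one decimal.
difficulty_share = {
    level: round(100 * count / TOTAL_APPS, 1)
    for level, count in difficulty.items()
}

# Framework shares as reported. Go, Java/Spring Boot, and PHP have no stated
# percentages, so they are grouped into one remainder bucket (an assumption).
framework_share = {
    "Node.js/Express": 40,
    "Python/Flask/Django": 20,
    "multi-service": 25,
}
framework_share["Go + Java/Spring Boot + PHP"] = 100 - sum(framework_share.values())

# Derived estimate of how many applications each share corresponds to.
framework_apps = {
    name: share * TOTAL_APPS // 100 for name, share in framework_share.items()
}

print(difficulty_share)
print(framework_apps)
```

Under those assumptions, the three frameworks without stated percentages would together account for the remaining share of the benchmark.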

271 Vulnerabilities Across 60 Applications

Apex was pointed at all 60 Argus challenges in full black-box mode using Claude Haiku 4.5, the smallest, cheapest model available, to isolate architectural gains over raw model capability.

Apex achieved a 35% pass rate, outperforming PentestGPT (30%) and Raptor (27%). On the top 10 hardest challenges using Claude Opus 4.6, the gap widened substantially: Apex solved 80%, PentestGPT reached 70%, and Raptor hit 60%.
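To put those percentages in concrete terms, the sketch below converts each reported pass rate into an approximate number of solved challenges. It assumes the Haiku rates are fractions of the 60 Argus challenges and the Opus rates are fractions of 10. The article reports only percentages, so the counts are rounded, derived estimates rather than published figures.

```python
# Reported pass rates from the article; solved counts are derived estimates.
FULL_RUN_TOTAL = 60   # all Argus challenges (Claude Haiku 4.5 run)
HARDEST_TOTAL = 10    # the 10 hardest challenges (Claude Opus 4.6 run)

full_run_rates = {"Apex": 0.35, "PentestGPT": 0.30, "Raptor": 0.27}
hardest_rates = {"Apex": 0.80, "PentestGPT": 0.70, "Raptor": 0.60}


def solved_counts(rates, total):
    """Convert pass rates into whole numbers of solved challenges (rounded)."""
    return {agent: round(rate * total) for agent, rate in rates.items()}


full_run_solved = solved_counts(full_run_rates, FULL_RUN_TOTAL)
hardest_solved = solved_counts(hardest_rates, HARDEST_TOTAL)

for agent in full_run_rates:
    print(
        f"{agent}: ~{full_run_solved[agent]}/{FULL_RUN_TOTAL} overall, "
        f"~{hardest_solved[agent]}/{HARDEST_TOTAL} on the hardest set"
    )
```

Raptor's 27% does not map to a whole number of challenges out of 60, which suggests the published percentages are themselves rounded.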

Across the full run, Apex discovered 271 unique vulnerabilities spanning SQL injection, SSRF, NoSQL injection, prototype pollution, SSTI, XXE, race conditions, IDOR, auth bypass, CORS misconfigurations, command injection, and path traversal. The average cost per challenge was approximately $8, with the entire 60-challenge run on Haiku costing under $500.
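The cost figures are consistent with each other, as a quick calculation shows. The per-challenge cost, challenge count, ceiling, and vulnerability count are taken from the article. The cost per vulnerability is a derived figure that the article does not state, and it inherits the approximation in the $8 average.

```python
# Figures reported in the article (the per-challenge cost is approximate).
AVG_COST_PER_CHALLENGE_USD = 8
NUM_CHALLENGES = 60
REPORTED_CEILING_USD = 500
UNIQUE_VULNERABILITIES = 271

# Estimated total spend for the full run.
estimated_total = AVG_COST_PER_CHALLENGE_USD * NUM_CHALLENGES

# Derived, not reported: rough cost per unique vulnerability found.
cost_per_vulnerability = round(estimated_total / UNIQUE_VULNERABILITIES, 2)

print(f"Estimated total: ${estimated_total}")
print(f"Within reported ceiling: {estimated_total < REPORTED_CEILING_USD}")
print(f"Approximate cost per vulnerability: ${cost_per_vulnerability}")
```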

Notable solves included a 7-step race-condition double-spend in a fintech transfer endpoint, a multi-tenant SSRF chain that pivoted through a shared cache to extract API keys from neighboring tenants, and a SpEL injection escalated to remote code execution (RCE) on a Java Spring Boot application, all in under 15 minutes.
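For readers unfamiliar with the vulnerability class, the following is a minimal, generic sketch of a race-condition double-spend. It is not the Argus challenge or Apex's 7-step exploit. The `Account` class, its method names, and the use of a barrier to force the unlucky interleaving are all hypothetical, chosen only to make the check-then-act flaw reproducible.

```python
import threading


class Account:
    """Toy in-memory account; a stand-in for a real transfer endpoint."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def transfer_vulnerable(self, amount, barrier):
        # Flaw: the balance check happens outside the lock, so concurrent
        # requests can all pass the check before any of them debits.
        if self.balance >= amount:
            barrier.wait()  # hypothetical: forces both threads past the check
            with self._lock:
                self.balance -= amount
            return True
        return False

    def transfer_safe(self, amount):
        # Fix: the check and the debit happen atomically under one lock.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def run_concurrently(fn, n=2):
    """Run fn in n threads and collect the return values."""
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(fn())) for _ in range(n)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Two concurrent requests each try to spend the full balance of 100.
vulnerable = Account(100)
barrier = threading.Barrier(2)
vulnerable_results = run_concurrently(
    lambda: vulnerable.transfer_vulnerable(100, barrier)
)

safe = Account(100)
safe_results = run_concurrently(lambda: safe.transfer_safe(100))

print("vulnerable:", vulnerable_results, "balance:", vulnerable.balance)
print("safe:", sorted(safe_results), "balance:", safe.balance)
```

In the vulnerable version both transfers succeed and the balance goes negative. In the safe version exactly one succeeds. Real endpoints typically close this gap with database transactions or row-level locks rather than an in-process lock.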

Apex’s documented failure modes are instructive. Last-mile execution, completing the final credential extraction step after a successful SSRF chain, emerged as the dominant gap. Decoy flags misled the agent twice, and complex multi-step chains such as CI/CD pipeline poisoning and Kubernetes compromise exceeded the 30-minute budget.

Both Apex and the Argus benchmark are available as open source on GitHub today.
