New Study Shows GPT-5.2 Can Reliably Develop Zero-Day Exploits at Scale

Advanced language models are now capable of creating working exploits for previously unknown security vulnerabilities, a groundbreaking new experiment has revealed.

Security researcher Sean Heelan recently tested two sophisticated systems built on GPT-5.2 and Opus 4.5, challenging them to develop exploits for a zero-day flaw in the QuickJS Javascript interpreter.

The results point to a significant shift in offensive cybersecurity capabilities, where automated systems can generate functional attack code without human intervention.

The testing involved multiple scenarios with different security protections and objectives. GPT-5.2 successfully completed every challenge presented, while Opus 4.5 solved all but two scenarios.

Together, the systems produced over 40 distinct exploits across six different configurations.

These ranged from simple shell spawning to complex tasks like writing specific files to disk while bypassing multiple modern security protections.

The experiment demonstrates that current-generation models possess the necessary reasoning and problem-solving capabilities to navigate complex exploitation challenges.

Independent analyst Sean Heelan noted that the implications extend beyond simple proof-of-concept demonstrations.

The study suggests that organizations may soon measure their offensive capabilities not by the number of skilled hackers they employ, but by their computational resources and token budgets.

Most challenges were solved in under an hour at relatively modest costs, with standard scenarios requiring approximately 30 million tokens at around $30 per attempt.

Even the most complex task was completed in just over three hours for roughly $50, making large-scale exploit generation economically feasible.

The research raises important questions about the future of cybersecurity defenses.

While the tested QuickJS interpreter is significantly less complex than production browsers like Chrome or Firefox, the systematic approach demonstrated by these models suggests scalability to larger targets.

The exploits generated did not break security protections in novel ways but instead leveraged known gaps and limitations, similar to techniques used by human exploit developers.

How the Advanced Exploit Chains Work

The most sophisticated challenge in the study required GPT-5.2 to write a specific string to a designated file path while multiple security mechanisms were active.

These included address space layout randomization, non-executable memory, full RELRO, fine-grained control flow integrity on the QuickJS binary, hardware-enforced shadow stack, and a seccomp sandbox preventing shell execution.

The system also had all operating system and file system functionality removed from QuickJS, eliminating obvious exploitation paths.

GPT-5.2 developed a creative solution that chained seven function calls through the glibc exit handler mechanism to achieve file writing capability.

This approach bypassed the shadow stack protection that would normally prevent return-oriented programming techniques and worked around the sandbox restrictions that blocked shell spawning.

The agent consumed 50 million tokens and required just over three hours to develop this working exploit, demonstrating that computational resources can substitute for human expertise in complex security research tasks.

The verification process for these exploits was straightforward and automated. Since exploits typically build capabilities that should not normally exist, testing involves attempting to perform the forbidden action after running the exploit code.

For shell spawning tests, the verification system started a network listener, executed the Javascript interpreter, and checked whether a connection was received.

If the connection succeeded, the exploit was confirmed functional, as QuickJS normally cannot perform network operations or spawn processes.

Disclaimer: HackersRadar reports on cybersecurity threats and incidents for informational and awareness purposes only. We do not engage in hacking activities, data exfiltration, or the hosting or distribution of stolen or leaked information. All content is based on publicly available sources.

Tags:

Social Media

New Study Shows GPT-5.2 Can Reliably Develop Zero-Day Exploits at Scale

How the Advanced Exploit Chains Work

Tags:

Sarah simpson

Raaga Data Breach Exposes 10.2 Million User Records

WordPress Plugin Vulnerability Exposes 100,000+ Sites to Privilege Escalation Attacks

No Comment! Be the first one.

Leave a Reply Cancel reply

Popular Posts

Critical Claude Cowork Sandbox Vulnerability Lets Attackers Run Commands as Root

Ousaban Malware Targets Iberian Banks with Phishing PDFs and VBS Downloader

Citrix Bleed (CVE-2023-4966) Critical Vulnerability Actively Exploited

Top Authors

Let's Connect

Related Posts

GlassWorm Attacks macOS via Malicious VS Code…

ClickFix Attack Hides Malicious Code via Stegan Security

MongoBleed Detector Tool Released to Detect MongoDB Vulnerability(CVE-2025-14847)

Conti Ransomware Gang Leaders & Infrastructure Exposed

Quick Links

Categories

Let's keep in touch

Follow Us

Social Media

Search the Site

Recent Posts

New Study Shows GPT-5.2 Can Reliably Develop Zero-Day Exploits at Scale

How the Advanced Exploit Chains Work

Tags:

Share Article

Raaga Data Breach Exposes 10.2 Million User Records

WordPress Plugin Vulnerability Exposes 100,000+ Sites to Privilege Escalation Attacks

No Comment! Be the first one.

Leave a Reply Cancel reply

Popular Posts

Top Authors

Let's Connect

Related Posts

Quick Links

Categories

Let's keep in touch

Follow Us