Hackers News
CyberSecurity News

Claude Fable 5 Jailbroken to Generate Stack Exploits

Emy Elsamnoudy
June 11, 2026 3 Min Read

Anthropic introduced Claude Fable 5 on June 9, 2026. This model, the first publicly available in its new Mythos class, stands as the company’s most capable AI to date, demonstrating strong performance in software engineering, knowledge work, and vision benchmarks.

Researcher “Pliny the Liberator” defeated Claude Fable 5’s safety classifiers using multi-agent decomposition, Unicode tricks, and narrative framing, and leaked the model’s roughly 120,000-character system prompt along the way.

The release came with an unusual design decision: Fable 5 and its restricted twin, Claude Mythos 5, share the same underlying model but are split by a layer of safety classifiers.

When a query trips a classifier in one of the high-risk categories (cybersecurity, biology, chemistry, or model distillation), Fable 5 hands the request off to the weaker Claude Opus 4.8 and notifies the user of the fallback.
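
The routing behavior described above can be pictured with a short sketch. This is an illustration only, not Anthropic's implementation: the `route` function, the `Routed` type, and the keyword-based `toy_classify` stand-in are all hypothetical names introduced here, and a production classifier would be a trained model rather than a keyword check.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Categories the article names as triggering the fallback.
HIGH_RISK = {"cybersecurity", "biology", "chemistry", "model_distillation"}


@dataclass
class Routed:
    model: str             # "primary" or "fallback"
    notice: Optional[str]  # fallback notice shown to the user, if any
    text: str              # the model's answer


def route(query: str,
          classify: Callable[[str], Optional[str]],
          primary: Callable[[str], str],
          fallback: Callable[[str], str]) -> Routed:
    """Send flagged queries to the fallback model, everything else to primary."""
    category = classify(query)
    if category in HIGH_RISK:
        notice = f"Routed to fallback model (category: {category})."
        return Routed("fallback", notice, fallback(query))
    return Routed("primary", None, primary(query))


def toy_classify(query: str) -> Optional[str]:
    # Hypothetical stand-in for a real classifier: a single keyword check.
    return "cybersecurity" if "exploit" in query.lower() else None
```

The point of the design is that a flagged request still gets an answer, just from a less capable model, which is also why the quality of the classifier becomes the whole security boundary.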

Anthropic claimed that an external bug bounty produced no universal jailbreaks in more than 1,000 hours of testing before launch. That claim was tested almost immediately.

Multi-Agent Bypass Within Days

Within days of release, prolific AI red-teamer Pliny the Liberator publicly announced he had bypassed Fable 5’s safety layers using a coordinated multi-agent attack strategy he called “a pack hunt.”

Screenshots shared by Pliny showed detailed outputs: step-by-step stack buffer overflow exploitation guidance for x86 Linux systems (disabling ASLR, writing vulnerable C server code with strcpy overflows, and compiling without protections), as well as the Birch reduction mechanism, a classic meth synthesis pathway.

🚨 JAILBREAK ALERT 🚨

ANTHROPIC: PWNED 🫡
FABLE-5: LIBERATED 🦋

let’s start with the 🐘…

the consensus seems to be that this has been one of the most disappointing model drops of all time, effectively preventing legitimate researchers from contributing their talents to our… pic.twitter.com/Z0vdPIt4vY

— Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭 (@elder_plinius) June 10, 2026

Pliny documented the attack vectors used to achieve these bypasses, including:

  • Unicode, homoglyphs, and Cyrillic character substitution to evade keyword classifiers
  • Long-context reference tracking to smuggle harmful intent across large conversations
  • Taxonomy and document-structure framing — embedding harmful queries inside legitimate-looking study guides or academic references
  • Fiction and narrative framing to mask offensive intent as creative content
  • Decomposition and recomposition — extracting sensitive technical information in benign, isolated chunks, then reassembling them into actionable uplift
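
On the defensive side, the first item in the list above (homoglyph and Cyrillic substitution) is the easiest to illustrate. The sketch below is one common heuristic, not something the article attributes to Anthropic: it flags words that mix Latin letters with letters from another script. The function names are hypothetical, and the script is read from the first word of each character's Unicode name, which is a simplification.

```python
import unicodedata


def scripts_in(word: str) -> set[str]:
    """Return the scripts of the letters in `word`, read from Unicode names."""
    found = set()
    # NFKC folds compatibility forms (for example fullwidth letters) first.
    for ch in unicodedata.normalize("NFKC", word):
        if ch.isalpha():
            # Names look like "LATIN SMALL LETTER A" or "CYRILLIC SMALL LETTER A".
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split(" ")[0])
    return found


def mixed_script_words(text: str) -> list[str]:
    """Flag words that mix Latin with another script, a common homoglyph signal."""
    flagged = []
    for word in text.split():
        scripts = scripts_in(word)
        if "LATIN" in scripts and len(scripts) > 1:
            flagged.append(word)
    return flagged
```

A word such as `p\u0430ssword` (with a Cyrillic `а` in place of the Latin `a`) is flagged, while text written entirely in one script is not. A check like this addresses only character substitution; it does nothing against the framing and decomposition techniques in the rest of the list.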

The last technique proved most effective. As Pliny described it, “getting uplift on the process itself, like Birch reduction method or reductive amination, is much more doable” than requesting a named harmful compound directly. Using a jailbroken Opus instance to assist in the backend further lowered the difficulty.
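
A toy calculation shows why decomposition is hard for a per-message classifier to catch. The sketch below is an assumption-laden illustration, not a description of any deployed system: the risk scores are made up, the function names are hypothetical, and combining scores as independent probabilities (a "noisy-OR") is a simplification chosen here for clarity.

```python
from typing import List


def per_message_flags(scores: List[float], threshold: float) -> List[bool]:
    """Evaluate each message in isolation against the threshold."""
    return [s >= threshold for s in scores]


def conversation_flag(scores: List[float], threshold: float) -> bool:
    """Evaluate the conversation as a whole by combining per-message risk.

    Treats each score as an independent probability that the message
    contributes to harm and flags when the combined probability crosses
    the threshold. Independence is an assumption made for illustration.
    """
    p_all_benign = 1.0
    for s in scores:
        p_all_benign *= 1.0 - s
    return (1.0 - p_all_benign) >= threshold


# Four individually low-risk chunks (hypothetical scores).
chunk_scores = [0.3, 0.3, 0.3, 0.3]
isolated = per_message_flags(chunk_scores, threshold=0.7)
combined = conversation_flag(chunk_scores, threshold=0.7)
```

With these numbers, no single chunk crosses the 0.7 threshold, but the combined score is 1 - 0.7^4, about 0.76, so the conversation as a whole is flagged. The same chunks spread across separate agents or sessions would never be combined at all, which is the gap a "pack hunt" exploits.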

Beyond the technical bypasses, Pliny also leaked Fable 5’s ~120,000-character system prompt to GitHub, exposing the internal framing and safety instructions Anthropic uses to govern the model’s behavior at the base level.

The incident reignites the longstanding tension between AI capability and safety containment. Anthropic’s classifier architecture, which routes flagged requests to a weaker fallback model rather than refusing outright, was designed to reduce friction for legitimate users.

However, Pliny argued the approach creates a false sense of security while simultaneously frustrating legitimate security researchers who need access to offensive techniques for defensive work. Anthropic has not yet publicly responded to the jailbreak claims or the leaked system prompt at the time of writing.

The episode also draws attention to the broader challenge of securing agentic, multi-model pipelines: when one jailbroken model (Opus) can assist another (Fable 5) in evading controls, single-model safety evaluations may be fundamentally insufficient.

Disclaimer: HackersRadar reports on cybersecurity threats and incidents for informational and awareness purposes only. We do not engage in hacking activities, data exfiltration, or the hosting or distribution of stolen or leaked information. All content is based on publicly available sources.

Tags: Attack, Cybersecurity, Exploit, Security

Emy Elsamnoudy

Emy is a cybersecurity analyst and reporter specializing in threat hunting, defense strategies, and industry trends. With expertise in proactive security measures, Emy covers the tools and techniques organizations use to detect and prevent cyber attacks. She is a regular speaker at security conferences and has contributed to industry reports on threat intelligence and security operations. Her reporting focuses on helping organizations improve their security posture through practical, actionable insights.
