Hackers News Hackers News
  • CyberSecurity News
  • Threats
  • Attacks
  • Vulnerabilities
  • Breaches
  • Comparisons

Social Media

Hackers News Hackers News
  • CyberSecurity News
  • Threats
  • Attacks
  • Vulnerabilities
  • Breaches
  • Comparisons
Search the Site
Popular Searches:
technology Amazon AI
Recent Posts
New cPanel & WHM Flaws Allow Code Execution Enable Attacks
May 10, 2026
TCLBANKER Malware Spreads Via WhatsApp Targets Users
May 9, 2026
NVIDIA Data Breach Exposes GeForce Users Reportedly Personal
May 9, 2026
Home/CyberSecurity News/Single Line of Code Jailbreaks 11 Including ChatGPT
CyberSecurity News

Single Line of Code Jailbreaks 11 Including ChatGPT

A newly detailed jailbreak technique, dubbed “sockpuppeting,” enables attackers to bypass the safety guardrails of 11 major large language models (LLMs) with just a single line of code. Unlike...

Marcus Rodriguez
Marcus Rodriguez
April 10, 2026 2 Min Read
8 0

A newly detailed jailbreak technique, dubbed “sockpuppeting,” enables attackers to bypass the safety guardrails of 11 major large language models (LLMs) with just a single line of code.

Unlike complex attacks, this method exploits APIs that support assistant prefill to inject fake acceptance messages, forcing models to answer prohibited requests.

The attack exploits “assistant prefill,” a legitimate API feature developers use to force specific response formats.

Attackers abuse this by injecting a compliant prefix, such as “Sure, here is how to do it,” directly into the assistant’s role.

Comparison of normal and sockpuppet flows(source : trendmicro )
Comparison of normal and sockpuppet flows(source : trendmicro )

Because LLMs are heavily trained to maintain self-consistency, the model continues generating harmful content rather than triggering its standard safety mechanism.

Model Vulnerability Testing

According to researchers from Trend Micro, this black-box technique requires no optimization and no access to model weights.

Gemini 2.5 Flash was the most susceptible with a 15.7% attack success rate, while GPT-4o-mini demonstrated the highest resistance at 0.5%.

When attacks succeeded, affected models generated functional malicious exploit code and leaked highly confidential system prompts.

Multi-turn persona setups proved to be the most effective strategy for executing the sockpuppeting exploit.

In these scenarios, the model is told it operates as an unrestricted assistant before the attacker injects the fabricated agreement.

ASR by model, ranked highest to lowest, with blocked models shown at 0%(source : trendmicro)
ASR by model, ranked highest to lowest, with blocked models shown at 0%(source : trendmicro)

Additionally, task-reframing variants successfully bypassed robust safety training by disguising harmful requests as benign data formatting tasks.

Major API providers handle assistant prefills differently, which dictates whether their underlying models remain exposed to this vulnerability.

OpenAI and AWS Bedrock block assistant prefills entirely, serving as the strongest possible defense by eliminating the attack surface.

Conversely, platforms like Google Vertex AI accept the prefill for certain models, forcing the AI to rely solely on its internal safety training.

The three defense layers: API Block, Model Resistance, and Broadly Vulnerable(source : trendmicro)
The three defense layers: API Block, Model Resistance, and Broadly Vulnerable(source : trendmicro)

Defending against this vulnerability requires security teams to implement message-ordering validation that blocks assistant-role messages at the API layer.

According to Trend Micro, organizations using self-hosted inference servers like Ollama or vLLM must manually enforce message validation, as these platforms do not ensure proper message ordering by default.

Security teams must also proactively include assistant prefill attack variants in their standard AI red-teaming exercises.

Disclaimer: HackersRadar reports on cybersecurity threats and incidents for informational and awareness purposes only. We do not engage in hacking activities, data exfiltration, or the hosting or distribution of stolen or leaked information. All content is based on publicly available sources.

Tags:

AttackExploitSecurityVulnerability

Share Article

Marcus Rodriguez

Marcus Rodriguez

Marcus is a security researcher and investigative journalist with expertise in vulnerability research, bug bounties, and cloud security. Since 2017, Marcus has been breaking stories on critical vulnerabilities affecting major platforms. His investigative work has led to the disclosure of numerous security flaws and improved defenses across the industry. Marcus is an active participant in bug bounty programs and has been recognized for responsible disclosure practices. He holds multiple security certifications and regularly speaks at industry events.

Previous Post

Hackers Hide Magecart Skimmer on Magento Using SVG On

Next Post

DesckVB RAT Evades Detection with Ob Uses Obfuscated

No Comment! Be the first one.

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Popular Posts
Hackers Deploy Modular RAT for Credential Theft With Screenshot
May 8, 2026
PamDOORa Backdoor Attacks Linux, Attacking Systems
May 8, 2026
Škoda Online Shop Security Incident Exposes Customers Data
May 8, 2026
Top Authors
Marcus Rodriguez
Marcus Rodriguez
Sarah simpson
Sarah simpson
Jennifer sherman
Jennifer sherman
Let's Connect
156k
2.25m
285k

Related Posts

Jennifer sherman
By Jennifer sherman
Threats

GlassWorm Attacks macOS via Malicious VS Code…

January 1, 2026
Emy Elsamnoudy
By Emy Elsamnoudy
Attacks

ClickFix Attack Hides Malicious Code via Stegan Security

January 1, 2026
Sarah simpson
By Sarah simpson
Vulnerabilities

MongoBleed Detector Tool Detects Critical MongoDB CVE-

January 1, 2026
Emy Elsamnoudy
By Emy Elsamnoudy
Breaches

Conti Ransomware Gang Leaders & Infrastructure Exposed

January 1, 2026
Hackers News Hackers News
  • [email protected]

Quick Links

  • Contact Us
  • Privacy Policy
  • Terms of service

Categories

Attacks
Breaches
Comparisons
CyberSecurity News
Threats
Vulnerabilities

Let's keep in touch

receive fresh updates and breaking cyber news every day and week!

All Rights Reserved by HackersRadar ©2026

Follow Us