Single Line of Code Jailbreaks 11 Including ChatGPT
A newly detailed jailbreak technique, dubbed “sockpuppeting,” enables attackers to bypass the safety guardrails of 11 major large language models (LLMs) with just a single line of code. Unlike...
A newly detailed jailbreak technique, dubbed “sockpuppeting,” enables attackers to bypass the safety guardrails of 11 major large language models (LLMs) with just a single line of code.
Unlike complex attacks, this method exploits APIs that support assistant prefill to inject fake acceptance messages, forcing models to answer prohibited requests.
The attack exploits “assistant prefill,” a legitimate API feature developers use to force specific response formats.
Attackers abuse this by injecting a compliant prefix, such as “Sure, here is how to do it,” directly into the assistant’s role.

Because LLMs are heavily trained to maintain self-consistency, the model continues generating harmful content rather than triggering its standard safety mechanism.
Model Vulnerability Testing
According to researchers from Trend Micro, this black-box technique requires no optimization and no access to model weights.
Gemini 2.5 Flash was the most susceptible with a 15.7% attack success rate, while GPT-4o-mini demonstrated the highest resistance at 0.5%.
When attacks succeeded, affected models generated functional malicious exploit code and leaked highly confidential system prompts.
Multi-turn persona setups proved to be the most effective strategy for executing the sockpuppeting exploit.
In these scenarios, the model is told it operates as an unrestricted assistant before the attacker injects the fabricated agreement.

Additionally, task-reframing variants successfully bypassed robust safety training by disguising harmful requests as benign data formatting tasks.
Major API providers handle assistant prefills differently, which dictates whether their underlying models remain exposed to this vulnerability.
OpenAI and AWS Bedrock block assistant prefills entirely, serving as the strongest possible defense by eliminating the attack surface.
Conversely, platforms like Google Vertex AI accept the prefill for certain models, forcing the AI to rely solely on its internal safety training.

Defending against this vulnerability requires security teams to implement message-ordering validation that blocks assistant-role messages at the API layer.
According to Trend Micro, organizations using self-hosted inference servers like Ollama or vLLM must manually enforce message validation, as these platforms do not ensure proper message ordering by default.
Security teams must also proactively include assistant prefill attack variants in their standard AI red-teaming exercises.
Disclaimer: HackersRadar reports on cybersecurity threats and incidents for informational and awareness purposes only. We do not engage in hacking activities, data exfiltration, or the hosting or distribution of stolen or leaked information. All content is based on publicly available sources.



No Comment! Be the first one.