Hackers Could Weaponize GGUF Models to Achieve RCE on SGLang
Researchers have uncovered a critical vulnerability in the SGLang inference server that could allow threat actors to execute arbitrary code. Tracked as CVE-2026-5760, the flaw enables hackers to...
Researchers have uncovered a critical vulnerability in the SGLang inference server that could allow threat actors to execute arbitrary code. Tracked as CVE-2026-5760, the flaw enables hackers to weaponize standard GGUF machine learning models, compromising the underlying servers that host them.
As enterprise artificial intelligence deployments grow, this discovery highlights the severe infrastructure risks posed by loading untrusted AI models from public repositories such as Hugging Face.
The root cause of this vulnerability lies in how SGLang processes conversational templates supplied by machine learning models.
Unsandboxed Template Rendering
Specifically, the flaw exists within the framework’s reranking endpoint, accessed via the /v1/rerank API path.
When SGLang renders these chat templates, the developers configured it to use a standard Jinja2 template engine via the environment() setting rather than a secure, sandboxed alternative.
Because the system fails to isolate or restrict the template rendering process, any Python script embedded in a model’s metadata will run automatically.
This oversight creates a textbook Server-Side Template Injection (SSTI) vulnerability, granting attackers full control over the AI inference server.
To exploit this vulnerability, an attacker does not need direct access to the target infrastructure or enterprise network.
Instead, they rely on deceiving a system administrator or an automated deployment pipeline into loading a poisoned model file.
According to a proof-of-concept exploit published by security researcher Stuub on GitHub, the attack unfolds in a highly predictable sequence:
- The attacker creates a malicious GGUF model that loads a Jinja2 payload into a manipulated chat template.
- The attacker embeds a specific trigger phrase to activate SGLang’s Qwen3 reranker detection system.
- An unsuspecting victim downloads and loads this compromised model into their SGLang environment.
- A user or application sends a standard prompt request to the vulnerable rerank endpoint.
- The server reads the poisoned chat template and executes the embedded Python payload directly on the host machine.
Payload Mechanics and Context
The malicious payload exploits a well-known Jinja2 escape technique to execute system commands.
By injecting an OS popen command via template variables, the code successfully breaks out of the application’s intended boundaries to run arbitrary operating system commands.
Once this happens, the threat actor achieves full Remote Code Execution (RCE) and can steal sensitive data, install malware, or pivot to other internal network resources.
This attack vector highlights a recurring problem in the artificial intelligence security landscape, sharing the same vulnerability class as the notorious “Llama Drama” flaw that previously affected similar libraries.
Security teams must rigorously audit their AI supply chains and deploy GGUF models only from verified sources to prevent catastrophic system compromise.
Disclaimer: HackersRadar reports on cybersecurity threats and incidents for informational and awareness purposes only. We do not engage in hacking activities, data exfiltration, or the hosting or distribution of stolen or leaked information. All content is based on publicly available sources.



No Comment! Be the first one.