Judge Orders OpenAI to Release 20 Million Anonymized ChatGPT Logs
A federal judge in New York has ordered OpenAI to provide 20 million anonymized user logs from ChatGPT to plaintiffs in a major AI copyright lawsuit. This ruling upholds an earlier decision, issued despite OpenAI’s stated privacy concerns.
District Judge Sidney H. Stein affirmed Magistrate Judge Ona T. Wang’s November order on January 5, 2026, in the Southern District of New York, ruling that the privacy safeguards in place adequately protect users given the logs’ relevance to the infringement claims.
OpenAI objected, arguing that producing the full dataset, which represents roughly 0.5% of its preserved logs, was unduly burdensome and risked exposing user data; it instead proposed searching only for conversations referencing plaintiffs’ works. Stein rejected this, noting that no case law mandates the “least burdensome” discovery method.
The saga began in July 2025 when news outlets, including The New York Times Co. and Chicago Tribune Co. LLC, sought 120 million logs to probe whether ChatGPT outputs infringed their copyrights by reproducing trained-on content.
OpenAI provided a sample of 20 million logs, which the plaintiffs accepted, but it later resisted full production of that set, arguing that 99.99% of the logs were irrelevant to the claims.
Wang sided with the plaintiffs in November and denied OpenAI’s motion for reconsideration in December; Stein’s affirmation finalizes production under a protective order with de-identification protocols, Bloomberg reported.
OpenAI invoked a Second Circuit securities case that blocked SEC wiretap disclosures, but Stein sharply distinguished it: ChatGPT logs involve voluntary user inputs and undisputed company ownership, unlike surreptitious recordings. “Users’ privacy would be protected by the company’s exhaustive de-identification,” Wang had ruled earlier.
This ruling advances pretrial discovery in In re OpenAI, Inc. Copyright Infringement Litigation (No. 1:25-md-03143), consolidating 16 suits from news organizations, authors, and others alleging unauthorized use of works to train large language models.
It mirrors dozens of cases against AI firms like Microsoft and Meta, testing copyright’s application to generative tech amid debates over fair use and data scraping.
Plaintiffs argue the logs are vital to rebut OpenAI’s claims that they “hacked” responses to manufacture evidence and to assess the scope of infringement. OpenAI maintains that anonymization and the court’s orders suffice, with no user privacy at risk.
The decision spotlights tensions between discovery proportionality and AI data hoards, potentially setting precedents for similar cases. Critics worry bulk log handovers could chill user trust in chatbots, while supporters see it as essential transparency.
OpenAI, represented by Keker Van Nest, Latham & Watkins, and Morrison & Foerster, faces production deadlines soon.
As AI litigation proliferates, this order underscores courts’ willingness to compel the production of expansive evidence, even anonymized, to probe training data practices. For content creators, it bolsters tools to challenge AI’s copyright encroachments; for tech giants, it signals rising scrutiny on user data vaults.
Dr. Kolochenko, CEO at ImmuniWeb, told Cybersecurity News that “For OpenAI, this decision is certainly a legal debacle, which will inspire other plaintiffs in similar cases to do the same to prevail in courts or to get much better settlements from AI companies.”
“This case is also a telling reminder that, regardless of your privacy settings, your interactions with AI chatbots and other systems may one day be produced in court. Architecture of modern LLMs and their underlying technology stack is very complex, so even if some user-facing systems are specifically configured to delete chat logs and history, some others may inevitably preserve them in one form or another. In some cases, produced evidence may trigger investigations and even criminal prosecution of AI users.”
Disclaimer: HackersRadar reports on cybersecurity threats and incidents for informational and awareness purposes only. We do not engage in hacking activities, data exfiltration, or the hosting or distribution of stolen or leaked information. All content is based on publicly available sources.


