Microsoft Deletes Blog Telling Users To Train AI on Pirated Harry Potter Books (arstechnica.com)
The blog post, written in November 2024 by senior product manager Pooja Kamath, walked users through building Q&A systems and generating fan fiction from the copyrighted texts, and even included a Microsoft-branded AI-generated image of Harry Potter. Shubham Maindola, the data scientist who uploaded the Kaggle dataset, told Ars Technica that its public domain label was "a mistake" and deleted the dataset after the outlet reached out.
[1] https://arstechnica.com/tech-policy/2026/02/microsoft-removes-guide-on-how-to-train-llms-on-pirated-harry-potter-books/