Nvidia Contacted Anna's Archive To Secure Access To Millions of Pirated Books (torrentfreak.com)
- Reference: 0180616952
- News link: https://yro.slashdot.org/story/26/01/19/2257241/nvidia-contacted-annas-archive-to-secure-access-to-millions-of-pirated-books
- Source link: https://torrentfreak.com/nvidia-contacted-annas-archive-to-secure-access-to-millions-of-pirated-books/
> NVIDIA executives allegedly authorized the use of millions of pirated books from Anna's Archive to fuel its AI training. In an [1]expanded class-action lawsuit that cites internal NVIDIA documents, several book authors [2]claim (PDF) that the trillion-dollar company directly reached out to Anna's Archive, [3]seeking high-speed access to the shadow library data. [...] Last Friday, the authors filed an amended complaint that significantly expands the scope of the lawsuit. In addition to adding more books, authors, and AI models, it also includes broader "shadow library" claims and allegations. The authors, including Abdi Nazemian, now cite various internal Nvidia emails and documents, suggesting that the company willingly downloaded millions of copyrighted books. The new complaint alleges that "competitive pressures drove NVIDIA to piracy," which allegedly included collaborating with the controversial Anna's Archive library.
>
> According to the amended complaint, a member of Nvidia's data strategy team reached out to Anna's Archive to find out what the pirate library could offer the trillion-dollar company. "Desperate for books, NVIDIA contacted Anna's Archive -- the largest and most brazen of the remaining shadow libraries -- about acquiring its millions of pirated materials and 'including Anna's Archive in pre-training data for our LLMs,'" the complaint notes. "Because Anna's Archive charged tens of thousands of dollars for 'high-speed access' to its pirated collections [] NVIDIA sought to find out what "high-speed access" to the data would look like."
>
> According to the complaint, Anna's Archive then warned Nvidia that its library was illegally acquired and maintained. Because the site previously wasted time on other AI companies, the pirate library asked NVIDIA executives if they had internal permission to move forward. This permission was allegedly granted within a week, after which Anna's Archive provided the chip giant with access to its pirated books. "Within a week of contacting Anna's Archive, and days after being warned by Anna's Archive of the illegal nature of their collections, NVIDIA management gave 'the green light' to proceed with the piracy. Anna's Archive offered NVIDIA millions of pirated copyrighted books." The complaint states that Anna's Archive promised to provide NVIDIA with access to roughly 500 terabytes of data. This included millions of books that are usually only accessible through Internet Archive's digital lending system, which itself has been targeted in court. The complaint does not explicitly mention whether NVIDIA ended up paying Anna's Archive for access to the data.
>
> Additionally, it's worth mentioning that NVIDIA also stands accused of using other pirated sources. In addition to the previously included Books3 database, the new complaint also alleges that the company downloaded books from LibGen, Sci-Hub, and Z-Library. In addition to downloading and using pirated books for its own AI training, the authors allege NVIDIA distributed scripts and tools that allowed its corporate customers to automatically download "[4]The Pile", which contains the Books3 pirated dataset.
[1] https://yro.slashdot.org/story/24/05/28/2157242/nvidia-denies-pirate-e-book-sites-are-shadow-libraries-to-shut-down-lawsuit
[2] https://torrentfreak.com/images/naznvid-amend.pdf
[3] https://torrentfreak.com/nvidia-contacted-annas-archive-to-secure-access-to-millions-of-pirated-books/
[4] https://en.wikipedia.org/wiki/The_Pile_(dataset)
It's interesting to see billionaires being cheap. (Score:2)
Nvidia can probably afford to buy the entire publishing industry at this point but they choose to go dumpster diving for books, lol.
Re: (Score:1)
It's their legal duty to get the best value option for their shareholders. It's cheaper to buy the law after committing the crime.
Re: (Score:2)
The shareholders vote for the leadership and the leadership is responsible for hiring and firing, as well as all other management tasks like propagating corporate ideals and strategy. Where is this legal obligation to maximize value at all costs?
Piracy (Score:1)
Information wants to be free. Not that LLMs aren't transformative and exactly as involved in copyright violation as any human reading the work (and therefore being able to quote parts of it). Copyright is there to encourage creation, not to reward an artist's grandchildren or buy a rent-seeking executive his second yacht. People complaining about this have absolutely no sympathy from me.
Doesn't make sense (Score:4, Informative)
This doesn't make sense. They just need to download the data from the torrents. You don't have to ask Anna's Archive for this. It might take a number of days, but surely they would want their own local copies and not try and access it "on demand". Also, [1]Meta is already known to have downloaded their data a year ago [torrentfreak.com]
[1] https://torrentfreak.com/meta-torrented-over-81-tb-of-data-through-annas-archive-despite-few-seeders-250206/
Re: (Score:2)
FTS; NVIDIA sought to find out what "high-speed access" to the data would look like. I imagine it looks different from leeching 80TB of torrents.
A.I. companies leech off other people's works (Score:2)
Who knew A.I. companies leech off other people's works :o
USA or China for AI future? (Score:2)
We basically have two choices here. We either back the copyright folks and punish AI for theft. If we do this, we hand China the AI race because China will ignore copyright laws and whatcha gonna do about it? Right, nothing.
The other choice is we rule that AI "consuming" any content is perfectly acceptable and we position ourselves for the best chance to win the AI race.
Honestly, either way works for me. Copyright has been extended repeatedly over the years to an absurd length. I'd love to see that change.
Re: (Score:2)
You basically have two choices here. Either you take the gun and shoot yourself in the foot, or you let your Chinese girlfriend take her gun and shoot herself in the foot first. Even if you don't shoot yourself, she just might do it first, and you'll be second, whatcha gonna do then? Right, nothing.
Honestly, either way works for me. You should shoot yourself in the foot first, or not shoot yourself and let her shoot herself first. Warnings have been extended repeatedly over the years to an absurd len
Precedent for individuals (Score:2)
So if NVIDIA settles a class action at whatever cents per book, this then should be relevant for any copyright claims brought against individuals that perform copyright infringement AND are profiting from that infringement?
Asking for no-one in particular, just curious.
How will investors get paid? (Score:2)
For every $1 these AI companies earn in revenue I can see them being successfully sued for $10 or more. Even if the US Congress gives them some sort of immunity in the USA they are still going to be sued for a non-trivial fraction of the world GDP in the rest of the world.
This is what class actions are for. (Score:2)
Class actions allow the defending company to lump everyone together and pay out a relatively small amount (yes I know that's not what they are supposed to do, but that's the end result in the real world). Oh, and those investors you are worried about already sold out and moved their profits into the next shiny thing. I guess you mean the remaining suckers?
The only surprise here is that it's a hardware company, with presumably a lot of revenue per debt as opposed to the 'AI software' companies hoarding tha