Anthropic Accuses Chinese Companies of Siphoning Data From Claude (msn.com)
- Reference: 0180847148
- News link: https://slashdot.org/story/26/02/23/1810225/anthropic-accuses-chinese-companies-of-siphoning-data-from-claude
- Source link: https://www.msn.com/en-us/technology/artificial-intelligence/anthropic-accuses-chinese-companies-of-siphoning-data-from-claude/ar-AA1WUPz3
> The three companies -- DeepSeek, Moonshot AI and MiniMax -- prompted Claude more than 16 million times, siphoning information from Anthropic's system to train and improve their own products, Anthropic said in a blog post Monday.
>
> Earlier this month, an Anthropic rival, OpenAI, sent a memo to House lawmakers accusing DeepSeek of using the same tactic, called distillation, to mimic OpenAI's products. Anthropic said distillation had legitimate uses -- companies use it to build smaller versions of their own products, for example -- but it could also be used to build competitive products "in a fraction of the time, and at a fraction of the cost." The scale of the different companies' distillation activity varied. DeepSeek engaged in 150,000 interactions with Claude, whereas Moonshot and MiniMax had more than 3.4 million and 13 million, respectively, Anthropic said.
Missing an opportunity (Score:5, Interesting)
They're clearly missing a golden opportunity to feed the other AIs a load of complete shit and make them even worse off. The idea of corrupting the Chinese LLMs to be anti-CCP agents is certainly amusing. Train your AI to detect and corrupt other AIs. I don't know if it proves their intelligence at all, but no one can dispute that AIs will definitely be more human-like when they start forming cults.
Re: (Score:2)
Heh, they say "hallucination," you say "trap street."
They've Got Gall (Score:4, Funny)
These AI companies have some real gall, complaining about the Chinese appropriating other people's work. Is that not what the AI companies continue to do even now?
Re: (Score:2)
Exactly. AI companies argue that they can train on information scraped from other people's sources but other people can't train on information scraped from their sources. Did Claude respond to all those prompts? If so, what's the problem?
Re: (Score:2)
This shouldn't be taken as a defense of model producers, either SOTA or people trying to distill from them.
That being said, there is a difference between scraping up information for the purpose of learning, and using a model's output to clone its behavior.
Cheaper, easier training (Score:1)
> Anthropic said distillation had legitimate uses -- companies use it to build smaller versions of their own products, for example -- but it could also be used to build competitive products "in a fraction of the time, and at a fraction of the cost."
Oh, I see. It's a cost-effective way to get training data without a lot of hassles. Sort of like reading books.
Why does Anthropic have a problem with this? Haven't they advocated in favor of it, in the past?
Re:Cheaper, easier training (Score:5, Insightful)
>> Anthropic said distillation had legitimate uses -- companies use it to build smaller versions of their own products, for example -- but it could also be used to build competitive products "in a fraction of the time, and at a fraction of the cost."
> Oh, I see. It's a cost-effective way to get training data without a lot of hassles. Sort of like reading books.
> Why does Anthropic have a problem with this? Haven't they advocated in favor of it, in the past?
We've entered a phase of society where "rules for thee and not for me" is so intrinsic that they don't even notice their own hypocrisy. "GIMME ALL YOUR DATA" and "DON'T STEAL MY DATA" don't even register to them as connected concepts, at all. They have a right to take any data they want and are able to access. They also, once they've acquired that data, 100% believe that the data belongs to them, and always did.
Our current generation of AI is just greed given digital form, and the very particular greed that drives our owner class. "GIMME THAT, IT'S MINE!" is the name of their number one driver. No other point even exists in their view.
Transformative fair use! (Score:4, Interesting)
I'm awaiting clarification on why all their arguments about why scraping is their god-given right don't apply when they are getting scraped.
Re: (Score:2)
Bingo, cannot say it better.
It can't be wrong (Score:2)
Come on. How can it be wrong for someone to scrape data from a place that stole its data?
Ask Claude who it is in Chinese (Score:2)
If you ask Claude who it is in Chinese, it refers to itself as DeepSeek: [1]https://reddit.com/r/DeepSeek/... [reddit.com]
Of course, everyone is "distilling" everyone. Using quotes because distilling implies no other datasets were used, while the datasets created by querying other AIs only comprise a (probably comparatively small) part of the total training data.
[1] https://reddit.com/r/DeepSeek/comments/1r9se7p/claude_sonnet_46_distilled_deepseek/
Cue Vizzini from "The Princess Bride" (1987) (Score:3)
"You're trying to kidnap what I've rightfully stolen!"
Help, Help Stop thief (Score:2)
How dare you - you can't steal my stolen goods!
Are we sure Altman is human? (Score:1)
I mean... It's in his NAME for heaven's sake "Alt-Man" Alternative man?
THE IRONY (Score:2)
Get fucked.
The big data.... (Score:2)
The big data sucks, Chinese companies are siphoning from other data models, including Claude, and those models were trained by siphoning data that was scraped from other sources. The big suck...of data. Rather like hacking hackers who hacked your system. I scream, you scream, we all scream for that big data suck stream. The new golden age continues...
--JoshK.
Liquid (Score:1)
Siphoning? Distillation? Is AI a liquid?
Boo hoo (Score:5, Insightful)
When I steal your brainwaves it's fine, but when big bad China Co steals my brainwaves... well... that's bad.
So sad.
Re: (Score:3)
Anthropic famously bought a lot of copyrighted books and scanned them to ingest into its model training corpus. Arguably they aren't violating copyright because what they are doing is *transformative* -- turning words into a statistical map of word associations.
But what China is doing by inferring the structure of that map doesn't touch on *any* kind of intellectual property of Anthropic's. Sure, the map is a trade secret, but they've exposed that trade secret through their public interface. It's not huma
Re: Boo hoo (Score:2)
Buying books to scan for AI is legal in the US (it's settled law).
I think it was OpenAI that was sued. It was determined it was legal to use purchased books for training, but the ones they didn't purchase were not fair use.
Re: (Score:2)
And AI outputs are not copyrighted, so they are also fair game. ToS allow canceling the account if you're suspected of training on these outputs, but that only means you cannot create further outputs, not that you aren't allowed to use what you already got. They should just get over it. Everyone trains on everyone, or why does Claude use GPTisms in its output?
Re: (Score:2, Insightful)
> Anthropic famously bought a lot of copyrighted books and scanned them to ingest into its model training corpus. Arguably they aren't violating copyright because what they are doing is *transformative* -- turning words into a statistical map of word associations.
If they did not delete the training corpus when they finished with it then they provably are violating copyright because [1]Anthropic famously bought a lot of copyrighted books and destroyed them after scanning them [reddit.com] to ingest into its model training corpus. When they destroyed the originals to which the copyright licenses were attached, they destroyed the proof of license which permitted them to legally own those copies — and every copyrighted portion of the corpus is not only illegal, but there is a sep
[1] https://www.reddit.com/r/books/comments/1lkv2r9/anthropic_destroyed_millions_of_print_books_to/
Re: (Score:2)
Why would you lose the license granted if you destroy the medium? What allows you to read the book is that you bought the license, not the price of the paper. Otherwise ebooks would have to be free.
Re: (Score:3)
> Why would you lose the license granted if you destroy the medium?
Because when you buy physical media, that's what the license is attached to. If you lose or destroy the physical media, it doesn't become legal for you to download another copy because you still own a license, because you do not.
Re: (Score:1)
Maybe in US law? But generally one obtains a license for the content and (optionally) some physical media. That's also the reason why it was legal to copy your CDs for yourself.
Re: (Score:2)
This, unlike your original post, is not wrong.
To demonstrate why the first post is wrong, using this post as an analogy, if you lose or destroy your physical media, your fair-use copy or format shift does not suddenly become illegal.
Re: (Score:1)
This is completely wrong.
Destructive scanning of physical media is settled law. You do not need to prove you have a license for something; someone else has to prove (by a preponderance of the evidence) that you do not.
This post should be moderated into misinformation oblivion.
Re: (Score:2)
"Anthropic famously bought a lot of copyrighted books"
that's quite an odd way to say they were sued for downloading copyrighted books from pirate sites and settled for $1.5B USD in Sep 2025
Re: (Score:2)
Except that's not what Anthropic was accused of. They may have done what you claim, but the accusation was that they paid for private high-speed access to already downloaded books which is not illegal. I realize it's hard to see nuance through all the pointless AI/rich big-bad companies hate, but details matter.
Re: (Score:2)
"...but the accusation was that they paid for private high-speed access to already downloaded books..."
1) go read the complaint:
[1]https://storage.courtlistener.... [courtlistener.com]
2) point out the text that supports your assertion that was Anthropic's only violation
because as someone said "details matter"
[1] https://storage.courtlistener.com/recap/gov.uscourts.cand.434709/gov.uscourts.cand.434709.1.0.pdf
Re: (Score:2)
"downloaded known pirated copies of books from the internet, made unlicensed copies of them,"
Except they didn't do that. Anthropic didn't make copies. I concede that this complaint alleges that they did, but there was no evidence produced showing that happened.
Re: (Score:2)
> that's quite an odd way to say they were sued for downloading copyrighted books from pirate sites and settled for $1.5B USD in Sep 2025
No, it's not.
Would you like to try again?
If party A does thing B, and thing C, then the statement:
Party A did thing B is correct.
It follows that the statement:
Thing B is a funny way of saying Thing C is not.
Re: Boo hoo (Score:2)
I don't understand people who respond to points about ethics by chirping about legal principles. Like... is the law really your model for proper behavior? It's all good unless you do something so evil that your society bands together to punish you?
Re: (Score:2)
I don't understand people who think matters of Copyright have anything to do with ethics.
Copyright is an artificial monopoly granted by law.