News: 0178917512


Anthropic Will Start Training Its AI Models on Chat Transcripts (theverge.com)

(Thursday August 28, 2025 @05:22PM (msmash) from the PSA dept.)


Anthropic will [1]start training its AI models on user data, including new chat transcripts and coding sessions, unless users choose to opt out. The Verge:

> It's also extending its data retention policy to five years -- again, for users that don't choose to opt out. All users will have to make a decision by September 28th. For users that click "Accept" now, Anthropic will immediately begin training its models on their data and keeping said data for up to five years, according to a blog post published by Anthropic on Thursday.

>

> The setting applies to "new or resumed chats and coding sessions." Even if you do agree to Anthropic training its AI models on your data, it won't do so with previous chats or coding sessions that you haven't resumed. But if you do continue an old chat or coding session, all bets are off.



[1] https://www.theverge.com/anthropic/767507/anthropic-user-data-consumers-ai-models-training-privacy



Okay. (Score:3, Insightful)

by SmaryJerry ( 2759091 )

I figured this was the case already for all models.

Re: (Score:2)

by allo ( 1728082 )

It's rather surprising. Saving data (e.g. for requests from authorities) is to be expected, but for training, user input will often be low quality. For code tasks it may be a different deal, but the typical dumb question with a few typos in it is not worth ending up in a training set. In the typical chat log, the user writes a short, low-quality prompt and then receives a large chunk of text from the AI.

That's fine but... (Score:3)

by aldousd666 ( 640240 )

When users are writing their own code, they include secrets. Training on secrets in coding sessions is probably a terrible idea. Maybe they have some way to filter out secrets so they don't go in, but what if they miss something? This seems like a huge problem, at least for training on developer coding sessions. I don't really see a problem with doing it on the chats on the web.

Re: That's fine but... (Score:2)

by Tomahawk ( 1343 )

It likely didn't stop them when it came to the original training, or subsequent training, so why should they care now? /s

Re: (Score:2)

by Himmy32 ( 650060 )

> I don't really see a problem with doing it on the chats on the web.

Except when it's people chatting about their health issues.

Secrets (Score:2)

by abulafia ( 7826 )

I don't see that as a huge problem - it isn't that hard to filter.

We run a hook that looks for secrets on push. It takes an admin to fix a false positive; that happens less than once a year. (We have a working population of about 800 engineers committing.)

Presumably OAI would care a lot less about false positives than we do (we don't want to throw away work product; OAI just wants masses of human output), so I expect they could err towards omission, not lose much on false positives and be pretty sure the
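For illustration, here's a minimal sketch of the kind of regex-based scan such a push hook might run. The patterns, file selection, and exit-code convention are assumptions for the example, not the poster's actual tooling:

    #!/usr/bin/env python3
    """Toy secret scanner of the sort a pre-push hook might invoke."""
    import re
    import subprocess
    import sys

    # A few common high-signal patterns; real scanners ship far larger rule sets.
    SECRET_PATTERNS = [
        re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID
        re.compile(r"-----BEGIN (?:RSA|EC|OPENSSH) PRIVATE KEY-----"),
        re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][A-Za-z0-9_\-]{20,}['\"]"),
    ]

    def changed_files():
        """Files changed between the upstream branch and HEAD (what a push would send)."""
        out = subprocess.run(
            ["git", "diff", "--name-only", "@{upstream}..HEAD"],
            capture_output=True, text=True, check=False,
        )
        return [f for f in out.stdout.splitlines() if f]

    def scan(paths):
        hits = []
        for path in paths:
            try:
                text = open(path, encoding="utf-8", errors="ignore").read()
            except OSError:
                continue
            for pat in SECRET_PATTERNS:
                for match in pat.finditer(text):
                    hits.append((path, match.group(0)[:12] + "..."))
        return hits

    if __name__ == "__main__":
        findings = scan(changed_files())
        for path, snippet in findings:
            print(f"possible secret in {path}: {snippet}", file=sys.stderr)
        sys.exit(1 if findings else 0)  # non-zero exit blocks the push

Erring toward omission, as the poster suggests, just means dropping any file (or whole session) that trips a pattern rather than trying to redact it in place.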

Re: (Score:2)

by VaccinesCauseAdults ( 7114361 )

> We have a working population of about 800 engineers committing.

And what is the total population of engineers? :)

Reminds me of the old joke: “Q. How many people work here? A. About half of them.”

Re: (Score:2)

by serafean ( 4896143 )

Or the other way around: I write GPL2+ code, I use AI. It's trained on my GPL2+ code. Now all of your AI-assisted code is a derivative work.

Good luck sorting this legal mess.

OpenRouter (Score:2)

by Himmy32 ( 650060 )

It's not often that the benefits of adding a middleman outweigh the extra cost, but having some pretty straightforward settings that keep requests away from models and providers that don't match your privacy preferences is nice.
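For illustration, a minimal per-request sketch, assuming OpenRouter's provider-routing preferences accept a data_collection: "deny" field (the model slug and API key below are placeholders):

    import requests  # pip install requests

    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": "Bearer YOUR_OPENROUTER_KEY"},
        json={
            "model": "anthropic/claude-sonnet-4",  # placeholder model slug
            "messages": [{"role": "user", "content": "Hello"}],
            # Route only to providers that agree not to store or train on prompts.
            "provider": {"data_collection": "deny"},
        },
        timeout=60,
    )
    print(resp.json()["choices"][0]["message"]["content"])

The same preference can usually be set once at the account level instead of per request, which is the "straightforward settings" upside the comment is pointing at.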
