

LLMs killed the privacy star, we can't rewind, we've gone too far

(2026/02/26)


Add privacy to the list of potential casualties caused by the proliferation of AI, because researchers have found that large language models (LLMs) can be used to deanonymize internet users – even those who use pseudonyms – more efficiently than human sleuths.

Much of the academic work on online privacy over the past 25 years builds upon [1]Latanya Sweeney's 2002 research on [2]k-Anonymity [PDF], and [3]prior research in which she demonstrated it is possible to identify 87 percent of the US population using three anonymous data points – a five-digit ZIP code, gender, and date of birth.
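Sweeney's insight is easy to demonstrate: group records by their quasi-identifiers and look at the smallest group. The sketch below is a toy illustration of that idea – the records and the `k_anonymity` helper are invented for this example, not taken from the paper.

```python
from collections import Counter

# Toy records: (zip_code, gender, date_of_birth) are the quasi-identifiers
# Sweeney studied. All values here are made up for illustration.
records = [
    ("02139", "F", "1962-07-04"),
    ("02139", "F", "1962-07-04"),
    ("02139", "M", "1958-01-15"),
    ("10001", "F", "1990-03-30"),
]

def k_anonymity(rows):
    """A dataset is k-anonymous if every quasi-identifier combination
    appears at least k times; k is the size of the smallest group."""
    return min(Counter(rows).values())

print(k_anonymity(rows=records))  # here 1: some rows are unique, so re-identifiable
```

With k = 1, at least one person's combination of ZIP, gender, and birth date is unique in the dataset – which is exactly why those three "anonymous" fields identify most of the US population.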

The possibility of identifying people from anonymous data became one of the central concerns about online advertising and the usage of cookies in web browsers.


It's a risk that hasn't gone away and now appears to be even more grave, thanks to LLMs that can automate the process of connecting the dots across online posts so they point to a likely source.


"We show that LLM agents can figure out who you are from your anonymous online posts," said Simon Lermen, an AI engineer at MATS Research and one of the corresponding authors of a preprint [7]paper titled "Large-scale online deanonymization with LLMs."

"Across Hacker News, Reddit, LinkedIn, and anonymized interview transcripts, our method identifies users with high precision – and scales to tens of thousands of candidates," Lermen explained in an [8]online post.


The researchers observe that while it has long been known that individuals can be identified from only a few data points, doing so was often impractical: such data typically existed in unstructured form, and it took considerable effort for human investigators to assemble enough pieces to solve the identity puzzle.

LLMs accelerate and automate that process, and they do so affordably, Lermen and his co-authors claim.

"We demonstrate that large language models (LLMs) fundamentally change this calculus, enabling fully automated deanonymization attacks that operate on unstructured text at scale," they state in their paper. "Where previous approaches required predefined feature schemas, careful data alignment, and manual verification, LLMs can extract identity-relevant signals from arbitrary prose, efficiently search over millions of candidate profiles, and reason about whether two accounts belong to the same person."
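The three stages the paper describes – extract identity signals from prose, search candidate profiles, reason about matches – can be caricatured with plain word overlap standing in for the LLM at each step. Everything below is an invented toy, not the authors' method: the real system uses LLM agents, not keyword matching.

```python
# A drastically simplified sketch of the three-stage pipeline the paper
# describes. Word overlap stands in for LLM reasoning; all data is invented.

def extract_signals(text):
    # Stand-in for LLM signal extraction: keep distinctive tokens.
    stop = {"the", "a", "i", "in", "and", "to", "of"}
    return {w.strip(".,").lower() for w in text.split()} - stop

def search(signals, profiles, top_k=2):
    # Rank candidate profiles by how many signals they share with the post.
    scored = sorted(profiles.items(),
                    key=lambda kv: len(signals & extract_signals(kv[1])),
                    reverse=True)
    return scored[:top_k]

anonymous_post = "I maintain a Rust compiler plugin and commute to Zurich."
profiles = {
    "alice": "Compiler engineer in Zurich, writes Rust tooling.",
    "bob": "Pastry chef in Lyon, loves cycling.",
}
signals = extract_signals(anonymous_post)
print(search(signals, profiles)[0][0])  # best-scoring candidate: alice
```

Even this crude version shows why unstructured text is no longer a defense: the "schema" is whatever the extractor decides is distinctive.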


In one experiment, the authors collected 338 Hacker News users whose bios link to a LinkedIn profile. This gave them ground-truth identities against which the LLMs' predictions could be checked – and avoided the ethical problems of actually deanonymizing people in a research study.

Next, they created a structured data profile of these users based on their comments and the stories they posted. Then they created a search prompt, anonymized it, and passed it to the AI agent. The agent went on to correctly identify 226 of the 338 targets, a success rate of 67 percent at 90 percent precision (there were 25 errant identifications and 86 abstentions where the model didn't offer a prediction).
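The headline figures follow from the reported counts – precision is computed over the cases where the agent actually named someone, while the success rate is computed over all targets. A quick check, using only the numbers given above:

```python
# Reproducing the reported figures from the Hacker News experiment.
# Counts as given: 338 targets, 226 correct identifications, 25 errant ones.
targets = 338
correct = 226
errors = 25

predictions = correct + errors        # cases where the agent named someone
precision = correct / predictions     # correct / all predictions made
success_rate = correct / targets      # correct / all targets attempted

print(f"precision: {precision:.0%}")        # → 90%
print(f"success rate: {success_rate:.0%}")  # → 67%
```

Abstentions cost coverage but not precision, which is why the agent can be right 90 percent of the time it speaks while only resolving two-thirds of the targets.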


The technique employed by the authors is not a universal privacy solvent – it's only successful some of the time. But it's successful often enough that those posting online under a pseudonymous account should not assume their identities will remain unknown.

It’s also cheap to run. The researchers report their entire experiment cost about $2,000, with the cost per profile estimated to be between $1 and $4.

Who would bother? The authors suggest that governments could use this technique to target journalists or activists, that corporations could mine forums to build highly targeted advertising profiles, and that online attackers could develop detailed personal profiles to make social engineering scams more credible.

Lermen argues that netizens therefore need to consider how each data point they share helps identify them.

"The combination is often a unique fingerprint," he said. "Ask yourself: could a team of smart investigators figure out who you are from your posts? If yes, LLM agents can likely do the same, and the cost of doing so is only going down."

Lermen’s co-authors are Daniel Paleka (ETH Zurich), Joshua Swanson (ETH Zurich), Michael Aerni (ETH Zurich), Nicholas Carlini (Anthropic), and Florian Tramèr (ETH Zurich). ®




[1] https://latanyasweeney.org

[2] https://epic.org/wp-content/uploads/privacy/reidentification/Sweeney_Article.pdf

[3] https://kilthub.cmu.edu/articles/journal_contribution/Simple_Demographics_Often_Identify_People_Uniquely/6625769?file=12123218

[7] https://arxiv.org/abs/2602.16800

[8] https://simonlermen.substack.com/p/large-scale-online-deanonymization




This is right sucks (if true)

Anonymous Coward

Sounds like we're soon gonna need to run our postings through an anonymization LLM first, to ensure the deanonymization LLMs can't figure out who we truly are in the first/second place ...

But I'll bite: can you LLM-figure out my ElReg non-AC pseudonym from just this one post? Guesses welcomed below ... ;)

Re: This is right sucks (if true)

Claude Yeller

"can you LLM-figure out my ElReg non-AC pseudonym from just this one post?"

Not from one short post. It always depends on what you have written. It needs data points that link you to specific places and times.

But if you use a pseudonym and wrote comments about visiting a specific international conference or event, the city you live in, and where you got your education, an LLM might find all attendees from your city who attended your school/university.

That set of people might contain only you.

Your LinkedIn, Facebook, Instagram, and Xitter accounts generally contain all this information. If not, then the accounts of those who follow you might have it.

Look up some OSINT (Open Source Intelligence) stories to get a feeling for what is possible with just these four social networks.

Then imagine what an LLM that scrapes them all can do.
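Roughly, that narrowing is just set intersection – each attribute you've mentioned defines a candidate pool, and the pools shrink fast when combined (all names below are invented):

```python
# Each attribute leaked in pseudonymous posts defines a candidate set.
# Intersecting them shrinks the pool fast. All data here is invented.
attended_conf_2025 = {"alice", "bob", "carol", "dave"}
lives_in_city      = {"bob", "carol", "erin"}
alumni_of_school   = {"carol", "frank"}

candidates = attended_conf_2025 & lives_in_city & alumni_of_school
print(candidates)  # {'carol'} – three weak signals, one unique match
```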

Re: This is right sucks (if true)

frankyunderwood123

Zoinks it’s the gay blade!

Is this an out-take from the "BRADY BUNCH"?