The Editors Protecting Wikipedia from AI Hoaxes (404media.co)

(Friday October 11, 2024 @05:30PM (msmash) from the how-about-that dept.)

Reference: 0175233859
News link: https://news.slashdot.org/story/24/10/11/1554202/the-editors-protecting-wikipedia-from-ai-hoaxes
Source link: https://www.404media.co/the-editors-protecting-wikipedia-from-ai-hoaxes/

A group of Wikipedia editors have [1]formed WikiProject AI Cleanup, "a collaboration to combat the increasing problem of unsourced, poorly-written AI-generated content on Wikipedia." From a report:

> The group's goal is to protect one of the world's largest repositories of information from the same kind of misleading AI-generated information that has plagued Google search results, books sold on Amazon, and academic journals. "A few of us had noticed the prevalence of unnatural writing that showed clear signs of being AI-generated, and we managed to replicate similar 'styles' using ChatGPT," Ilyas Lebleu, a founding member of WikiProject AI Cleanup, told me in an email. "Discovering some common AI catchphrases allowed us to quickly spot some of the most egregious examples of generated articles, which we quickly wanted to formalize into an organized project to compile our findings and techniques."

>

> In many cases, WikiProject AI Cleanup finds AI-generated content on Wikipedia with the same methods others have used to find AI-generated content in scientific journals and Google Books, namely by searching for phrases commonly used by ChatGPT. One egregious example is this Wikipedia article about the Chester Mental Health Center, which in November of 2023 included the phrase "As of my last knowledge update in January 2022," referring to the last time the large language model was updated.
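
The phrase search described in the report can be sketched roughly as follows. This is a minimal illustration, not the project's actual tooling: the function name `find_telltale_phrases` and the phrase list are assumptions, and only the first phrase comes from the report itself.

```python
import re

# Hypothetical phrase list. The first entry is the one quoted in the report;
# the second is an assumed extra example of a chatbot boilerplate phrase.
TELLTALE_PHRASES = [
    "as of my last knowledge update",
    "as an ai language model",
]


def find_telltale_phrases(text, phrases=TELLTALE_PHRASES):
    """Return (phrase, offset) pairs for every listed phrase found in text.

    Matching is case-insensitive; results are sorted by position in the text.
    """
    hits = []
    for phrase in phrases:
        # re.escape keeps punctuation in a phrase from being read as regex syntax.
        pattern = re.compile(re.escape(phrase), re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((phrase, match.start()))
    return sorted(hits, key=lambda hit: hit[1])
```

Calling `find_telltale_phrases(article_text)` on a revision's text returns an empty list when none of the phrases occur. A hit is only a flag for human review, since a person could have typed or quoted the same words.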

[1] https://www.404media.co/the-editors-protecting-wikipedia-from-ai-hoaxes/

Plausibly (Score:1)

by i kan reed ( 749298 )

One could use their change-sets to fuel an AI that spots AIs.

Or for the more nefariously minded, train an LLM that rewrites LLM trash to sound less like LLM trash.

Re: (Score:2)

by allo ( 1728082 )

"Or for the more nefariously minded, train an LLM that rewrites LLM trash to sound less like LLM trash."

That is the same idea as "Why don't we train an AI to detect good AI images and throw away all the bad ones".

If you could do this, the generator AIs would already have it integrated (in fact you could integrate such a tool into the training so you do not even need to generate the bad images afterward).

An LLM that can rewrite LLM trash to sound better can be used as an alternative to the LLMs that write tras

Re: (Score:1)

by i kan reed ( 749298 )

> That is the same idea as "Why don't we train an AI to detect good AI images and throw away all the bad ones".

That's called a generative adversarial network and it's the foundation for a lot of the modern generative tech.

But also I do mean something more specific. "Less detectable to people trying to do detection" at the cost of "Less like real writing" is a thing a lot of people (spammers) want from LLMs.

Re: (Score:2)

by allo ( 1728082 )

> That's called a generative adversarial network

That's exactly the point. If you have the classifier, you use it to train the generator, so you don't need the classifier afterward.

> But also I do mean something more specific. "Less detectable to people trying to do detection" at the cost of "Less like real writing" is a thing a lot of people (spammers) want from LLMs.

Less detectable is more or less the same goal as having text that reads fluently. The things you can use to detect LLM content are exactly

Re: (Score:1)

by i kan reed ( 749298 )

It's not, because if you look, the project in TFA uses metrics and key words and other triggers to flag their targets, not just "is it gibberish/wrong", which is a standard all of Wikipedia nominally tries to enforce on all edits, AI or not.

Re: (Score:2)

by Samantha Wright ( 1324923 )

Remarkably close to this, Wikipedia was promoting [1]a tool [mozilla.org] based on an LLM that allows users to add statements from new sources, although in this case the caveat is that the user has to find a citable source first.

[1] https://addons.mozilla.org/en-US/firefox/addon/wikipedia-add-a-fact/

Not AI (Score:4, Insightful)

by Geoffrey.landis ( 926948 )

I wish people would stop referring to large language models as "Artificial Intelligence."

It is sophisticated pattern-matching software. It doesn't in any way "know" what the text it produces means; it just makes text that looks like the patterns of text that are made by humans who do.

Re: (Score:2)

by DesertNomad ( 885798 )

^^^^THIS^^^^

The misnamed "AI" output is a mashup of whatever else mentions the same salient terms. It has not, to me, shown any evidence of being thought out. It shows no evidence of original content or analysis of concept. It ingests everything, including dross, and doesn't seem to be able to tell the difference in an analytical way.

There do seem to be those who believe in the "turtles all the way down" concept of LLMs that analyze other LLM outputs, and perform useful functions like editing for clarity, or

Re: (Score:2)

by Darinbob ( 1142669 )

LLMs analyzing output from other LLMs tend to go off the rails with much more "hallucination" effects. Garbage in, garbage out. Even output that is intended to closely resemble natural language has enough garbage in it to screw up the learning. You take a mimeograph of a mimeograph of a mimeograph all the way down and you end up with something nasty.

(for those who don't know what mimeographs are, they're a messy organic solution allowing documents to reproduce. Also get off my lawn!)

However GPT4 ha

Re: (Score:2)

by Geoffrey.landis ( 926948 )

> Everyone understands what is meant. We're not all on the spectrum like you.

Hey! Us guys on the spectrum invented, like, transistors & stuff! iPhones! Chia pets! If it weren't for us, there wouldn't even be any large language models.

This again? (Score:2)

by dinfinity ( 2300094 )

Stop. It's not going to change and you don't look smart by saying this.

You're not the arbiter of this. You can't even properly define "Intelligence".

Stop polluting Slashdot with this utterly uninteresting and unending pointless "insight".

Re: (Score:2)

by allo ( 1728082 )

And stop saying crypto when they mean blockchain-based currencies.

Don't fight it, we already lost. Neural networks are now AI, because the term is catchier than all alternatives. Try to get people to say ML ...

Re: (Score:3)

by MobyDisk ( 75490 )

Scary fact: That's how we operate as well. You and I don't know what we are going to say 5 words from now: it just flows out of a complicated statistical model in our brains. Sometimes we say things then realize they are wrong only after we hear ourselves say them. And sometimes we still don't know.

Decades ago, a program that played Chess was considered "Artificial Intelligence." Then we moved the goal posts to "Well, they can't beat a grand master." (Most of us can't!) Then we felt safe behind the

Re: (Score:2)

by Darinbob ( 1142669 )

Except that Artificial Intelligence has been the term since the 70s, despite there being no Intelligence. It is a research field to investigate and try to have something resembling intelligence but without explicitly programming it all. Machine learning, or machine adaptation.

Re: Not AI (Score:3)

by ahoffer0 ( 1372847 )

Artificial intelligence is a term that's been around for a long time. It's not going away anytime soon, but the goal posts keep moving. The A* traversal algorithm was considered artificially intelligent. The threshold for artificial intelligence used to be a program that could be the world champion at chess. At one time artificial intelligence was synonymous with computer vision. If you could make software that recognized and identified objects, you had artificial intelligence.

Re: (Score:2)

by Local ID10T ( 790134 )

> I wish people would stop referring to large language models as "Artificial Intelligence."

Artificial Intelligence is a field of scientific endeavor. Like Mathematics.

LLMs are a part of artificial intelligence. Like addition is a part of mathematics.

It should not be confused with the entire field... but it is a part of it. Therefore the term is applicable -even if wildly misleading as commonly used.

Re: (Score:2)

by Samantha Wright ( 1324923 )

The term you're looking for is "strong AI" or "artificial general intelligence" (AGI.) As others have pointed out, the term "artificial intelligence" has always referred to all forms of research into automated decision-making. Perhaps Hollywood has misled plebs on this point by bombarding the public with portrayals of human-level thinking by machines, but bringing that viewpoint into a community like Slashdot is only ever going to get you shouted down.

They will lose this fight (Score:1)

by Baron_Yam ( 643147 )

It is easier to destroy than to create, and with AI generating the poison content there is no long term scenario where the legitimate content wins out that doesn't involve forcing contributors to register with verified government ID and paid admins policing them.

Re: (Score:2)

by Samare ( 2779329 )

Current LLM content doesn't include valid sources, so it can be removed immediately per Wikipedia's guidelines. [1]https://en.wikipedia.org/wiki/... [wikipedia.org]

And pages can already be set to show the latest content validated by contributors with more experience. Maybe that will become more common. [2]https://en.wikipedia.org/wiki/... [wikipedia.org]

[1] https://en.wikipedia.org/wiki/Wikipedia:Content_removal#Unsourced_information

[2] https://en.wikipedia.org/wiki/Wikipedia:Pending_changes

Re:They will lose this fight (Score:4, Insightful)

by allo ( 1728082 )

An LLM cannot give valid sources. But systems involving an LLM can. See for example Perplexity, which integrates a web search with an LLM so it can source its text. This may or may not result in useful texts, but it provides valid sources to look up if the text is correct.

Re: (Score:2)

by Darinbob ( 1142669 )

Generally those using chat AI outputs do so to summarize writing that is already there. So one could write up a bad Wikipedia article, with valid citations, then have the LLM "clean it up". Especially useful for non-native speakers of the language.

Re: They will lose this fight (Score:2)

by dpille ( 547949 )

I'm not sure I see *any* long-term scenario where it isn't all "AI" and the humans are grateful for it. How many times do you have to have your painstaking entry deleted for relevance, only to have some stub emerge and get worked on years later? How often do you want your observation that the sky is blue deleted because there's no "source"? Do you enjoy fighting battles you'll never win between reality and politics?

Obviously, I have an opinion: Wikipedia sucks because of its crusading editors, and I could care les

buy wikipedia? (Score:2)

by doc1623 ( 7109263 )

I don't know the solution but Wikipedia seems to have become an indispensable asset to the free world. Could an international consortium buy Wikipedia with a clear and un-editable mission statement? Something like the E.U. but hopefully, more countries including the U.S. Not government specifically, but one that governments support and participate in?

I can't imagine the consequences if Wikipedia just disappeared or became completely biased. They might be hard to quantify, but I have no doubt they would be

Re: (Score:1)

by dawg1234 ( 6925868 )

Russkies are working on their own wikipedia with blackjack and hookers. It will be completely unbiased. 100% true and 110% on the right side of history.

Re: (Score:2)

by dsgrntlxmply ( 610492 )

The vodka is strong but the meat is feeble.

Re: (Score:3)

by Darinbob ( 1142669 )

Ah, like Conservapedia, chock full of falsehoods that present a deliberately biased view. Starts with the assumption that existing Wikipedia is liberally biased (or biased in other ways opposed to Conservapedia authors), and existing print encyclopedias are also highly biased, therefore balance the scales by biasing in the opposite way. The result is a wiki encyclopedia firmly in favor of one, and only one, style of creationism forming a sizeable bulk of the content, along with conspiracy theories, diatri

Re: (Score:3)

by Samare ( 2779329 )

Wikipedia is a project of Wikimedia Foundation. And Wikimedia Foundation isn't a corporation, it's an American 501(c)(3) nonprofit organization. [1]https://en.wikipedia.org/wiki/... [wikipedia.org]

[1] https://en.wikipedia.org/wiki/Wikimedia_Foundation

Re: (Score:2)

by doc1623 ( 7109263 )

I understand it's a non-profit, but that makes it vulnerable to its big donors just like our democracy is, doesn't it? Also, from the ads/requests, it doesn't seem to get enough funding as it is. Maybe just an international consortium to fund it, as long as it remains, basically, unbiased?

Re: (Score:3)

by Darinbob ( 1142669 )

Ah, but "international" means un-American :-) Some people will assume it must be biased. Effectively there's no way to avoid bias or the appearance of bias. The best you can do is be open and clear with everything, and Wikipedia mostly does that already.

Re: (Score:2)

by Samare ( 2779329 )

While American democracy is paid for by big donors who give money in exchange for favorable laws and decisions, Wikipedia is written by people like you and me who don't care about the big donors.

Re: (Score:2)

by RossCWilliams ( 5513152 )

> Wikimedia Foundation isn't a corporation, it's an American 501(c)(3) nonprofit organization.

A 501(c)(3) is by definition a non-profit corporation. They are funded by contributions, which reminds me I need to send them some money.

I am not sure it matters whether content is generated by AI or some other process. The issue is whether an entry is factually accurate, truthful and not spun to someone's interest. It's pretty obvious that is not the case for a lot of entries now. Because Wikipedia is the first go-to for many of us, there are lots of people trying to influence the content.

There was recently a

Re: (Score:2)

by TigerPlish ( 174064 )

> I can't imagine the consequences if Wikipedia just disappeared or become completely biased

Already is completely biased, and has been that way for years now.

Wikipedia has a lot of bot generated content (Score:3)

by xack ( 5304745 )

Ever since the original RAMbot generated thousands of articles from US census data. Then lots of repetitive stuff like plant species are all derived from databases. They are even automating it further with [1]Wikifunctions [wikifunctions.org], which is currently in testing.

[1] https://www.wikifunctions.org/wiki/Wikifunctions:Main_Page

Re: Like AI hoaxes are the problem. (Score:2, Informative)

by drinkypoo ( 153816 )

You don't know what leftists are.

Hint: they aren't fascists by definition. If they are fascists then they aren't leftists, and vice versa.

Re: (Score:2)

by Required Snark ( 1702878 )

Exactly right. (pun intended)

The construct leftist-authoritarian-corporate-state is inherently meaningless. Socialists are somewhat OK with corporations, but outside of conservative ravings socialism is not the same thing as far left or communist governments. For example Volkswagen has a large percent of ownership by the German State of Lower Saxony. The Norwegian government has a hugely successful sovereign wealth fund which invests in the private sector.

Although fascism and communism are both inheren

What's 'unnatural' writing these days (Score:2)

by nospam007 ( 722110 ) *

No typos, no spelling errors, grammatically correct, multi-syllable words, very suspicious.:-)
