
Google Offers Its AI Watermarking Tech As Free Open Source Toolkit (arstechnica.com)

(Thursday October 24, 2024 @05:40PM (BeauHD) from the open-source-FTW dept.)


An anonymous reader quotes a report from Ars Technica:

> Back in May, Google [1]augmented its Gemini AI model with SynthID, a toolkit that embeds AI-generated content with watermarks it says are "imperceptible to humans" but can be easily and reliably detected via an algorithm. Today, Google [2]took that SynthID system open source, offering the same basic watermarking toolkit for [3]free to developers and businesses. The [4]move gives the entire AI industry an easy, seemingly robust way to silently mark content as artificially generated, which could be useful for detecting deepfakes and other damaging AI content before it goes out in the wild. But there are still some important limitations that may prevent AI watermarking from becoming a de facto standard across the AI industry any time soon.

>

> Google uses a version of SynthID to watermark audio, video, and images generated by its multimodal AI systems, with differing techniques that are [5]explained briefly in this video. But in a new paper [6]published in Nature, Google researchers go into detail on how the SynthID process embeds an unseen watermark in the text-based output of its Gemini model. The core of the text watermarking process is a sampling algorithm inserted into an LLM's usual token-generation loop (the loop picks the next word in a sequence based on the model's complex set of weighted links to the words that came before it). Using a random seed generated from a key provided by Google, that sampling algorithm increases the correlational likelihood that certain tokens will be chosen in the generative process. A scoring function can then measure that average correlation across any text to determine the likelihood that the text was generated by the watermarked LLM (a threshold value can be used to give a binary yes/no answer).
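The general idea described above -- a keyed pseudorandom bias in token sampling, plus a scoring function over the resulting text -- can be sketched in a few lines of Python. This is an illustrative toy, not Google's actual SynthID algorithm (which the Nature paper describes as a "tournament sampling" scheme); the key, vocabulary, and bias value here are all invented for the example.

```python
import hashlib
import math
import random

KEY = "watermark-key"  # stand-in for the provider's secret key
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat"]

def green_set(context):
    """Derive a keyed pseudorandom 'favored' half of the vocabulary
    from the preceding token(s)."""
    seed = hashlib.sha256((KEY + "|".join(context)).encode()).digest()
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, len(VOCAB) // 2))

def watermarked_choice(context, logits, bias=2.0):
    """Sample the next token, nudging probability toward the keyed set."""
    favored = green_set(context)
    boosted = {t: l + (bias if t in favored else 0.0) for t, l in logits.items()}
    z = sum(math.exp(v) for v in boosted.values())
    r, acc = random.random(), 0.0
    for t, v in boosted.items():
        acc += math.exp(v) / z
        if r <= acc:
            return t
    return t  # numerical fallback: last token

def score(tokens):
    """Fraction of tokens that fall in the keyed set: about 0.5 for
    ordinary text, noticeably higher for watermarked output. A threshold
    on this value gives the binary yes/no answer the article mentions."""
    hits = sum(1 for i in range(1, len(tokens))
               if tokens[i] in green_set(tuple(tokens[i - 1:i])))
    return hits / max(1, len(tokens) - 1)
```

Generating a few hundred tokens with `watermarked_choice` and scoring them shows the gap: the watermarked stream scores well above the ~0.5 baseline of unwatermarked text, which is the statistical signal a detector thresholds on.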



[1] https://deepmind.google/discover/blog/watermarking-ai-generated-text-and-video-with-synthid/

[2] https://arstechnica.com/ai/2024/10/google-offers-its-ai-watermarking-tech-as-free-open-source-toolkit/

[3] https://x.com/GoogleDeepMind/status/1849110263871529114

[4] https://ai.google.dev/responsible/docs/safeguards/synthid

[5] https://www.youtube.com/watch?v=9btDaOcfIMY

[6] https://www.nature.com/articles/s41586-024-08025-4



Re: (Score:2)

by dbialac ( 320955 )

Anything from Google is a new way to track. Free never means free with Google.

Won't really help (Score:2)

by alvinrod ( 889928 )

This prevents bad actors from using Google's tools to generate their fake content, but it's not going to stop any actors at the nation-state level, who will have their own tools. The absence of a watermark will then lend their fakes an air of authenticity: if the content were fake, it would obviously carry the watermark that all the reputable companies use. Everyone here knows people who think that way.

The only solution to this problem is to train people to be incredibly skeptical of anything that i

Three hands per human, six to eight fingers each? (Score:2)

by Moskit ( 32486 )

These image "AI watermarks" are already here and many people do not even notice them, taking AI images as real ;)

It seems the text-based ones are indeed more subtle; however, with a known algorithm it may be possible for a human to deliberately incorporate the watermark pattern into their own writing to trigger false positives.

The paper in Nature covers to some extent spoofing and other limitations of this watermarking approach (stealing, scrubbing, paraphrasing).

News story from next year (Score:2)

by i kan reed ( 749298 )

Google shuts down its popular "AI watermarking tool" with no explanation.

It is a trap! (Score:2)

by gweihir ( 88907 )

If I understand this right, this is not actually easy for anybody to verify. Verification seems to require access to the model that generated the output, which means Google gets data on who tries to verify a watermark. That does not sound good.
