News: 1729582210

Major publishers sue Perplexity AI for scraping without paying

(2024/10/22)


Major US news publishers Dow Jones & Co and NYP Holdings have sued AI search engine startup Perplexity for scraping their content without paying for it.

The lawsuit, filed on behalf of The Wall Street Journal and its sister tabloid New York Post by their parent company News Corporation, alleges two counts of copyright infringement and one of false designation of origin and dilution of trademarks. The plaintiffs accuse the AI biz of stealing the hard work of journalists to feed the data requirements of its training models. News Corp's CEO Robert Thomson claimed this could be the first of many such lawsuits against AI developers.

"The perplexing Perplexity has willfully copied copious amounts of copyrighted material without compensation, and shamelessly presents repurposed material as a direct substitute for the original source. Perplexity proudly states that users can 'skip the links' – apparently, Perplexity wants to skip the check," he told The Register in a statement.

"We applaud principled companies like OpenAI, which understands that integrity and creativity are essential if we are to realize the potential of Artificial Intelligence. Perplexity is not the only AI company abusing intellectual property and it is not the only AI company that we will pursue with vigor and rigor. We have made clear that we would rather woo than sue – but, for the sake of our journalists, our writers and our company, we must challenge the content kleptocracy."

News Corp isn't against sharing its intellectual property to train AI systems – but it wants the money upfront. In May it [4]inked a deal with the aforementioned OpenAI for just this purpose, with a [5]reported price tag over $250 million. The machine learning juggernaut also has similar deals in place with [6]Reddit and [7]Stack Overflow.

[8]Everyone wants better web search – is Perplexity's AI the answer?

[9]Cloudflare debuts one-click nuke of web-scraping AI

[10]LinkedIn started harvesting people's posts for training AI without asking for opt-in

[11]Meta back at it, harvesting Britons' public Facebook, Insta feeds for AI training

According to [12]court documents [PDF] filed in the US District Court for the Southern District of New York, News Corp first contacted Perplexity about the matter in July but received no response. It wants $150,000 for every proven infringement – which, if enforced, could severely impact or even bankrupt the startup.

Fresh scraping claims hit Musk

While we're on the subject of intellectual property use, Tesla, Elon Musk, and Warner Bros Discovery are facing a lawsuit from Alcon Entertainment – the production house behind such hits as Blade Runner 2049 ([13]our review here).

The filing alleges that Musk got in contact wanting to use imagery from the film for his "We, Robot" event held last week, in which the billionaire unveiled a promised self-driving taxi and bus. Alcon refused the request because of Musk's "highly politicized, capricious and arbitrary behavior, which sometimes veers into hate speech," the New York Times [14]reported.

But Alcon claims Tesla went ahead anyway, using AI-generated images derived from the film – including someone looking suspiciously like the film's star Ryan Gosling – in "a bad-faith and intentionally malicious gambit."

Back to the Perplexity suit: News Corp isn't just peeved at the data scraping itself, but also that Perplexity doesn't cite its sources. It claimed that Perplexity's AI "answer engine" can "skip the links" and that this deprives publishers of direct revenue. Even worse, it gets things wrong.

"In addition to using Plaintiffs' copyrighted work to develop a substitute product that reproduces or imitates Plaintiffs' original content, Perplexity also harms Plaintiffs' brands by falsely attributing to Plaintiffs certain content that Plaintiffs never wrote or published," the lawsuit claims.

"Not infrequently, if Perplexity is asked about what Plaintiffs’ publications reported, Perplexity 'answers' with false information. AI developers euphemistically call these factually incorrect outputs 'hallucinations.' Perplexity’s hallucinations can falsely attribute facts and analysis to content producers like Plaintiffs, sometimes citing an incorrect source, and other times simply inventing and attributing to Plaintiffs fabricated news stories."

One case cited is an August 2024 New York Post article about European attempts to "silence great Americans like Elon Musk." The lawsuit claims Perplexity, when asked for a summary, copied the first 139 words of the piece, and then added five more paragraphs of factually incorrect information.

On the data scraping side, there is a mechanism for website operators to opt out of adding their content to the voracious maw of AI training databases: the robots.txt file, implemented by [16]Google, [17]OpenAI, and [18]Cloudflare. While Perplexity CEO Aravind Srinivas has claimed his business does respect the do-not-scrape command, some third parties it uses might not be so ethical.
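
For readers unfamiliar with the opt-out mechanism, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The robots.txt contents, the example.com URL, and the is_allowed helper are illustrative assumptions, not taken from any publisher's actual file; GPTBot and Google-Extended are used here as example crawler tokens. The check is purely advisory: it only works if the crawler chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve to opt out of AI crawlers
# while leaving the site open to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/news/some-story"  # placeholder URL
    for agent in ("GPTBot", "Google-Extended", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

Nothing in the protocol enforces the result. A scraper that never performs this lookup, or ignores the answer, fetches the page anyway, which is why operators who want a hard block have to add server-side measures.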

Perplexity had no comment at the time of going to press. ®



[4] https://www.theregister.com/2024/05/23/openai_news_corp/

[5] https://www.wsj.com/business/media/openai-news-corp-strike-deal-23f186ba?st=xt1l7t5paifj1tl

[6] https://openai.com/index/openai-and-reddit-partnership/

[7] https://openai.com/index/api-partnership-with-stack-overflow/

[8] https://www.theregister.com/2024/01/05/perprexity_ai_search_engine/

[9] https://www.theregister.com/2024/07/03/cloudflare_ai_blocks/

[10] https://www.theregister.com/2024/09/19/linkedin_ai_data_access/

[11] https://www.theregister.com/2024/09/14/uk_meta_ai_facebook/

[12] https://regmedia.co.uk/2024/10/21/perplexity.pdf

[13] https://www.theregister.com/2017/10/06/review_blade_runner_2049/

[14] https://www.nytimes.com/2024/10/21/business/media/elon-musk-alcon-entertainment-robotaxi-lawsuit.html

[16] https://blog.google/technology/ai/an-update-on-web-publisher-controls/

[17] https://www.theregister.com/2023/08/08/openai_scraping_software/

[18] https://www.theregister.com/2024/07/03/cloudflare_ai_blocks/



AI sounding the death knell for copyright?

Long John Silver

The tenor of my comment is directed neither toward discussing the validity of the concept of copyright, nor to its expression in law. Those topics are irrelevant to whether copyright is sustainable in practice.

Globally, so-called 'intellectual property' (IP) is awarded its status through international conventions. In particular, details of law protecting copyright vary among nations, but in essence support the same general aims. Nations particularly dependent upon copyright-generated income actively promote convergence of national laws to a common standard, and through trade agreements demand evidence that copyright is enforced rigorously; in this context the USA is the major player.

Before supposed 'AI' stormed onto the scene, copyright holders were engaged in vigorous defence against widespread circumvention of their 'rights'. Commercial misappropriation of 'works' is tackled through law enforcement. However, unlawful 'competition' is facilitated by the substantial number of people willing to opt for the cheaper offerings. Likely of much greater importance is illicit distribution within the ethos of 'sharing'. The matter is bedevilled by difficulties for rights holders in establishing plausible monetary losses. However, application in the USA of statutory damages staves off demands for universally agreed accounting of losses.

Copyright enforcement suffered a serious, perhaps ultimately fatal, blow when the 'digital era' arose. Disobedience to copyright law is rampant. It seems impossible to stem either through appeal to people's supposed 'better nature' or by enforcement mechanisms based on technology and civil/criminal law. Current efforts at enforcement have the look of fierce rearguard action.

'AI' has thrown a further spanner in the works. Setting aside much-hyped claims for AI's capabilities, there remains the fact that 'large language models' enabling anyone so-minded to interact with AIs are acting as repositories of information. These AIs are not merely an analogue of books sitting on shelves. They offer services akin to those from a skilled librarian who is also a particular subject specialist. The oft-reported nonsense output sometimes emanating from these computational resources can, in part, be attributed to imperfections of the underlying technology and to indiscriminate use of 'training' materials.

Take-up of these resources by ordinary people is occurring remarkably fast. This type of AI seems set to become a major contributor to education, to aspects of the work of various professionals offering services, and in the context of academic research. State legislatures, the national Houses of Congress, and courts in the USA may be able to hobble AIs or to turn them into cash cows for holders of rights. It's likely 'Western' nations, and others highly dependent upon trade with the USA, will step into line.

Looking globally, the future for copyright enforcement in AI is far less rosy. Nobody other than their own countrymen has a hope of preventing a free-for-all feeding of mankind's accumulated knowledge and broader culture into AIs. This will occur in universities, some other public institutions, and in commerce, with little chance of it being stemmed. Moreover, some of this will be shared globally on the Internet.

Copyright is a legal construct, one which many believe enforces a natural right to property ownership. Others think differently, or else are unwilling to engage in metaphysical argument. Regardless of that, if one nation goes against the grain, the copyright construct collapses globally. I posit that an initiative supporting open dissemination of information shall arise from BRICS nations when they flex economic muscle.

The foregoing leaves the interesting question of what IP-dependent economies could do in preparation for the inevitable, so that a seemingly untenable mode of business is replaced by other means fostering cultural and material prosperity. Perhaps differing perspectives on the meaning of 'property' will become a casus belli.

What happens when you cut back the jungle? It recedes.