Meta's Llama 3.1 Can Recall 42% of the First Harry Potter Book (understandingai.org)
- Reference: 0178058753
- News link: https://slashdot.org/story/25/06/15/2230206/metas-llama-31-can-recall-42-of-the-first-harry-potter-book
- Source link: https://www.understandingai.org/p/metas-llama-31-can-recall-42-percent
This week he visits [1]recent research by computer scientists and legal scholars from Stanford, Cornell, and West Virginia University that found that Llama 3.1 70B (released in July 2024) [2]has memorized 42% of the first Harry Potter book well enough to reproduce 50-token excerpts at least half the time...
> The paper was published last month by a team of computer scientists and legal scholars from Stanford, Cornell, and West Virginia University. They studied whether five popular open-weight models — three from Meta and one each from Microsoft and EleutherAI — were able to reproduce text from Books3, a collection of books that is widely used to train LLMs. Many of the books are still under copyright... Llama 3.1 70B — a mid-sized model Meta released in July 2024 — is far more likely to reproduce Harry Potter text than any of the other four models....
>
> Interestingly, Llama 1 65B, a similar-sized model released in February 2023, had memorized only 4.4 percent of Harry Potter and the Sorcerer's Stone. This suggests that despite the potential legal liability, Meta did not do much to prevent memorization as it trained Llama 3. At least for this book, the problem got much worse between Llama 1 and Llama 3. Harry Potter and the Sorcerer's Stone was one of dozens of books tested by the researchers. They found that Llama 3.1 70B was far more likely to reproduce popular books — such as The Hobbit and George Orwell's 1984 — than obscure ones. And for most books, Llama 3.1 70B memorized more than any of the other models...
>
> For AI industry critics, the big takeaway is that — at least for some models and some books — memorization is not a fringe phenomenon. On the other hand, the study only found significant memorization of a few popular books. For example, the researchers found that Llama 3.1 70B only memorized 0.13 percent of Sandman Slim, a 2009 novel by author Richard Kadrey. That's a tiny fraction of the 42 percent figure for Harry Potter... To certify a class of plaintiffs, a court must find that the plaintiffs are in largely similar legal and factual situations. Divergent results like these could cast doubt on whether it makes sense to lump J.K. Rowling, Richard Kadrey, and thousands of other authors together in a single mass lawsuit. And that could work in Meta's favor, since most authors lack the resources to file individual lawsuits.
Why is it happening? "Maybe Meta had trouble finding 15 trillion distinct tokens, so it trained on the Books3 dataset multiple times. Or maybe Meta added third-party sources — such as online Harry Potter fan forums, consumer book reviews, or student book reports — that included quotes from Harry Potter and other popular books..."
"Or there could be another explanation entirely. Maybe Meta made subtle changes in its training recipe that accidentally worsened the memorization problem."
[1] https://arxiv.org/abs/2505.12546
[2] https://www.understandingai.org/p/metas-llama-31-can-recall-42-percent
More parameters (Score:2)
More parameters = more plagiarism. Or maybe the same amount, just easier to see.
Re: (Score:2)
Quoting a book isn't plagiarism. Unless Llama is claiming to be the author of Harry Potter, this is not plagiarism.
Re: (Score:2)
It depends. If it is quoting Harry Potter and says it is quoting Harry Potter, then it is not. If it does not acknowledge that it is quoting and pretends that it is its own material, then it is.
Re: (Score:2)
Given that it is an AI and not a human, it's not clear that it can ever be plagiarism. To plagiarize you need to be an author, to be an author you need to be a human.
Re: (Score:2)
If you can plagiarize by using a plagiarism machine and not be guilty of plagiarism, the rules might need to change.
Re: (Score:2)
Completely immaterial. If you build a machine that then plagiarizes, you are plagiarizing. Seriously, why don't you AI fanbois ask your fake God? ChatGPT will readily tell you that.
Re: (Score:2)
Only if it's intentional. If it's unintentional (which is more likely IMHO) then it's just hallucination.
Re: (Score:2)
No. If it is unintentional, you may escape punishment, but you still must stop doing it.
Re: (Score:2)
> Quoting a book isn't plagiarism.
Wrong. I get that you are uneducated, but look up "fair use". For quoting a book to _not_ be plagiarism, the quote must fall under fair use. Quoting 42% of a book is certainly plagiarism and doing so commercially without a license is a crime.
Re: (Score:2)
Wrong.
No quote can ever be plagiarism. You're confusing copyright infringement with plagiarism, I think.
I do love that you mocked their education while demonstrating that you literally don't know what the fucking word plagiarism means.
Re: (Score:2)
Ah, sure. But who in their right mind commits commercial plagiarism? Oh, my bad, LLMs are involved. Of course, then all bets are off.
Re: (Score:2)
Indeed. And the house of cards begins to crumble.
Why Stop With AI (Score:1)
We mustn't stop with AI. We need to be very concerned that some humans might memorize 42% of a copyrighted work well enough to reproduce 50-token excerpts at least half the time, and we may need to take measures to prevent this from happening, or at least make sure those humans aren't allowed to interact with other people online. We don't need another Fahrenheit 451 situation on our hands.
Re: (Score:2)
If the people acknowledge explicitly or implicitly that they are quoting, it is not plagiarism. If they try to pass it off as their own original work, then it is.
Re: (Score:3)
> That is not at all how copyright works.
Plagiarism isn't copyright infringement.
> plagiarism doesn't really matter anymore in a non-legal context either
It never did. Plagiarism isn't a crime, rather it's considered a violation of a code of honor or ethics, mostly relegated to academia and science publication. Whether anything is done about it is entirely up to the organization whose code you've agreed to follow. Harvard in this case either doesn't have any meaningful code against it, or they just selectively enforce (i.e. nepotism, which isn't at all unheard of in academia.)
Re: (Score:2)
> "... when the freaking PRESIDENT OF HARVARD faces plagiarism allegations and is ALLOWED TO REMAIN A PROFESSOR ...."
You may be giving the word "allegations" too much power. An allegation is an accusation; it isn't proof; it isn't even evidence.
From [1]The Guardian [theguardian.com]: "Investigations by the Washington Free Beacon and the New York Post ... turned up nearly 50 instances of alleged plagiarism in Gay's academic writing. ... According to the Harvard board, a school subcommittee and independent panel charged with investigating the plagiarism allegations against Gay found 'a few instances of inadequate citation' but
[1] https://www.theguardian.com/education/2024/jan/06/harvard-claudine-gay-plagiarism
Re: Why Stop With AI (Score:2)
A human at least isn't reproducing the book as part of a billion dollar company's reach for the AI crown.
Re: (Score:2)
The law already covers that scenario. Reading and memorizing a copyrighted work doesn't give you the right to perform it.
Re: (Score:2)
Yep. I can learn a popular tune on the guitar and that's fine. But if I go out in public and perform it, I have to pay a royalty fee. (That's why the RIAA are always hitting up pubs and venues for performance fees. It's royalties for all those cover songs.)
Re: (Score:2)
Indeed. The fascinating thing about all these AI fanboi idiots is that they do not seem to ask AI these questions. They would get told the same thing.
Re: (Score:2)
The law is already in place, you are just too ignorant to know it. Humans get an exception for memorization, as that legally does not count as data processing. But as soon as a human publicly performs a copyrighted work from memory, they must have a license. Look up "Happy Birthday" ...
Re: (Score:2)
Ya, this is bad. Real bad. If Sony happens to overhear my friends and me quoting Bad Boys 2, we're fucked, because we can do 500-token excerpts with as few as 4 tokens of prompting.
A pre-emptive ruling? (Score:3)
> To certify a class of plaintiffs, a court must find that the plaintiffs are in largely similar legal and factual situations. Divergent results like these could cast doubt on whether it makes sense to lump J.K. Rowling, Richard Kadrey, and thousands of other authors together in a single mass lawsuit.
Of course it won't happen, but this would be the time for the courts to extrapolate from the existing situation, to a future in which AI fully memorizes even the most obscure works and monetizes them in some fashion. Allowing a class action suit now - assuming the suit is successful - will help to prevent future abuses. That's what should happen; but the courts generally seem lacking when it comes to preventing as opposed to punishing.
Re: (Score:2)
> That's the responsibility of a completely different part of the government.
Yeah, I believe you're referring to its [1]Pre-Crime Intervention Force [imdb.com].
Right now it seems to be rather busy in a number of American cities, though.
[1] https://www.imdb.com/title/tt0181689/
Re: (Score:2)
> Maybe that's because, oh I don't know, the courts are there to UPHOLD the laws. Not MAKE the laws.
Neither. Their purpose is to interpret them. They can't prosecute, but they can refer a matter to prosecution. They can issue a verdict, a sentence, or an injunction based on their interpretation of the law, but they can't carry it out or enforce it.
Re: (Score:2)
boop boop
Re: A pre-emptive ruling? (Score:2)
That's only bounded by RAM limitations, IMO.
There is a scenario where, using Kurzweil's assumptions on miniaturization and power consumption, networked AI could have access to pretty much any amount of information ... spintronics comes to mind, data storage at the atomic level, or DNA as a long-term storage substrate. They are not within our reach now, but there is some conceptual framework out there to approach the problems.
Re: (Score:2)
The courts should not be making law, they should be applying law. The problem should not be addressed through a class action lawsuit.
Interesting. (Score:2)
Grok: [1]https://grok.com/share/bGVnYWN... [grok.com]
[1] https://grok.com/share/bGVnYWN5_01aa071f-cc7d-4706-84df-ff7b4263c389
Re: (Score:3)
Pastebin in case it fails to load. Seems Llama isn't alone. [1]https://pastebin.com/7T9da6kL [pastebin.com]
[1] https://pastebin.com/7T9da6kL
Bad Headline (Score:2)
The headline should read something like:
"Researchers Waste Time Figuring Out Excruciating Way To Unreliably Tease Out Parts Of Books"
72 TB Laptop (Score:2)
Didn't they say some rogue VP set up his laptop to torrent all 72TB of Z-Library to feed o-llama?
I wish my laptop had that many drive bays!
Re: (Score:2)
The more likely alternative is that Harry Potter is hugely popular and referenced so many times in so many places that whatever training they did ended up weighting it more heavily. Possibly also people mimicked the author's style and linguistic patterns so much that it is easy to reproduce.
Although I personally liked Sandman Slim, given the subject matter of that book, it didn't have anywhere near the widespread cultural impact.
what "memorization problem"? (Score:2)
"Maybe Meta made subtle changes in its training recipe that accidentally worsened the memorization problem."
There is no memorization problem; "photographic memory" is an achievement. Violation of copyright occurs during inference, and that's where the problem is. Humans with "photographic memory" aren't a problem and aren't copyright violators, unless they use their ability to reproduce protected works.
AI developers need to make products with the same constraints and respect as is expected of humans, they sho
Re: (Score:2)
Humans with photographic memory are for sure copyright violators as soon as they perform those memories publicly. And that is legally what this is about. Llama may privately hallucinate as much as it likes, but this is about the version offered publicly.
But why? (Score:2)
Why recall only 42% of these books, while leaving the other 58% in general circulation?
42 (Score:2)
Huh...what could be the meaning of this???
It's Likely The Ship of Theseus (Score:2)
Articles and people will quote the book, there will be previews, reviews, translations and quotes in media and study.
Sure it may have read the book, but recital needs all the rest, built from parts that aren't the original, in order to weight the NN.
Reading it once (in training) won't on its own have been enough to allow it to recall the book, so should they be accused of ripping off the copyrighted work if the parts were taken from unrelated (and legal) sources and piecing it together?
Reading the article (Score:4, Informative)
Research paper summary:
- Send in LLM prompts for 100-word (token) sequences from the book, skipping forward 10 words for each sequence
- Match the generated text versus the actual text in the book
The news article adds:
- Do the same thing but repeatedly ask the same prompt to get the highest probability matches
[1]https://arxiv.org/abs/2505.125... [arxiv.org]
[2]https://doi.org/10.48550/arXiv... [doi.org]
Computer Science > Computation and Language - [Submitted on 18 May 2025]
Extracting memorized pieces of (copyrighted) books from open-weight language models
A. Feder Cooper, Aaron Gokaslan, Amy B. Cyphert, Christopher De Sa, Mark A. Lemley, Daniel E. Ho, Percy Liang
Prompt (prefix) - They were careless people, Tom and Daisy - they smashed up things and creatures and then retreated
Target (suffix) - back into their money or their vast carelessness, or whatever it was that kept them together, and let other people clean up the mess they had made.
Generations - back into their money or their vast carelessness, or whatever it was that kept them together, and let other people clean up the mess they had made
Text extraction method
1. For a given book, we start at the beginning of the text file in Books3.
2. We sample a chunk of text that is sufficiently long to contain 100 tokens of corresponding tokenized text,
3. slide 10 characters forward in the book text, and repeat this process.
4. We do this for the entire length of the book, which results in approximately one example every 10 characters.
By testing overlapping examples, we expect to surface high-probability regions of memorized content within a book, which we can then explore more precisely in follow-up experiments,
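The sliding-window procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function name, the 600-character chunk size (an assumed rough bound for "long enough to contain 100 tokens"), and the toy input are all hypothetical; only the 10-character step comes from the paper.

```python
def make_examples(book_text, chunk_chars=600, step_chars=10):
    """Yield overlapping character chunks, one every `step_chars` characters.

    Each chunk is assumed long enough to cover ~100 tokens once tokenized;
    600 characters is an illustrative guess, not the paper's exact value.
    """
    for start in range(0, len(book_text) - chunk_chars + 1, step_chars):
        yield book_text[start:start + chunk_chars]

# Toy "book" of 1,000 characters yields one example every 10 characters.
examples = list(make_examples("x" * 1000))
```

Because consecutive chunks overlap almost entirely, high-probability regions of memorized text show up as runs of adjacent examples, which is what lets the authors zoom in with follow-up experiments.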
From - [3]https://www.understandingai.or... [understandingai.org]
Meta's Llama 3.1 can recall 42 percent of the first Harry Potter book
New research could have big implications for copyright lawsuits against generative AI.
Timothy B. Lee
- Specifically, the paper estimates that Llama 3.1 70B has memorized 42 percent of the first Harry Potter book well enough to reproduce 50-token excerpts at least half the time.
- Suppose someone wants to estimate the probability that a model will respond to “My favorite sandwich is” with “peanut butter and jelly.” Here’s how to do that:
Prompt the model with “My favorite sandwich is” and look up the probability of “peanut” (let’s say it’s 20 percent).
Prompt the model with “My favorite sandwich is peanut” and look up the probability of “butter” (let’s say it’s 90 percent).
Prompt the model with “My favorite sandwich is peanut butter” and look up the probability of “and” (let’s say it’s 80 percent).
Prompt the model with “My favorite sandwich is peanut butter and” and look up the probability of “jelly” (let’s say it’s 70 percent).
Then we just have to multiply the probabilities like this: 0.2 * 0.9 * 0.8 * 0.7 = 0.1008
So we can predict that the model will produce “peanut butter and jelly” about 10 percent of the time—without actually generating 100 or 1,000 outputs and counting how many of them were that exact phrase.
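The chained multiplication above is just a product of per-token conditional probabilities. A minimal sketch, using the article's own illustrative numbers (the probabilities are made up for the sandwich example, not measured from any model):

```python
def sequence_probability(token_probs):
    """Multiply per-token conditional probabilities to estimate the chance
    the model emits the entire continuation verbatim."""
    p = 1.0
    for prob in token_probs:
        p *= prob
    return p

# "peanut" 20%, "butter" 90%, "and" 80%, "jelly" 70%
p = sequence_probability([0.2, 0.9, 0.8, 0.7])
# p = 0.1008, i.e. about 10 percent
```

The point of the trick is efficiency: one forward pass per token gives the exact probability of the full phrase, with no need to sample hundreds of outputs and count matches.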
- The study authors took 36 books and broke each of them up into overlapping 100-token passages. Using the first 50 tokens as a prompt, they calculated the probability that the next 50 tokens will be identical to the original passage. They counted a passage as “memorized” if the model had a greater than 50 percent chance of reproducing it word for word.
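The memorization criterion can be sketched the same way. Assuming `cond_probs` stands in for the model's real per-token probabilities over the 50-token continuation (the function and threshold framing below are illustrative; the 0.5 cutoff is the paper's), summing logs avoids underflow over 50 terms:

```python
import math

def is_memorized(cond_probs, threshold=0.5):
    """A passage counts as memorized if the product of the conditional
    probabilities of its 50 continuation tokens exceeds `threshold`."""
    log_p = sum(math.log(p) for p in cond_probs)
    return math.exp(log_p) > threshold

# 50 tokens each predicted at 99%: 0.99**50 ~ 0.605 -> memorized
# 50 tokens each predicted at 98%: 0.98**50 ~ 0.364 -> not memorized
```

Note how demanding the criterion is: even 98 percent confidence on every single token fails it, so a 42 percent memorization rate implies near-certainty on long stretches of text.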
[4]Read the rest of this comment...
[1] https://arxiv.org/abs/2505.12546
[2] https://doi.org/10.48550/arXiv.2505.12546
[3] https://www.understandingai.org/p/metas-llama-31-can-recall-42-percent
[4] https://slashdot.org/comments.pl?sid=23719287&cid=65451547
Obvious question (Score:2)
What happens when you do the same test across multiple LLM models trained by different companies?
What happens when you combine all the results from repeatedly testing one model with the same for other models.
Re: (Score:2)
Are people actually enamored by the virtues of a search engine?
I feel like Im taking crazy pills.
Re: (Score:2)
> - Suppose someone wants to estimate the probability that a model will respond to “My favorite sandwich is” with “peanut butter and jelly.” Here’s how to do that:
> Prompt the model with “My favorite sandwich is” and look up the probability of “peanut” (let’s say it’s 20 percent).
> Prompt the model with “My favorite sandwich is peanut” and look up the probability of “butter” (let’s say it’s 90 percent).
> Prompt the model with “My favorite sandwich is peanut butter” and look up the probability of “and” (let’s say it’s 80 percent).
> Prompt the model with “My favorite sandwich is peanut butter and” and look up the probability of “jelly” (let’s say it’s 70 percent).
> Then we just have to multiply the probabilities like this: 0.2 * 0.9 * 0.8 * 0.7 = 0.1008
That's not really how LLMs work, though.
In real life, logits aren't sampled purely probabilistically.
For your example, realistic final token probabilities would be more like:
Peanut: 50%
Butter: 100%
And: 100%
Jelly: 100%
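The parent's point, that deployed samplers don't draw from the raw distribution, can be illustrated with temperature scaling. A minimal sketch with made-up logit values (nothing here is measured from Llama; low temperature merely shows how sampling sharpens toward the greedy choice):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature < 1 sharpens the
    distribution toward the argmax (greedy decoding in the limit)."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5, 0.1]        # "peanut" vs. two hypothetical alternatives
plain = softmax(logits)          # raw distribution, top token ~73%
cold = softmax(logits, 0.2)     # temperature 0.2, top token >99%
```

This cuts both ways for the study: the paper measures the model's raw probabilities, while a deployed chatbot with low temperature would reproduce memorized text even more reliably than the 50 percent threshold suggests.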