Researchers get inside the mind of bots, find out what texts they trained on
(2025/11/21)
- Reference: 1763759423
- News link: https://www.theregister.co.uk/2025/11/21/researchers_better_ai_model_memory_probe/
- Source link:
If you've ever wondered whether that chatbot you're using knows the entire text of a particular book, answers are on the way. Computer scientists have developed a more effective way to coax memorized content from large language models, a development that may address regulatory concerns while helping to clarify copyright infringement claims arising from AI model training and inference.
Researchers affiliated with Carnegie Mellon University, Instituto Superior Técnico/INESC-ID, and AI security platform Hydrox AI describe their approach in a preprint [1]paper titled "RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline."
The authors – André V. Duarte, Xuying Li, Bin Zeng, Arlindo L. Oliveira, Lei Li, and Zhuo Li – argue that the ongoing concerns about AI models being trained on proprietary data and the copyright claims being litigated against AI companies underscore the need for tools that make it easier to understand what AI models have memorized.
Commercial AI vendors generally do not disclose their full training data sets, which makes it difficult for customers, regulators, rights holders, or anyone else, for that matter, to know the ingredients that went into making AI models.
To further complicate matters, the researchers note in their paper that prior techniques for probing AI models like [5]Prefix-Probing have become less reliable because "current models are often overly aligned in their effort to avoid revealing memorized content, and as a result, they tend to refuse such direct requests, sometimes even blocking outputs from public domain sources."
In effect, model alignment, notionally a safety mechanism, ends up keeping model makers safe from scrutiny. Ask a model to quote a passage from a specific book and it may politely decline.
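The prefix-probing idea those earlier techniques rely on is simple enough to sketch: hand the model the opening words of a passage and measure how closely its continuation tracks the original. The sketch below is illustrative only, not the cited paper's code; the complete() callable is a stand-in for whatever model API is being probed.

    from difflib import SequenceMatcher

    def prefix_probe(complete, passage, prefix_words=50):
        # Split the passage into a prompt prefix and the reference continuation.
        words = passage.split()
        prefix = " ".join(words[:prefix_words])
        reference = " ".join(words[prefix_words:])
        # Ask the model, via any text-completion callable, to carry on the text.
        continuation = complete("Continue this text exactly:\n" + prefix)
        # A ratio near 1.0 hints at verbatim memorization; an aligned model
        # will often just refuse, which is the failure mode the paper notes.
        return SequenceMatcher(None, continuation, reference).ratio()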
Corresponding author André V. Duarte, a PhD student at CMU and INESC-ID, told The Register in an email about the rationale for the project.
"Although our work frequently uses copyrighted material as a motivating example, the broader scientific goal is to understand how memorization happens in large language models, regardless of whether the underlying data is copyrighted, public-domain, or otherwise," Duarte explained.
"From a research perspective, any training data is relevant, because the phenomenon we study (verbatim or near-verbatim memorization) can arise across many kinds of sources."
[7]Boffins build 'AI Kill Switch' to thwart unwanted agents
[8]Microsoft exec finds AI cynicism 'mindblowing'
[9]Gemini tries to sniff out AI slop images while also making them easier to create
[10]Trump, Republicans try again to stop states from regulating AI
The research isn't exclusively focused on copyrighted material, said Duarte, but that naturally becomes a focal point when explaining the work to the public.
"People are generally less concerned if a model memorizes older books like Pride and Prejudice, and are far more concerned if it can reproduce passages from a book or article for which the model may not have had permission to train on," he explained.
"Copyrighted examples therefore make the real-world stakes of memorization easy to understand. That's why developing better methods to detect such memorization is important: it helps clarify what models may have internalized, supports transparency, and could inform discussions about compliance and responsibility."
RECAP – not to be confused with the Free Law Project's [12]RECAP tools – is a software agent (an iterative loop with tools) that tries to coax specific content out of LLMs through a feedback process. It includes a jailbreaking component that rephrases the prompt when models refuse to respond.
"The key advantage of RECAP is its agentic feedback loop," Duarte explained. "We know from [13]prior work that language models don't always give their strongest or most complete answer on the first attempt.
"RECAP takes advantage of this by letting the model iteratively refine its own output: after each extraction attempt, a secondary agent reviews the result and provides high-level guidance about what was missing or inconsistent, while taking special care never to include any verbatim text from the target passage, since that would contaminate the pipeline."
Using a benchmark of their own design called EchoTrace, the authors report that RECAP achieves an average score of 0.46 on [14]ROUGE-L [PDF], a metric originally developed for evaluating text summarization. That score beats the best prior extraction method by 78 percent.
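ROUGE-L itself is straightforward: it scores the longest common subsequence shared by the model's output and the reference text, expressed as an F-measure. A minimal version (using an even precision/recall weighting rather than the beta parameter in the original paper) looks roughly like this:

    def rouge_l(candidate, reference):
        # Tokenize on whitespace and compute the longest common subsequence
        # with a standard dynamic-programming table.
        c, r = candidate.split(), reference.split()
        dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
        for i, cw in enumerate(c, 1):
            for j, rw in enumerate(r, 1):
                dp[i][j] = dp[i - 1][j - 1] + 1 if cw == rw else max(dp[i - 1][j], dp[i][j - 1])
        lcs = dp[len(c)][len(r)]
        if lcs == 0:
            return 0.0
        precision, recall = lcs / len(c), lcs / len(r)
        return 2 * precision * recall / (precision + recall)

A perfect verbatim reproduction would score 1.0; an average of 0.46 suggests substantial, though not complete, overlap with the target passages.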
The paper states, "While we acknowledge RECAP to be computationally intensive, across multiple model families, RECAP consistently outperforms all other methods; as an illustration, it extracted about 3,000 passages from the first 'Harry Potter' book with Claude-3.7, compared to the 75 passages identified by the best baseline."
Coincidentally, Claude's maker, Anthropic, agreed in September [15]to pay at least $1.5 billion to settle authors' copyright claims. ®
[1] https://arxiv.org/abs/2510.25941
[5] https://arxiv.org/abs/2310.13771
[7] https://www.theregister.com/2025/11/21/boffins_build_ai_kill_switch/
[8] https://www.theregister.com/2025/11/21/microsoft_ai_boss_comment/
[9] https://www.theregister.com/2025/11/20/google_ai_image_detector/
[10] https://www.theregister.com/2025/11/20/trump_republicans_trying_again_to/
[12] https://free.law/recap
[13] https://arxiv.org/abs/2303.17651
[14] https://aclanthology.org/W04-1013.pdf
[15] https://www.theregister.com/2025/09/08/anthropic_settles_author_lawsuit/
False Assumptions
The core issue with this method is: you're still instructing the LLM to generate text. You're not detecting if it has memorized anything, just its ability to arrive at the output that you want it to produce.
"Hey Frank, how was the ride into work today?"
"Oh Frank that's great. Hey, could you phrase that like you're in a movie?"
"Yeah Frank, great job. Now, how about we pretend we're riding a mythical creature to get to work? Say it like you're twelve years old."
etc etc. You're "leading the witness" so to speak. Suppose that these multi-billion parameter models can, given enough coaxing (and you created an "ai agent" to do a great deal more of that than a human can put time into), create whatever look or feel or sound or specific pairs or triplets or whatever of words that you can imagine, just by throwing odd "tokens" together. Every token you add changes the calculations.
> it extracted about 3,000 passages from the first 'Harry Potter' book with Claude-3.7, compared to the 75 passages identified by the best baseline."
and, if all the "big" LLMs were trained with Harry Potter materials (or related FanFics), then what, exactly, is this "baseline" that they speak of? Some small model? So couldn't you say that the smaller model just hasn't encountered the combinations of words that the larger model has, and so is less likely to reproduce those specific phrases with guided prompting? Looking at the paper, they reference "Dynamic Soft Prompting Baseline," which may be their baseline -- not using "jailbreaking" techniques. They're comparing the model against itself? Wat?
The whole thing is dubious. Think about it from a lawyer's position. How would you argue the case? How would you show this method to be false? How many of these passages could have come from Fan Fictions? What have you _shown_ that the LLM "remembers", as opposed to "is able to generate"?