The Internet Archive Now Captures AI-Generated Content (Including Google's AI Overviews) (cnn.com)
(Sunday November 16, 2025 @10:40PM (EditorDavid)
from the 150-terabytes-a-day dept.)
- Reference: 0180097487
- News link: https://tech.slashdot.org/story/25/11/16/1852218/the-internet-archive-now-captures-ai-generated-content-including-googles-ai-overviews
- Source link: https://www.cnn.com/2025/11/16/tech/internet-archive-wayback-machine
CNN [1]profiled the non-profit Internet Archive today — and included this tidbit about how they archive parts of the internet that are now "tucked in conversations with AI chatbots."
> The rise of artificial intelligence and AI chatbots means the Internet Archive is changing how it records the history of the internet. In addition to web pages, the Internet Archive now captures AI-generated content, like ChatGPT answers and those summaries that appear at the top of Google search results. The Internet Archive team, which is made up of librarians and software engineers, are experimenting with ways to preserve how people get their news from chatbots by coming up with hundreds of questions and prompts each day based on the news, and recording both the queries and outputs, [says Wayback Machine Director Mark Graham].
It sounds like a fun place to work...
> Archivists use bespoke machines to digitize books page by page, [2]livestreaming their work on YouTube for all to see (alongside some lo-fi music). Record players churn out vintage tunes from 1920s and 1940s, and the building houses every type of media console for any type of content imaginable, from microfilm, to CDs and satellite television. (The Internet Archive preserves music, television, books and video games, too)... "There are a lot of people that are just passionate about the cause. There's a cyberpunk atmosphere," Annie Rauwerda, a Wikipedia editor and social media influencer, said at a party thrown at the Internet Archive's headquarters to celebrate reaching a trillion pages. "The internet (feels) quite corporate when I use it a lot these days, but you wouldn't know from the people here."
[1] https://www.cnn.com/2025/11/16/tech/internet-archive-wayback-machine
[2] https://www.youtube.com/watch?v=SxUjwZYBIUs
ultimate compression (Score:2)
by aRTeeNLCH ( 6256058 )
I know they won't have all the required data, but if they did, they could just archive the prompts and the exact version of the AI answering, including its training data and any salt and such, so the answers could all be regenerated.
No thanks (Score:4, Interesting)
The Internet Archive should provide us a record of actual people's work and thoughts, not some contrived facsimile of that work that also must meet the approval of those who openly admit they want to rewrite history.