Log files that describe the history of the internet are disappearing. A new project hopes to save them
- Reference: 1771059671
- News link: https://www.theregister.co.uk/2026/02/14/internet_history_initiative/
But in 2024, the last person working on the project retired and SLAC shut it down without a plan to preserve or share the data it collected.
Happily, the tight-knit network research community knew of PingER’s demise. Enter Jim Cowie, a computer scientist, academic, and entrepreneur.
In the latter capacity, Cowie founded a company called Renesys that between 2000 and 2014 gathered and sold intelligence about internet infrastructure – think of it as a precursor to Cloudflare’s Radar service.
A company called Dyn acquired Renesys in 2014, before itself being acquired by Oracle. During the back-office crunch that followed the two acquisitions, much of the data Renesys had collected over the years was lost.
“As time passes, information likes to disappear. If you do not invest, its default is to die,” Cowie told The Register.
Losing data irks Cowie, who, this week, told the Asia Pacific Regional Internet Conference on Operational Technologies (APRICOT) he thinks “the operational exhaust of the internet” can help humanity to understand what networked systems have done to society.
“If we wanted to understand if human progress and technical progress are aligned, how would we do that?” he asked during his keynote address. “These are perilous times. Everything is changing and we don’t know how.”
Cowie thinks historians can start to answer those questions by investigating the data that projects like PingER, Renesys, and others have gathered about the internet’s operations.
So he’s started a project to preserve those files for posterity.
It’s called the Internet History Initiative (IHI) and Cowie says its mission is to identify and preserve the records that will let future generations tell the story of the internet and its impact. He thinks it will need to involve archivists, library scientists, technologists, and other specialists, operating across many institutions.
A core activity of the project will be to apply the LOCKSS principle – Lots of Copies Keep Stuff Safe – to store data offline using methods that will preserve it for a century. Cowie thinks IHI will need a distributed group of project participants who store parts of the collection, plus a federation layer that provides a view of all the datasets it stores. He also envisages another version of the collection in warm storage so researchers can access it.
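To make the idea concrete, here is a minimal Python sketch of what "lots of copies" plus a federation layer could look like. It is an illustration only, not IHI's design and not the LOCKSS protocol itself: it swaps in a simple majority vote over SHA-256 digests, and the site names, dataset names, and both function names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a blob."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(copies: dict[str, bytes]) -> list[str]:
    """Compare copies held by different sites and repair the outliers.

    `copies` maps a (hypothetical) site name to the bytes that site holds.
    The digest held by a strict majority is treated as correct; any site
    whose copy differs is overwritten with a majority copy.
    Returns the sorted names of the sites that were repaired.
    """
    digests = {site: digest(blob) for site, blob in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        raise ValueError("no majority: cannot tell which copy is intact")
    good = next(blob for site, blob in copies.items() if digests[site] == winner)
    damaged = sorted(site for site, d in digests.items() if d != winner)
    for site in damaged:
        copies[site] = good
    return damaged


def federated_view(holdings: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert per-site holdings into a dataset -> sites catalog."""
    catalog: dict[str, list[str]] = {}
    for site, datasets in holdings.items():
        for name in datasets:
            catalog.setdefault(name, []).append(site)
    return {name: sorted(sites) for name, sites in sorted(catalog.items())}
```

With three sites holding the same file and one copy suffering bit rot, `audit_and_repair` would restore the odd one out from the majority, while `federated_view` gives a single catalog of which sites hold which dataset. A real deployment would also have to handle offline media, authentication between sites, and copies too large to hold in memory.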
[6]Legacy Update expands archive of vanished Microsoft downloads
[7]Pew: Quarter of web pages vanished in past decade
[8]Internet Archive exposed again – this time through Zendesk
[9]Trump scrubs all mention of DEI, gender, climate change from federal websites
He’s optimistic IHI will be able to create such a resource, because similar collections already exist.
“People like the RIPE NCC have been holding on to routing data and active performance data for decades and have organized it,” Cowie said.
He also pointed to the University of Oregon’s [10]RouteViews project, which has archived almost 30 years of Border Gateway Protocol (BGP) routing data and makes it available to researchers.
That project’s peering coordinator, Nina Bargisen, said RouteViews is already aware of the IHI, feels its goals are similar to its own, and sees no reason why her organization would not be willing to engage and help.
“There are probably a half a dozen or a dozen organizations like that, which have that kind of history,” Cowie said, and if they pool expertise the IHI will have plenty to work with. “We are not starting from zero, by any means.”
Indeed, IHI has already preserved some datasets from RIPE, and recovered the SLAC PingER data.
The organization has also identified the data it wants to preserve, and some data it fears may already be lost.
Cowie told The Register he is aware of an academic paper that describes a research project that analyzed every packet of data transmitted from the USA to the UK for two hours. The paper remains easy to find, he said, but the data it describes has vanished.
He hopes someone knows where that data, and other collections like it, can be found. “Think of people you know who have old traceroutes, or zone files, on a magnetic tape or in their closet,” he said, and then ask them to [11]contact the project.
He’s open to learning about new data sources, too. The Register pointed out that networking vendors have collected decades of data describing customers’ use of their products, and that Cisco has so much that it’s [12]trained custom LLMs. Cowie seemed intrigued by the possibilities such data creates for IHI.
The IHI will need friends, and funding. RouteViews’ Bargisen said if anyone can pull this off, Cowie can. The man himself said he’s in discussions with various parties, but that it is too early to discuss how the project will constitute itself.
But he urged anyone interested to [13]join its mailing list and stay tuned. ®
[6] https://www.theregister.com/2025/12/11/legacy_update_update/
[7] https://www.theregister.com/2024/05/20/webpages_vanish_decade/
[8] https://www.theregister.com/2024/10/21/internet_archive_zendesk_access_attack/
[9] https://www.theregister.com/2025/02/03/trump_admin_scrubs_dei_websites/
[10] https://www.routeviews.org/routeviews/
[11] https://internethistoryinitiative.org/
[12] https://www.theregister.com/2025/12/17/cisco_foundation_model_indentity_intelligence/
[13] https://docs.google.com/forms/d/e/1FAIpQLScqsOd-9QstW0LXJSDuKI3nSybgTk6EUN2wul6Il_aTLcoKYQ/viewform
Other lost history ...
It is a pity that there is no archive of the lives of domain names: when they were live, if they no longer exist, who 'hijacked' them and when, and so on. Old domains are now hosts to fraud and crap, with no easy way of ascertaining when that change took place and, in my own case, being unable to restore a domain that is now randomly forwarding to an unrelated third party :( Old links still work, but not to the content that they refer to, so some means of flagging that fact would be useful, especially given the amount of fraudulent sites that are active today.
DejaNews
Before the fuckheads at go ogle completely cocked it up, the Deja News collection contained the greatest archive of human interaction in, on and with the Internet that will ever exist. And it was completely squandered by some idiot kids who had no idea what they had just purchased, and didn't care because they didn't invent it. Not that they ever actually invented anything themselves, you understand.
I personally tend to contribute to archive projects of this sort so hopefully the same mistake doesn't get made again.
When alpha-goo goes belly-up with the AI bubble bursting, I'm going to party ...
And yet...
We have thousands of datacentres keeping the essential information that I once looked at an advert for cheese.