News: 0175587371


Digital Preservation Is Not Keeping Up With the Growth of Scholarly Knowledge (nature.com)

(Tuesday December 03, 2024 @05:00PM (msmash) from the leaky-bucket dept.)


Nature:

> Millions of research articles are [1]absent from major digital archives. This worrying finding, which Nature [2]reported on earlier this year, was laid bare in a study by Martin Eve, who studies technology and publishing at Birkbeck, University of London. Eve sampled more than seven million articles with unique digital object identifiers (DOIs), a string of characters used to identify and link to specific publications, such as scholarly articles and official reports. Of these, he found that more than two million were 'missing' from archives -- that is, they were not preserved in major archives that ensure literature can be found in the future.

>

> Eve, who is also a research developer at Crossref, an organization that registers DOIs, carried out the study in an effort to better understand a problem librarians and archivists already knew about -- that although researchers are generating knowledge at an unprecedented rate, it is not necessarily being stored safely for the future. One contributing factor is that not all journals or scholarly societies survive in perpetuity. For example, a 2021 study found that a lack of comprehensive and open archiving meant that 174 open-access journals, covering all major research topics and geographical regions, vanished from the web in the first two decades of this millennium.

>

> A lack of long-term archiving particularly affects institutions in low- and middle-income countries, less-affluent institutions in rich countries and smaller, under-resourced journals worldwide. Yet it's not clear whether researchers, institutions and governments have fully taken the problem on board. [...] At the heart of the problem is a lack of money, infrastructure and expertise to archive digital resources. [...] For institutions that can afford it, one solution is to pay a preservation archive to safeguard content. Examples include Portico, based in New York City, and CLOCKSS, based in Stanford, California, both of which count a raft of publishers and libraries as customers.



[1] https://www.nature.com/articles/d41586-024-03842-z

[2] https://science.slashdot.org/story/24/03/07/0614207/millions-of-research-papers-at-risk-of-disappearing-from-the-internet



Digital preservation is hard (Score:2)

by MpVpRb ( 1423381 )

Digital data decays, and the formats and readers become obsolete. This is a serious technical problem, but even worse are the attitudes of rights holders. They view preservation as theft and would prefer that old data disappear if they can't get paid for it.

Re: (Score:2)

by geekmux ( 1040042 )

> Digital data decays and the formats and readers become obsolete. This is a serious technical problem, but even worse are the attitudes of rights holders. They view preservation as theft and would prefer that old data disappear if they can't get paid for it

(The PDFuckin’ Don Father) “Oh yeah? Pretty damn easy to create a standard if ya ask me.”

By the time we get done worrying about some format lasting 30 years or more, it’s gonna prove us wrong.txt.

As far as patented greedy assholes go, fuck ’em. We didn’t need patents to learn about the Roman Empire. Their history will either be remembered through donation, or forgotten by litigation.

Re: Digital preservation is hard (Score:1)

by Reckoning ( 10502566 )

Scientific journals inadvertently contribute to this by paywalling everything. Research should not be paywalled by default.

Re: (Score:2)

by Hoi Polloi ( 522990 )

Copyright reform is badly needed, but no politician considers it a priority or has the guts to drive it. I

Effects of copyright cartels (Score:2)

by wierd_w ( 1375923 )

Might have something to do with the fact that efforts to back up, replicate, and preserve this data are met with strong enforcement efforts from the likes of Elsevier and pals, whose business model REVOLVES around this data being scarce and only obtainable THROUGH THEM.

Exclusivity, or Preservation.

PICK *ONE*.

Re: (Score:1)

by vivian ( 156520 )

Copyright holders who have by definition been granted a unique monopoly on information should be obliged to maintain archives of that information in perpetuity, to be made available to anyone who cares to access it for a reasonable fee.

If they no longer wish to maintain that archive, or go out of business, they should be required to release that information to open public archives and relinquish copyright on that information.

If they fail to maintain the archive and lose information, or otherwise mak

also limit disney vault like locking and big colle (Score:2)

by Joe_Dragon ( 2206452 )

Also limit Disney Vault-style locking and selling only in big collections.

We need companies working on optical formats... (Score:4, Interesting)

by ctilsie242 ( 4841247 )

After people started streaming/downloading, work on mainstream optical formats has ceased. Even Sony (AFAIK) doesn't have an optical archiving format anymore.

This is something businesses need. LTO-9 is okay, but expensive, and optical media done right is relatively cheap to make and can hold just as much as a tape, if not [1]more.

Long-term archiving formats are the cornerstone of any real preservation effort. Yes, one can always run through tapes and copy the data every few years, or stuff everything on a NAS and back that up, but when you start getting into exabytes worth of data, you need a solid format, because you don't have the bandwidth to keep rereading the data.
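A rough back-of-the-envelope sketch of the bandwidth point. It assumes each drive streams at about 400 MB/s (taken here as LTO-9's native, uncompressed rate; treat it as an assumption, since real throughput varies) and ignores tape mounts, seeks, and verification. The function name is made up for illustration:

```python
# Back-of-the-envelope: how long does one full read pass over an archive take?
SECONDS_PER_YEAR = 365.25 * 24 * 3600
EXABYTE = 10**18           # decimal exabyte, in bytes
LTO9_NATIVE_RATE = 400e6   # assumed ~400 MB/s native (uncompressed) per drive


def reread_time_seconds(total_bytes: float, bytes_per_second: float, drives: int = 1) -> float:
    """Seconds to read every byte once, assuming drives stream in parallel at full speed."""
    if drives < 1:
        raise ValueError("need at least one drive")
    return total_bytes / (bytes_per_second * drives)


if __name__ == "__main__":
    for drives in (1, 10, 100):
        years = reread_time_seconds(EXABYTE, LTO9_NATIVE_RATE, drives) / SECONDS_PER_YEAR
        print(f"{drives:>3} drive(s): {years:6.1f} years per full pass")
```

Under those assumptions a single drive needs roughly 79 years for one pass over an exabyte, and even 100 drives running flat out need about 0.8 years, before any of the real-world overheads.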

I just wish optical could get some updates. Even a 5 TB disk would make life a lot easier and make home backups a thing again. People use portable drives for this, but hard drives are not archival media.

From there, it would be nice to have an open-source DAM (digital asset management system), or even just an archiver, that can store data on the backend with ECC, so if there is a damaged sector or a corrupted file, there is a good chance it can be repaired. I have used WinRAR in the past, and its recovery record functionality has saved damaged archives. Even something like Borg Backup that supports erasure coding, where one can add an extra percentage to the backend repository for ECC, could save data.
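The recovery-record idea can be illustrated with the simplest possible erasure code: a single XOR parity block, the same idea as RAID 5. This is a hypothetical sketch, not how WinRAR or Borg implement it; production tools typically use stronger codes (e.g. Reed-Solomon) that survive more than one lost block. All function names here are made up for illustration:

```python
from __future__ import annotations

from functools import reduce


def split_blocks(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size blocks, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(blocks: list[bytes]) -> bytes:
    """Single parity block: XOR of all data blocks (overhead = one block)."""
    return reduce(xor_bytes, blocks)


def recover(blocks: list[bytes | None], parity: bytes) -> list[bytes]:
    """Rebuild at most one lost block (marked None) from the survivors plus parity."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost block")
    restored = list(blocks)
    if missing:
        survivors = [b for b in blocks if b is not None]
        # XOR of the parity with every surviving block leaves exactly the missing one.
        restored[missing[0]] = reduce(xor_bytes, survivors, parity)
    return restored


if __name__ == "__main__":
    data = b"preserve me for the future"
    blocks = split_blocks(data, 4)
    parity = make_parity(blocks)
    damaged = list(blocks)
    damaged[2] = None  # simulate one unreadable block
    rebuilt = b"".join(recover(damaged, parity))[: len(data)]
    print(rebuilt == data)  # True
```

The overhead here is one block in k (25% for k = 4); codes like Reed-Solomon let you trade more overhead for tolerating several lost blocks at once.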

[1] https://www.tomsguide.com/tvs/scientists-just-developed-a-200000gb-optical-disc-that-could-replace-blu-rays

Re: (Score:2)

by nightflameauto ( 6607976 )

> After people started streaming/downloading, work on mainstream optical formats has ceased. Even Sony (AFAIK) doesn't have an optical archiving format anymore.

> This is something businesses need. LTO-9 is okay, but expensive, and optical media done right is relatively cheap to make, can hold just as much, if not [1]more than a tape. [tomsguide.com]

> Long term archiving formats are the cornerstone of any real preservation efforts. Yes, one can always run through tapes and copy the data every few years or stuff everything on a NAS and back that up, but when you start getting into exabytes worth of data, you need to have a solid format, as you don't have the bandwidth to keep rereading the data.

> I just wish optical could get some updates. Even a 5 TB disk would make life a lot easier and make home backups a thing again. People use portable drives for this, but hard drives are not archival media.

> From there, it would be nice to have an open source DAM or even an archiver. that can store data on the backend with ECC encoding, so if there is a damaged sector or a file got corrupted, there is a high chance that it can be repaired. I have used WinRAR in the past, and the recovery record functionality has saved damaged records. Even something like Borg Backup that supports erasure coding, where one can add an additional percentage to the backend repository for ECC, can be something that can save data.

In an age where every query about archival backup is met with "Just put it in the cloud, dude," I don't think any of us are getting our wish for decent optical backup solutions anytime soon. I've resigned myself to mirrors at home, disks swapped into a fire safe once a month, and *a* cloud backup, but I would love to have a real archival option for those monthly/yearly backups. And this is just for shit that I know nobody but me will ever care about. Real data? Forget it. We're too obsessed with profit to gi

[1] https://www.tomsguide.com/tvs/scientists-just-developed-a-200000gb-optical-disc-that-could-replace-blu-rays

Re: We need companies working on optical formats.. (Score:3)

by frdmfghtr ( 603968 )

There is already an optical form of archiving--print.

Archive copies of data should not be working copies. Archive copies are meant to be put somewhere safe so they can be recalled if needed. Want searchable digital copies? Those are working copies, meant to be poked and prodded and searched. If they get corrupted or run into a technological dead end where they can't be converted to a new format, you carefully re-scan the archive copies. The archive copies go back into protective storage and the new digita

Re: (Score:2)

by nightflameauto ( 6607976 )

> There is already an optical form of archiving--print.

> Archive copies of data should not be working copies. Archive copies are meant to be put somewhere safe so they can be recalled if needed. Want searchable digital copies? Those are working copies, meant to be poked and prodded and searched. If they get corrupted or run into a technological dead end where they can't be converted to a new format, you carefully re-scan the archive copies. The archive copies go back into protective storage and the new digital copies go out into the world.

> Printed forms of the data aren't convenient to search that's for sure--but properly made (archive-quality ink and acid-free paper, etc) and preserved they can last centuries. And you don't have to worry about the file format being impossible to read by future technology. We have already seen that digital archives can disappear like a fart in the wind.

> Paper isn't perfect but in my experience proves to be a better archive vehicle of anything that can be preserved on paper than anything digital.

For "words on a page" media, I agree. For my music and video creations, I'd still like optical for archival purposes.

Re: (Score:2)

by test321 ( 8891681 )

Blu-ray discs use a 405 nm diode laser and can write with a spot size down to 150 nm [1]. To increase density, one needs to use the same tricks as in semiconductor production, such as very expensive deep-UV lasers (ArF excimer, 193 nm) or immersion lithography in water or refractive oils. These solutions are not practical for a consumer product.
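To put rough numbers on the wavelength argument: the diffraction-limited spot diameter d scales as λ/NA (the constant in front depends on which resolution criterion you pick), so areal density ρ goes roughly as (NA/λ)². A sketch assuming the numerical aperture stays fixed and nothing else changes:

```latex
d \propto \frac{\lambda}{\mathrm{NA}}, \qquad \mathrm{NA} = n \sin\theta, \qquad
\frac{\rho_{193\,\mathrm{nm}}}{\rho_{405\,\mathrm{nm}}} \approx \left(\frac{405}{193}\right)^{2} \approx 4.4
```

Immersion helps because NA = n sin θ grows with the refractive index n of the medium between lens and disc. Even so, dropping to 193 nm alone buys only about 4.4× per layer (a 25 GB Blu-ray layer would become roughly 110 GB, all else being equal).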

[1] https://en.wikipedia.org/wiki/Blu-ray#Laser_and_optics

Yawn! (Score:3)

by methano ( 519830 )

The truth is that the "good" publishers like Nature, Science, ACS, Elsevier, etc. are publishing so many journals these days that the average quality is plumitting. This does not even include a lot of lesser journals. If a lot of it rots and disappears, it will be easier to find the good stuff. I'm not fretting.

Re: (Score:3)

by smooth wombat ( 796938 )

> The truth is that the "good" publishers like Nature, Science, ACS, Elsevier, etc. are publishing so many journals these days that the average quality is plumitting.

Just like spelling.

Re: (Score:2)

by methano ( 519830 )

Sorry about the spelling. It should be "plummeting". And I felt good about spelling Elsevier correctly.

Re: (Score:2)

by Big Hairy Gorilla ( 9839972 )

Yeah, came here to say something similar: cross-reference this article with the article here yesterday saying how X% of scientific publishing (in China... but also pretty much everywhere) is fake, false, or otherwise untrue. Just because you publish doesn't make it valuable. I'm guessing there is a lot of AI-generated content flooding in...

Re: (Score:2)

by Hoi Polloi ( 522990 )

True. But if it gets referenced you'll want to see what was referenced.

just do it wrong again (Score:1)

by invisiblefireball ( 10371234 )

That nobody gives a shit about doing the right thing, the only thing that is obviously correct in this situation, is because of capitalism. The actual solution to the problem, doing things correctly in an open-source, genuinely collaborative way in keeping with the internet's original intent and not this corporate whorescape we've created, shall forever be out of reach under capitalism, which seeks only to privatize the commons and to prevent obvious, simple, universal and free solutions from being enacte

IP (Score:2)

by Hoi Polloi ( 522990 )

It feels like IP has more rights than human beings now.
