Boston Public Library Aims To Increase Access To a Vast Historic Archive Using AI
- Reference: 0178657694
- News link: https://news.slashdot.org/story/25/08/12/2050216/boston-public-library-aims-to-increase-access-to-a-vast-historic-archive-using-ai
- Source link:
> [1]Boston Public Library, one of the oldest and largest public library systems in the country, is launching a project this summer with OpenAI and Harvard Law School to [2]make its trove of historically significant government documents more accessible to the public. The documents date back to the early 1800s and include oral histories, congressional reports and surveys of different industries and communities. "It really is an incredible repository of primary source materials covering the whole history of the United States as it has been expressed through government publications," said Jessica Chapel, the Boston Public Library's chief of digital and online services. Currently, members of the public who want to access these documents must show up in person. The project will enhance the metadata of each document and will enable users to search and cross-reference entire texts from anywhere in the world. Chapel said Boston Public Library plans to digitize 5,000 documents by the end of the year, and if all goes well, grow the project from there. Because of this historic collection's massive size and fragility, getting to this goal is a daunting process. Every item has to be run through a scanner by hand. It takes about an hour to do 300-400 pages.
>
> Harvard University said it could help. Researchers at the Harvard Law School Library's Institutional Data Initiative are working with libraries, museums and archives on a number of fronts, including training new AI models to help libraries enhance the searchability of their collections. AI companies help fund these efforts, and in return get to train their large language models on high-quality materials that are out of copyright and therefore less likely to lead to lawsuits. "Having information institutions like libraries involved in building a sustainable data ecosystem for AI is critical, because it not just improves the amount of data we have available, it improves the quality of the data and our understanding of what's in it," said Burton Davis, vice president of Microsoft's intellectual property group. [...] OpenAI is helping Boston Public Library cover such costs as scanning and project management. The tech company does not have exclusive rights to the digitized data.
[1] https://www.bpl.org/about-the-bpl/
[2] https://www.npr.org/2025/08/11/nx-s1-5471614/boston-public-library-harvard-ai
Better hurry (Score:3)
Before the fascist goon makes them rewrite history [1]to fit his agenda [cbsnews.com].
[1] https://www.cbsnews.com/news/white-house-review-smithsonian-museums-trump/
Re: (Score:2)
Eh, so Trump re-writes history, and fakes the GDP, inflation, and jobs numbers for a while. I am still optimistic that in a year and a half, the other party will get control of the House, and I think Trump will be "toast" then. Then we are looking at 2 years of paralysis... then maybe someone better will come along that can move the US forward, instead of backwards.
Re: (Score:2)
(Offtopic)
Of course... that's if the 'other party' has a worthy candidate.
(On topic)
Why does this need AI? I could see maybe converting a scanned page or book to a text file that a computer can search through... but Adobe has been able to do that for quite a while (OCR).
"AI" is not the answer to everything. Humans (last I checked, anyway) are capable of reading books and researching stuff.
Re: (Score:2)
(On Topic) I find that AI has a particular ability to summarize information, and an ability to answer queries about information. Of course they "hallucinate", and are wrong sometimes. I think it is important to drill down, ask questions about the sources, and double check them, but it is much faster than reading all the books yourself, and the human mind misses things, or "hallucinates" too.
Re: (Score:2)
Of course it does... it removes the extraneous crap and gives you a few highlights... what about the in-between information?
My mind has never hallucinated anything... the closest approximation would be dreams. Not that any medical establishment has any clue about the whole mind/brain thing... the 'soul' is still a mystery (even though people lose a couple dozen grams when they die... they don't know why).
If I want to know something (say 'Chernobyl'), I'll read the first article that seems like it has info,
Re: (Score:2)
"Can "AI" make me a 3D model to take to the resin printer for a specific 'thing'? Can it write a book for me? Can it edit a whole video for me? If 'no... what is its purpose?"...
I would say yes. People are spitting out AI books and flooding the market with them. They may or may not be able to make a 3D model yet, I don't know. I know they can do the basics of SW programming well, and their syntax is almost perfect. It can edit or even create a video based on specific verbal instructions for anybo
Re: (Score:2)
Yeah... and, are those "AI" books worth taking 5 minutes to glance at? I would much rather read a good book written by Michael Crichton (or whoever), than something written "in the style of Crichton".
What about the potential authors (the humans) who are trying to be authors? Should they lose their livelihood because of AI? What about the people who write movies or even act in movies? Should they be unemployed because of AI?
"AI" can't make the 3D model for me because it can't understand that I want to ma
Re: (Score:2)
I hear you, and I respect your point of view. I think a lot of our thoughts align.
Re: (Score:2)
Thank you... that's rare on here.
The "AI" rush is ignoring the fact of what it actually is... it's the same thing that your phone does when you text someone... predictive text.
If you stuff it (the hardware and crap) in a closet without 'net connectivity... can it solve the problem of how to get out?
If it can't (consistently... run the test a dozen times), is it intelligent?
Sure... if I give it access to my Arduino code stores, it can reference the code pile and make something following the standards.
If I as
Re: (Score:2)
I have no delusions about "AI". I tried to get it to write code for me on the RP2040 using its PIOs so I could get 8 channels of PWM waveforms with 8ns precision. The RP2040 has PWM generators, and I looked into them for this purpose, but they did not seem to do the job. I got the AI to write the code to interface between the serial port and my computer, and tried to get it to write the other stuff. Ultimately, it wrote me a skeleton and I did the rest. It did accelerate my time, I think, but ultimately
Who will own the results? (Score:2)
Of course, all those books Google digitized are now behind Google's login.
Re: (Score:2)
Well... it's Google or a competitor... can't think of a competitor other than M$
It depends on what that scanned book/article is behind... depending on copyright, it should (maybe has to) be available to public.
Whether it's Google or ArsTechnica or whoever... the actual thing needs to be public, no excuses, especially for rare stuff.
Paywalled articles about new drugs or science papers shouldn't exist... maybe do a "publish the article, but contact info is paid-for" or something.
For the people "complaining" abo
I have approximate knowledge of many things (Score:1)
Oh wonderful. In exchange for company access to the entire library catalogue, the public gets a shitty semantic search engine that can only work by probabilistically stringing words together. This will do nothing for people who need factual information.
Re: (Score:2)
I think a researcher could ask the AI interesting questions, and ask for a direct quote, and then get it. The AI should be able to give the page and paragraph number of the book. It can then be verified. This is the outcome I imagine.
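For what it's worth, the "it can then be verified" step is easy to automate. Here is a minimal sketch, assuming the digitized text is available as a plain mapping of page number to OCR text. The `pages` layout, the function names, and the sample sentence are all made up for illustration; nothing here reflects how the BPL project actually stores or serves its scans.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so that OCR
    line breaks and stray punctuation do not cause false mismatches."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)
    return " ".join(text.split())


def quote_is_on_page(quote: str, pages: dict[int, str], page_number: int) -> bool:
    """Return True only if the quoted text really appears on the cited page."""
    needle = normalize(quote)
    if not needle:
        # An empty quote would trivially "match" every page, so reject it.
        return False
    return needle in normalize(pages.get(page_number, ""))


# Hypothetical sample data: one page of OCR text with a line break in it.
pages = {12: "The committee found that\nmill wages rose in 1843."}

print(quote_is_on_page("Mill wages rose in 1843", pages, 12))  # real quote, right page
print(quote_is_on_page("Mill wages fell in 1843", pages, 12))  # fabricated quote
print(quote_is_on_page("Mill wages rose in 1843", pages, 13))  # wrong page cited
```

An exact match after normalization is deliberately strict: a paraphrase or a made-up quote fails, which is the point. Real OCR errors would probably call for fuzzy matching instead.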
Re: (Score:2)
Sure... you can ask the "AI" to directly quote an ArsTechnica article... and it'll give you what you want. You entirely could go to the site itself and read the article... what did the "AI" do for you?
It's a very basic search engine, that replies using (more or less) "plain text", and I'm sure there's a lot of censorship and filtering built into it.
I can type something into a search engine and review the results, and click on the closest relative to what I wanted to find.
Of course, I can find my article a
Re: (Score:2)
I think that you are boiling down what the current state of AI is: a search engine. That is what the point of the article is. In the future, I am hoping that it can do an emulation of understanding physics and material science, and can actually design rockets, computer chips, and robots.
I heard this story on NPR today.. (Score:3)
It seems like a decent use of AI, and of the profits that the companies are generating from AI. Seems like a win... win..
Re: (Score:2)
Yes, finally a sensible use is being reported here. LLMs are pretty good at summarization and categorization. I have a pretty big media library I'm thinking of using local models to categorize. Since it doesn't much matter how long it takes, I can underclock everything and let it just plod along.