News: 0175280093


Cheap AI 'Video Scraping' Can Now Extract Data From Any Screen Recording (arstechnica.com)

(Friday October 18, 2024 @05:30PM (BeauHD) from the new-tricks-of-the-trade dept.)


An anonymous reader quotes a report from Ars Technica:

> Recently, AI researcher Simon Willison wanted to add up his charges from using a cloud service, but the payment values and dates he needed were scattered among a dozen separate emails. Inputting them manually would have been tedious, so he turned to a technique he calls "video scraping," which involves feeding a screen recording video into an AI model, similar to ChatGPT, for data extraction purposes. What he discovered seems simple on its surface, but the quality of the result has deeper implications for the future of AI assistants, which may soon be able to see and interact with what we're doing on our computer screens.

>

> "The other day I found myself needing to add up some numeric values that were scattered across twelve different emails," Willison wrote in a [1]detailed post on his blog. He recorded a 35-second video scrolling through the relevant emails, then fed that video into Google's AI Studio tool, which allows people to experiment with several versions of Google's Gemini 1.5 Pro and Gemini 1.5 Flash AI models. Willison then asked Gemini to pull the price data from the video and arrange it into a special data format called JSON (JavaScript Object Notation) that included dates and dollar amounts. The AI model successfully extracted the data, which Willison then formatted as a CSV (comma-separated values) table for spreadsheet use. After double-checking for errors as part of his experiment, the [2]accuracy of the results -- and what the video analysis cost to run -- surprised him.

>

> "The cost [of running the video model] is so low that I had to re-run my calculations three times to make sure I hadn't made a mistake," he wrote. Willison says the entire video analysis process ostensibly [3]cost less than one-tenth of a cent, using just 11,018 tokens on the Gemini 1.5 Flash 002 model. In the end, he actually paid nothing because Google AI Studio is currently free for some types of use.
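
The post-processing and cost arithmetic described in the quoted report can be sketched roughly as follows. This is a minimal illustration, not Willison's actual code: the JSON field names (`date`, `amount`), the sample values, and the function names are all hypothetical, and the per-token rate is an assumed figure for Gemini 1.5 Flash input tokens that does not appear in the report. Output tokens are ignored for simplicity.

```python
import csv
import io
import json
from decimal import Decimal

# Hypothetical model output. The report only says the JSON
# "included dates and dollar amounts"; these keys and values are made up.
SAMPLE_JSON = """
[
  {"date": "2024-01-05", "amount": 12.50},
  {"date": "2024-02-05", "amount": 8.25},
  {"date": "2024-03-05", "amount": 4.00}
]
"""

# ASSUMPTION: USD per one million input tokens. Not stated in the report.
ASSUMED_USD_PER_MILLION_TOKENS = Decimal("0.075")


def load_charges(json_text):
    """Parse the extracted JSON, keeping amounts as exact decimals."""
    return json.loads(json_text, parse_float=Decimal)


def charges_to_csv(charges):
    """Render the charges as a CSV table suitable for a spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "amount"])
    for row in charges:
        writer.writerow([row["date"], f"{row['amount']:.2f}"])
    return buf.getvalue()


def total_charges(charges):
    """Add up the dollar amounts, the task the video was recorded for."""
    return sum((Decimal(row["amount"]) for row in charges), Decimal("0"))


def estimate_cost_usd(tokens, usd_per_million=ASSUMED_USD_PER_MILLION_TOKENS):
    """Estimate the analysis cost from a token count and an assumed rate."""
    return Decimal(tokens) * usd_per_million / Decimal(1_000_000)


if __name__ == "__main__":
    charges = load_charges(SAMPLE_JSON)
    print(charges_to_csv(charges), end="")
    print("total:", total_charges(charges))
    # 11,018 is the token count quoted in the report.
    print("estimated cost (USD):", estimate_cost_usd(11018))
```

Under the assumed rate, 11,018 tokens come to well under $0.001, which is consistent with the "less than one-tenth of a cent" figure quoted above. As in the report, any extracted values should still be double-checked against the source emails before being relied on.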



[1] https://simonwillison.net/2024/Oct/17/video-scraping/

[2] https://cloud.google.com/blog/products/ai-machine-learning/the-needle-in-the-haystack-test-and-how-gemini-pro-solves-it

[3] https://arstechnica.com/ai/2024/10/cheap-ai-video-scraping-can-now-extract-data-from-any-screen-recording/



I do similar things (Score:3)

by SirSlud ( 67381 )

I take screenshots of a bunch of web pages and then just describe to the LLM what it's looking at, and how I'd like it combined, arranged, formatted (in markdown, to boot). It's rather impressive how well it gets stuff like that right off the bat. Took a task I used to hate to do, now it takes me 1/10th of the time, if that. It wouldn't surprise me if it works equally well with video, although maybe how cheap it is to do is notable.

Looks like a tool for the incapable (Score:2)

by gweihir ( 88907 )

Obviously, you sometimes simply will get a wrong result on top as a bonus. I mean, we are now using "AI" to add numbers?

Re: (Score:2)

by fahrbot-bot ( 874524 )

> Obviously, you sometimes simply will get a wrong result on top as a bonus. I mean, we are now using "AI" to add numbers?

Reminds me of the Google analytics chart showing how many people asked "What's the number for 911?" -- which apparently wasn't a joke.

Not news??? (Score:2)

by Kelxin ( 3417093 )

This has been happening for over a year. Let me know when AI can watch porn with me and suggest new models in similar tastes.
