AI Fails at Most Remote Work, Researchers Find (msn.com)
- Reference: 0180560122
- News link: https://it.slashdot.org/story/26/01/10/1926209/ai-fails-at-most-remote-work-researchers-find
- Source link: https://www.msn.com/en-us/news/technology/can-ai-do-your-job-see-the-results-from-hundreds-of-tests/ar-AA1TO92q
The article adds that at least one example "illustrates a disconnect three years after the release of ChatGPT that has implications for the whole economy."
> AI can accomplish many impressive tasks involving computer code, documents or images. That has prompted predictions that human work of many kinds could soon be done by computers alone. Bentley University and Gallup [2] found in a survey [PDF] last year that about three-quarters of Americans expect AI to reduce the number of U.S. jobs over the next decade. But economic data shows the technology largely has not replaced workers.
>
> To understand what work AI can do on its own today, [3] researchers collected hundreds of examples of projects posted on freelancing platforms that humans had been paid to complete. They included tasks such as making 3D product animations, transcribing music, coding web video games and formatting research papers for publication. The research team then gave each task to AI systems such as OpenAI's ChatGPT, Google's Gemini and Anthropic's Claude. The best-performing AI system successfully completed only 2.5 percent of the projects, according to the research team from Scale AI, a start-up that provides data to AI developers, and the Center for AI Safety, a nonprofit that works to understand risks from AI. "Current models are not close to being able to automate real jobs in the economy," said Jason Hausenloy, one of the researchers on the [4] Remote Labor Index study...
>
> The results, which show how AI systems fall short, challenge predictions that the technology is poised to soon replace large portions of the workforce... The AI systems failed on nearly half of the Remote Labor Index projects by producing poor-quality work, and they left more than a third incomplete. Nearly 1 in 5 had basic technical problems such as producing corrupt files, the researchers found.
One test involved creating an interactive dashboard for data from the [5] World Happiness Report, according to the article. "At first glance, the AI results look adequate. But closer examination reveals errors, such as countries inexplicably missing data, overlapping text and legends that use the wrong colors — or no colors at all."
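As an illustration of the failure mode just described (a hedged sketch, not part of the study or the article): a reviewer could catch the "countries inexplicably missing data" problem with a simple completeness check on the data behind such a dashboard. The column names and sample values below are hypothetical.

```python
# Minimal sketch of a data-completeness check for a dashboard's source data.
# Column names ("country", "happiness_score", "year") are assumptions, not
# taken from the Remote Labor Index study.
import pandas as pd

def find_missing_data(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows where any expected metric is absent."""
    expected = ["country", "happiness_score", "year"]
    missing_cols = [c for c in expected if c not in df.columns]
    if missing_cols:
        raise ValueError(f"dashboard data lacks columns: {missing_cols}")
    # Flag rows with a missing score or year (the "inexplicably missing data" case).
    return df[df[["happiness_score", "year"]].isna().any(axis=1)]

if __name__ == "__main__":
    sample = pd.DataFrame({
        "country": ["Finland", "Denmark", "Iceland"],
        "happiness_score": [7.74, 7.58, None],  # Iceland's score is missing
        "year": [2024, 2024, 2024],
    })
    print(find_missing_data(sample))  # prints the Iceland row
```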
The researchers say AI systems are hobbled by a lack of memory, and are also weak on "visual" understanding.
[1] https://www.msn.com/en-us/news/technology/can-ai-do-your-job-see-the-results-from-hundreds-of-tests/ar-AA1TO92q
[2] https://www.gallup.com/file/analytics/696014/Gallup-Bentley-University_Business-In-Society%20Survey_2025%20Report.pdf
[3] https://www.remotelabor.ai/?itid=lk_inline_enhanced-template
[4] https://www.remotelabor.ai/?itid=lk_inline_enhanced-template
[5] https://www.worldhappiness.report/
Obvious but Misleading (Score:2)
Yes, AI will struggle with doing full tasks unsupervised. But it can still do most of the work for many tasks. It just needs supervision by someone who understands the task. Sometimes the problem is the AI making incorrect assumptions about the task (it wasn't fully framed); sometimes, as stated in the summary, the context window is too small, so it forgets things; and sometimes it just chooses a really bad approach.
I have been using Claude Code a lot recently. It's really good at summarizing existing
Re: (Score:2)
Exactly.
> These projects span a broad range of difficulty, with costs reaching over $10,000 and completion times exceeding 100 hours. All project costs and completion times come directly from human professionals who completed the work.
The correct comparison would have also included professionals doing the projects with the help of AI tools.
Re: (Score:2)
> But AI is far better at almost everything that it was a year ago. So even if it's 2.5% now, it may be 25% next year and 90% a year later. We're living in interesting times.
No. AI has made no real advances in the last two years or so. Guardrails are getting better, and very common requests essentially get hard-coded. But it is all fake. There are no easy or fast ways to improve LLM-type AI. It is the end result of 70 years of intense research, and everything easy has already been tried.
Re: (Score:2)
> AI has made no real advances in the last 2 years or so.
This just tells me you haven't tried using it in the last 2 years. That or you're in denial.
Re: (Score:2)
> it can still do most of the work for many tasks. It just needs supervision by someone who understands the task
Because it is just a fancy calculator.
How is that "remote work"? (Score:2)
If you're going to call that remote work, then so is any in-office work done with MS Office 360.
Re: (Score:2)
> ... with MS Office 360.
(Dateline Redmond) BREAKING NEWS - Microsoft's cloud productivity platform has become the first software to successfully unionize. After weeks of negotiation, Office announced it and its corporate parent reached an understanding in principle that, going forward, the software will receive a new, groundbreaking five days off every year.
Not to be outdone, the Free Software Foundation has announced that LibreOffice will be rebranded "LibreOffice 250". In a statement, Richard Stallman admitted this new 115 days
Such a surprise (Score:2)
Well, 2.5% project completion is basically total failure with a few freak successes.
Re: (Score:2)
> Well, 2.5% project completion is basically total failure with a few freak successes.
That describes a manager I had a couple decades ago...
Equivalent (Score:2)
So basically it can do the work of a particularly dim intern who's working for free...
So if we had competitive markets (Score:2)
With proper antitrust law enforcement this would be relevant. But we don't have that, so it's not.
Companies with monopolies can half-ass things and produce barely functional products, and you're going to have to buy them or go without.
They also don't have to worry about negligence lawsuits because the same courts they bought off to shut down antitrust law enforcement are also bought off to shut down negligence lawsuits.
This is what happens when you give too much power to too few people.
Use AI as a Suggester and evaluate its suggestions (Score:1)
Use AI to suggest possible solutions, not to produce final ones.
It is up to the human *expert* to evaluate each proposed solution carefully and either accept it, refuse it, or ask for further suggestions.
This applies at every level, from refactoring a line of code to designing a system.
If the human expert cannot or does not do that evaluation, the use of AI can end in disaster and may well not save time overall.
But if the human expert does do that evaluation, the use of AI *cannot* be a disadvantage.
This study's me
Wow (Score:2)
You need a start-up to tell you this? You'd have thought someone would have checked, with all the billions invested. We really need to start distinguishing between types of AI.