Salesforce study finds LLM agents flunk CRM and confidentiality tests
- Reference: 1750079951
- News link: https://www.theregister.co.uk/2025/06/16/salesforce_llm_agents_benchmark/
- Source link:
A team led by Kung-Hsiang Huang, a Salesforce AI researcher, showed, using a new benchmark built on synthetic data, that LLM agents achieve a success rate of around 58 percent on tasks that can be completed in a single step, without follow-up actions or further information.
Using the benchmark tool CRMArena-Pro, the team also showed that the agents' success rate drops to 35 percent when a task requires multiple steps.
Another cause for concern is the LLM agents' handling of confidential information. "Agents demonstrate low confidentiality awareness, which, while improvable through targeted prompting, often negatively impacts task performance," a [2]paper published at the end of last month said.
The Salesforce AI Research team argued that existing benchmarks failed to rigorously measure the capabilities or limitations of AI agents, and largely ignored an assessment of their ability to recognize sensitive information and adhere to appropriate data handling protocols.
[5]BT chief says AI could deliver more job cuts, hints at Openreach sell-off
[6]Put Large Reasoning Models under pressure and they stop making sense, say boffins
[7]The launch of ChatGPT polluted the world forever, like the first atomic weapons tests
[8]Enterprise AI adoption stalls as inferencing costs confound cloud customers
The research unit's CRMArena-Pro tool uses a data pipeline of realistic synthetic data to populate a Salesforce organization, which serves as the sandbox environment. The agent takes user queries and, at each turn, decides between making an API call and responding to the user, either to ask for clarification or to provide an answer.
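As a rough illustration of that decision loop, here is a minimal Python sketch. It is not taken from the paper or from the CRMArena-Pro code: the names `ApiCall`, `Respond`, `run_episode` and `scripted_policy` are hypothetical, the sandbox API and the simulated user are stubbed with lambdas, and the query string is only a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class ApiCall:
    """Hypothetical action: the agent queries the sandbox organization."""
    query: str


@dataclass
class Respond:
    """Hypothetical action: the agent replies to the user."""
    text: str
    is_final: bool  # True for an answer, False for a clarifying question


Action = Union[ApiCall, Respond]


def run_episode(
    policy: Callable[[List[str]], Action],
    call_api: Callable[[str], str],
    ask_user: Callable[[str], str],
    user_query: str,
    max_turns: int = 10,
) -> Optional[str]:
    """Run one task: at each turn the policy picks an API call or a reply."""
    history = [f"user: {user_query}"]
    for _ in range(max_turns):
        action = policy(history)
        if isinstance(action, ApiCall):
            # Record the API result so the policy can see it next turn.
            history.append(f"api: {call_api(action.query)}")
        elif action.is_final:
            return action.text
        else:
            # Clarifying question: record it along with the user's reply.
            history.append(f"agent: {action.text}")
            history.append(f"user: {ask_user(action.text)}")
    return None  # turn budget exhausted without a final answer


def scripted_policy(history: List[str]) -> Action:
    """Toy stand-in for an LLM: clarify once, look up once, then answer."""
    if len(history) == 1:
        return Respond("Which account do you mean?", is_final=False)
    if not any(line.startswith("api: ") for line in history):
        return ApiCall("SELECT Status FROM Case WHERE Account = 'Acme'")
    return Respond(f"Found: {history[-1][len('api: '):]}", is_final=True)


answer = run_episode(
    scripted_policy,
    call_api=lambda query: "Open",    # stubbed sandbox response
    ask_user=lambda question: "Acme",  # stubbed simulated user
    user_query="What is the status of my case?",
)
print(answer)
```

A single-step task in this framing would end after one lookup and one answer, while a multi-step task needs the clarification round trip as well, which is where the reported success rate falls.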
"These findings suggest a significant gap between current LLM capabilities and the multifaceted demands of real-world enterprise scenarios," the paper said.
The findings might worry both developers and users of LLM-powered AI agents. Salesforce co-founder and CEO Marc Benioff told investors last year that AI agents represented "[9]a very high margin opportunity" for the SaaS CRM vendor, as it takes a share of the efficiency savings accrued by customers using AI agents to help get more work out of each employee.
Elsewhere, the UK government has said it would [11]target savings of £13.8 billion ($18.7 billion) by 2029 with a digitization and efficiency drive that relies, in part, on the adoption of AI agents.
AI agents might well be useful. However, organizations should be wary of banking on any benefits before they are proven. ®
[2] https://arxiv.org/pdf/2505.18878
[5] https://www.theregister.com/2025/06/16/bt_chief_says_ai_could_cut_more_staff/
[6] https://www.theregister.com/2025/06/16/opinion_column_lrm/
[7] https://www.theregister.com/2025/06/15/ai_model_collapse_pollution/
[8] https://www.theregister.com/2025/06/13/cloud_costs_ai_inferencing/
[9] https://www.theregister.com/2024/08/29/salesforce_pricing_per_ai_conversation/
[11] https://www.theregister.com/2025/06/12/nhs_tech_spending_review/
LLM-based AI agents fail to undertand....anything!
LLM-based AI agents are inference machines; they have no grasp whatsoever of understanding.
Re: LLM-based AI agents fail to undertand....anything!
"they have no grasp whatsoever of understanding."
Sounds like 70%-80% of the IT support folks I've encountered in the past few years.
Most conversations seem to miss the detail that AI implementations are not about providing an effective solution that augments or complements a human-driven service; they are about cutting costs as deep as you can without the threat of litigation rendering it a net negative.
It doesn't matter to investors and shareholders if the solution improves anything, works well long-term or even functions at all, as long as the facade doesn't crumble before the line you care about has finished going up prior to the next earnings call. The "AI solutions" contractors are just selling plastic shovels in the gold rush to the most gullible boardrooms salivating for workforce reduction opportunities.
fail to understand
See title.
Could this test be adapted to outsourced customer service teams?
This is the Salesforce that recently announced it was going to replace a lot of its staff with AI agents, yes?