KPMG Wrote 100-Page Prompt To Build Agentic TaxBot (theregister.com)
- Reference: 0178821548
- News link: https://slashdot.org/story/25/08/22/1110257/kpmg-wrote-100-page-prompt-to-build-agentic-taxbot
- Source link: https://www.theregister.com/2025/08/20/kpmg_giant_prompt_tax_agent/
The TaxBot searches distributed internal documents and Australian tax code to generate 25-page draft reports after collecting four to five inputs from tax agents. Chief Digital Officer John Munnelly said the system operates on KPMG Workbench, a global platform combining retrieval-augmented generation with models from OpenAI, Microsoft, Google, Anthropic, and Meta.
[1] https://www.theregister.com/2025/08/20/kpmg_giant_prompt_tax_agent/
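The "retrieval-augmented generation" setup described in the summary can be sketched very roughly like this. Everything below is illustrative: the toy character-frequency "embedding", the function names, and the sample documents are all assumptions for the sketch, not anything from KPMG Workbench.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Real systems use a learned embedding model and a vector database;
# here a crude character-frequency vector stands in for the embedding.

def embed(text: str) -> list[float]:
    # Toy embedding: 26-dim letter-frequency vector (illustrative only).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    qv = embed(query)
    ranked = sorted(documents, key=lambda d: similarity(qv, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    # Stuff the retrieved passages into the prompt ahead of the question.
    context = "\n---\n".join(retrieve(query, documents))
    return f"Use only the context below to answer.\n{context}\n\nQuestion: {query}"

docs = [
    "Division 7A treats certain private company loans as dividends.",
    "GST is a broad-based tax of 10% on most goods and services sold in Australia.",
    "Capital gains tax applies to assets acquired after 20 September 1985.",
]
print(build_prompt("What rate is GST charged at?", docs))
```

The retrieved context is what gets sent to the model alongside the (reportedly 100-page) instruction prompt; the retrieval step is what lets the system search "distributed internal documents" without training on them.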
Fear is the appropriate response. (Score:3)
The hallucination problem has not been fixed. That means that this tax agent cannot be trusted. The work it produces may be full of inaccuracies that look convincing.
Re:Fear is the appropriate response. (Score:5, Informative)
The hallucination problem _cannot_ be fixed. It is a fundamental part of the mathematical model. Getting it fixed is about as possible as making water not wet under standard conditions.
Re: (Score:1)
> Getting it fixed is about as possible as making water not wet under standard conditions.
Any accounting firm worth their high fee can define "standard conditions" in a way that will make the client happy. If the client wants water that is not wet under "standard conditions," they can make it happen.
Re: (Score:2)
Just as the law (including the tax code) cannot actually be applied using unambiguous logic to all circumstances.
Re: (Score:2)
It seems to me a big problem is small hallucinations, the kind that easily slip by humans. Those can compound in the reader's head into one big hallucination that will not jump up and down naked on the page shouting, "Look at me!!"
Re: (Score:2)
Checking work applies as much to calculator handiwork as it does to LLM output, especially in anything with high consequences.
How is this an improvement (Score:5, Insightful)
over a specification of the same length implemented by real intelligence instead of a random number generator?
Re: (Score:3)
Or even just a program of the same length. This nonsense proves that some people have not only drunk the AI kool aid, they've filled a pool with it and gone for a swim.
Even if their "prompt" is 100% correct - which it almost certainly won't be - the AI can still mess up the output badly. Unfortunately, this is what happens when non-programmers think programming is easy now and attempt to do it. I guess they'll find out the hard way that hard problems remain hard no matter what pretty wrapper and shiny tools you put around them.
Re: (Score:2)
The only credible scenario is that the "tax application" consists entirely of boilerplate and trivially linkable libraries.
Which must have come into existence before the "agentic" soup and be trivial to assemble.
How, one wonders.
Re: (Score:2)
It is cheaper and they will probably still charge an arm and a leg.
Re: (Score:2)
If it works at all.
Re: (Score:1)
It's an improvement because this is formulaic and easy for a human to proofread. Instead of choosing from boilerplate and typing/pasting/selecting, the human can have the whole thing already typed up, probably with the questionable bits highlighted for more careful analysis.
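The "questionable bits highlighted" idea can be sketched as a post-processing pass over the draft. The heuristics here (flag dollar figures and bare legislation references) are purely illustrative assumptions; a real review tool would use the system's own citation metadata.

```python
# Hypothetical review pass: flag draft sentences a human should check closely.
# The two heuristics below are illustrative, not from any real product.
import re

def flag_for_review(draft: str) -> list[tuple[str, str]]:
    flags = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", draft.strip()):
        if re.search(r"\$[\d,]+|\d+%", sentence):
            flags.append((sentence, "contains a figure - verify against source"))
        if re.search(r"\bsection\s+\d+", sentence, re.IGNORECASE) and "[" not in sentence:
            flags.append((sentence, "cites legislation without a reference marker"))
    return flags

draft = ("The deduction is capped at $27,500. See section 290 for the cap. "
         "General commentary follows.")
for sentence, reason in flag_for_review(draft):
    print(f"REVIEW: {reason}: {sentence}")
```

Even crude flags like these narrow the reviewer's attention to the claims most likely to be hallucinated: specific numbers and citations.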
Prompt vs. Training Data (Score:3)
It seems to me that information like this belongs in training data, not in a prompt. Certainly it requires a very large context window.
Re: (Score:3)
Use of RAG for the data seems appropriate, but the context window issues are very real. I found this to be very informative:
[1] https://github.com/NVIDIA/RULER [github.com] - "RULER: What's the Real Context Size of Your Long-Context Language Models?"
TFA doesn't give anything more specific than "100-page prompt". Very rough estimate on size:
* 100 pages * about 250-500 words per page = 25,000 - 50,000 words
* my /usr/share/dict/words is just over 104k words, and is 962K (word frequency being completely ignored here)
[1] https://github.com/NVIDIA/RULER
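The estimate above can be redone in tokens rather than words or bytes. A common rule of thumb for English prose (an assumption that varies by tokenizer and model) is roughly 0.75 words per token, i.e. about 1.33 tokens per word:

```python
# Back-of-the-envelope token count for a "100-page prompt".
# Both constants are assumptions: page density varies, and the
# words-to-tokens ratio depends on the tokenizer.

WORDS_PER_PAGE = (250, 500)   # typical prose range, assumed
TOKENS_PER_WORD = 4 / 3       # ~0.75 words/token heuristic, not exact

for wpp in WORDS_PER_PAGE:
    words = 100 * wpp
    tokens = round(words * TOKENS_PER_WORD)
    print(f"{wpp} words/page -> {words:,} words -> ~{tokens:,} tokens")
```

So the prompt alone is plausibly in the 33k-67k token range, which fits inside advertised 128k+ context windows but leaves less room than it appears once retrieved documents are stuffed in alongside it, and RULER-style evaluations suggest effective context is often shorter than the advertised maximum.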
Re: (Score:2)
The way you're analyzing this shows you don't have extensive subject matter expertise. Hint: think in tokens (counts vary a bit by model), not bytes.
Traditional two-week timeline? (Score:3)
I am a tax attorney and I have worked with KPMG quite a bit. I've never heard of a two-week timeline as a "traditional" timeline for providing tax guidance. The limiting factor in tax tends to be the development of the facts (usually of an in-progress business deal).
I've also worked with some tax-specific AI tools (not KPMG's). The one I use now runs ChatGPT as a backend. It is useful, but the problem is that it can still get questions completely wrong, so you have to check closely and still read all of the source material it cites. Regular ChatGPT will not consistently cite the law it relies on, so it is close to useless.
The main barrier to AI in tax is that most businesses will not and cannot give the AI access to its ERP system to train with. Most systems are still too vulnerable to data leakage. The nightmare scenario is that a third party could get the LLM to spit out proprietary non-public financial information. That barrier isn't insurmountable, but current solutions do not provide sufficient comfort.
There may come a day when most tax compliance is done by AI, but I think we are still some years off before it becomes mainstream. Tax planning will be human-driven for the foreseeable future because pulling the trigger on a particular plan is fundamentally a human judgment call. However, AI would allow the human to dispense with a lot of work and compare different planning options quickly.
Re: (Score:1)
> The main barrier to AI in tax is that most businesses will not and cannot give the AI access to its ERP system to train with.
This shouldn't be hard in principle.
If a company wants to train an AI using its ERP data, clone the AI first, then put the cloned copy under the control of the company that owns the ERP data.
In practice, this may be expensive, but in principle, it doesn't seem had.
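The "cloned copy under the company's control" idea usually means hosting the model on-premises and pointing a standard client at the in-house endpoint, so ERP data never leaves the network. The endpoint URL and model name below are placeholders; the payload shape follows the widely used OpenAI-style chat-completions format, which local servers such as vLLM and llama.cpp also speak.

```python
# Sketch: keep inference on-premises by targeting a locally hosted model
# server. URL and model name are placeholders, not real services.
import json

LOCAL_ENDPOINT = "http://erp-ai.internal:8000/v1/chat/completions"  # placeholder
LOCAL_MODEL = "company-finetune-v1"                                  # placeholder

def build_request(question: str, erp_context: str) -> dict:
    """Assemble a chat-completions payload. The ERP excerpt travels only in
    this payload, which is sent solely to the in-house endpoint above."""
    return {
        "model": LOCAL_MODEL,
        "messages": [
            {"role": "system", "content": "Answer using only the ERP context provided."},
            {"role": "user", "content": f"Context:\n{erp_context}\n\nQuestion: {question}"},
        ],
        "temperature": 0,
    }

payload = build_request("What was Q2 payables turnover?", "AP ledger excerpt ...")
print(json.dumps(payload, indent=2)[:200])
```

This removes the third-party leakage path, but as the parent notes, running and securing such a deployment is where the real cost and expertise requirements sit.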
AI spell-checker fail (Score:1)
"it doesn't seem had" should read "it doesn't seem hard."
Re: (Score:2)
No different from cloud complaints. Both can be installed locally if needed. The harder part with both is finding the expertise.
Ummm (Score:3)
It sounds like they wrote a program with a programming language that produces fuzzy output in copious quantities.
Good for them, I guess.
Two more weeks (Score:3)
Do they spend an additional two weeks then verifying that all that information is correct and not hallucinated, or is it enough that the information, real or fabricated, is simply presented in a well-written manner?
If a client then acts upon that tax advice and it costs them millions in fines or jail time, is the tax advice company liable? Must not be...
Attention editors (Score:2)
"Agents" do stuff for you. "Bots" and "AI" do stuff for you. You don't need the "agentic" part; we get it, the "bot" or "AI" can do stuff.
Re: (Score:2)
"Agentic" is the new way where it runs through multiple AI models for better results. It's a real term; get familiar with it.
Gee that's great (Score:3)
"KPMG Australia developed a 100-page prompt that transforms tax legislation and partner expertise into an agent producing comprehensive tax advice within 24 hours rather than the traditional two-week timeline."
So, certainly they're reducing their FEE for such advice proportionally, yes? I mean, aside from the initial hours (more or less a one-time input), no human time is taken, so what would we be paying their $500 hourly rate for, again?
just waiting for the hallucinations (Score:3)
Can't wait until it recommends something illegal and the customer does it.
Re:just waiting for the hallucinations (Score:5, Insightful)
It takes 24 hours to respond, so I assume an actual employee goes over the 25 page draft and it's the actual employee making the recommendation.
Will this firm's advice accuracy improve or worsen? I don't know, but they will be giving 5x the advice.
Re: (Score:2)
You mean like KPMG and the others usually do via human consultants?
The real problem is that this artificial moron will not know how to hide the criminal things.
Re: (Score:1)
The recommendation probably gets reviewed by a junior, which means they don't know enough to validate it, but then KPMG gets to assign accountability to the human, fire them for "their error," then tweak the bot some more.