After Copilot trial, government staff rated Microsoft's AI less useful than expected
- Reference: 1739336355
- News link: https://www.theregister.co.uk/2025/02/12/australian_treasury_copilot_pilot_assessment/
- Source link:
Australia's Treasury department conducted a 14-week trial of Microsoft 365 Copilot during 2024 and asked for volunteers to participate. Some 218 staff put their hands up and then completed surveys about their experiences using Microsoft’s AI helpers. Those surveys are the basis of an [1]evaluation report published on Tuesday.
The report reveals that, after the trial, participants rated Copilot less useful than they had hoped, because it applied to fewer workloads than they had expected.
[2]
Expected and actual proportion of workload participants felt Copilot could/did support
Workers’ views on Copilot’s ability to improve their work also fell.
[3]
Participant ratings of Copilot’s impact on work quality
Usage of Copilot was lower than expected, with most participants using it two or three times a week, or less. Treasury thinks it probably set unrealistically high expectations before the trial, and noted that participants often suggested extra training would be valuable.
The trial proposed four use cases for Copilot (generating structured content, supporting knowledge management, synthesising and prioritising information, and undertaking process tasks), and participants agreed they were appropriate. But the report found participants also emerged believing that “Copilot was not appropriate for more complex tasks, mostly due to the limitations of the product itself.”
The tasks participants felt Copilot handled best were “finding and summarising information, generating meeting minutes, knowledge management and drafting content”. The report describes those as “basic administrative tasks”.
But saving even a little time on such tasks can pay off: the report finds that if Copilot saves 13 minutes a week for mid-level workers, it will pay for itself.
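The break-even claim boils down to simple arithmetic: divide what the licence costs per week by what a minute of a worker's time costs. The sketch below assumes a monthly per-user licence fee and an hourly staff cost in the same currency; the function name and the 30-per-month and 60-per-hour inputs are hypothetical placeholders, not the figures Treasury used in its report.

```python
# Hypothetical break-even sketch: inputs are placeholders, not the
# licence price or salary figures used in Treasury's report.

WEEKS_PER_MONTH = 52 / 12  # average number of weeks in a month


def break_even_minutes_per_week(licence_cost_per_month: float,
                                staff_cost_per_hour: float) -> float:
    """Return the minutes Copilot must save each week for the value of
    the saved staff time to equal the licence cost (same currency)."""
    licence_cost_per_week = licence_cost_per_month / WEEKS_PER_MONTH
    staff_cost_per_minute = staff_cost_per_hour / 60
    return licence_cost_per_week / staff_cost_per_minute


if __name__ == "__main__":
    # Placeholder inputs: a 30-per-month licence and a 60-per-hour staff cost.
    minutes = break_even_minutes_per_week(30.0, 60.0)
    print(f"Break-even: {minutes:.1f} minutes per week")
```

Because staff cost sits in the denominator, the better-paid the worker, the fewer minutes Copilot needs to save each week to cover its licence.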
[7]Microsoft 365 price rises are coming – pay up or opt out (if you can find the button)
[8]You begged Microsoft to be reasonable. Instead it made Copilot reason-able with OpenAI GPT-o1
[9]Why is Big Tech hellbent on making AI opt-out?
[10]Microsoft teases Copilot Vision, the AI sidekick that judges your tabs
Other findings Microsoft will likely appreciate include the unanticipated benefit that Copilot helped “to contribute to accessibility and inclusion for neurodivergent and part-time staff, or those experiencing medical conditions that require time off work.”
The AI assistant did so by producing automatic summaries of missed meetings and “levelling the playing field for those who struggle to navigate workplace norms or culture.” Some staff therefore reported “a small increase in work confidence”, with junior or recent hires more likely to express such sentiments.
Treasury’s lessons from the pilot include selecting the staff who use Copilot more carefully, giving more thought to the training staff need on how to use AI and on the risks of doing so, and monitoring AI’s impact in the workplace on an ongoing basis.
Another finding suggests as-a-service AI might not be appropriate for agencies like Treasury.
“While security of protected government data and advice is of upmost importance, ideally the core functions of a generative AI product should work alongside security requirements,” the report states. “It is not clear whether products are likely to evolve over time to meet Treasury’s strict security needs, or whether Copilot itself will continue to evolve to incorporate external information into its outputs without feeding the algorithm with internal Treasury data.”
That opinion suggests orgs that handle sensitive information will likely do better with on-prem AI infrastructure. ®
[1] https://evaluation.treasury.gov.au/publications/evaluation-generative-artificial-intelligence
[2] https://regmedia.co.uk/2025/02/12/screenshot_australian_treasury_copilot_test_analysis.jpg
[3] https://regmedia.co.uk/2025/02/12/screenshot_australian_treasury_copilot_test_analysis_2.jpg
[7] https://www.theregister.com/2025/02/07/microsoft_365_price_rises/
[8] https://www.theregister.com/2025/01/31/microsoft_open_ai_reasoning_copilot/
[9] https://www.theregister.com/2025/01/23/why_is_ai_optout/
[10] https://www.theregister.com/2024/12/07/microsoft_copilot_vision/
It's already helping me to flick off immediate and grossly verbose and confusing replies to official letters, when I used to delay and prevaricate, and thus generally bog down and delay.
At the other end I now have a template reply saying: "AI-Detector-Autobot has flagged your email as co-pilot generated and moved it to spam. Due to the large number of AI generated emails, we have regretfully been forced to block them." (There is of course no such thing.)
I can see how the hapless government worker might think that it's not helping them very much.
You are a BOFH and I claim my £5.
Forked tongue oily snake
" if Copilot saves 13 minutes a week [...] it will pay for itself "
I guess the question is whether you're willing to take that paycut of $30 per month (a dollar a day) for those 13 minutes of ecstasy where Copilot miraculously does your work for you ... like an angel from heaven!
If you answer yes, then you can certainly entertain other paycuts as well, for coffee and the coffee machine, snacks, the office fridge, parking space, heating and cooling, office furniture, stationery, office cleaning services, electric outlet use, pencil sharpeners, printing and photocopies, lab supplies, gifts for visitors, publicity, outreach, publication fees, etc ... did I mention a bridge you might avail yourself of at a great price?!
"less useful than expected"
Oh, I don't know. I wouldn't expect anything from it, so if it was actually useful in any way, I would be surprised.
was not appropriate for more complex tasks, mostly due to the limitations of the product itself
In other words, when the hype met reality, the hype turned out to be untrue. Which is the conclusion that pretty much everyone who's used it that I know has also reached.
Look at what they said it actually worked well for: creating summaries of existing information and meeting minutes, all the sorts of content-creating chores that bureaucratic organisations insist on doing and that nobody ever reads again. And using it only 2 or 3 times a week shows that this is at best an extremely marginal productivity enhancement rather than the all-singing, all-dancing game changer it's being sold as.
Turkeys voting for Christmas?
Regardless of whether it’s a good tool or not, I imagine it would be difficult to get a fair evaluation here…
If a government staffer/admin found AI tools largely reduced the need for their job, it would surely be tricky to give it a glowing review?
Treasury Security
It's unclear what Treasury's security requirements and concerns are. Ultimately, vanilla pre-trained Copilot is just another cloud service sitting in an Azure/365 tenant, and it has the same access levels as every other 365-related service that inherits user permissions.
If you're already using 365 or an Azure tenant, the physical and logical security risks are unchanged UNLESS you are worried it exposes security-by-obscurity issues, where Joe Pleb users have access to finance and HR docs in an open SharePoint, but that's a PR problem, not a pure security one.
There are obviously more compliance-related concerns about appropriate usage and adoption, but that's just change management.
I like that 'more training' was required. Surely AI is marketed as something that doesn't require training.
TIL a new word!
“While security of protected government data and advice is of **upmost** importance, ..."
While Treasury would be using a Copilot constrained to its own tenant, one of the big issues for Copilot deployments outside the area bordering the Gulf of Mexico is systemic bias.
The model's training is hugely biased towards US regulatory systems and definitions, and every single tenant outside the US bears the onus of retraining or setting metaprompts (e.g. "the tax year runs July-June"). There is no way to have a baseline Australian or French Copilot in this deployment model.