Apple: Since you care about yOuR pRiVaCy, we'll train our AI on made-up emails

(2025/04/16)

Reference: 1744788547
News link: https://www.theregister.co.uk/2025/04/16/apple_ai_training_privacy/
Apple, having starved its AI models of data by respecting customer privacy, plans to improve its chatbot suggestions by using made-up emails.

The iGiant says it will soon start using synthetic data – that is, data generated by computers instead of actual humans – to improve email summaries generated by Apple Intelligence for those who have opted into [1]Device Analytics.

This ask-for-permission approach contrasts sharply with social media giant Meta, which recently said it will [2]resume training its AI models on the posts produced by users in Europe unless they opt out.

Apple is using an undisclosed large language model to invent email messages on various topics. As an example, the Mac daddy cites the message, “Would you like to play tennis tomorrow at 11:30AM?”

By generating variations on this message using an AI model and converting these into embeddings – numerical vector representations of the text – Apple can then use a technique called [6]differential privacy [PDF] to compare the synthetic embeddings to embeddings derived from actual email messages from opted-in users, without revealing the contents of the genuine messages. This helps make the training data as close to the real thing as possible.
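
To make the idea concrete, here is a minimal sketch of how such a comparison could work, not a description of Apple's actual implementation. It assumes each opted-in device picks the synthetic embedding closest to one of its own emails and reports only that index, perturbed with k-ary randomized response (a standard local differential privacy mechanism swapped in here for illustration). The function names, the toy two-dimensional embeddings, and the epsilon value are all hypothetical.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_synthetic(real_embedding, synthetic_embeddings):
    """On device: index of the synthetic embedding most similar to a real one."""
    return max(range(len(synthetic_embeddings)),
               key=lambda i: cosine(real_embedding, synthetic_embeddings[i]))

def randomized_response(true_index, k, epsilon, rng):
    """Report the true index with probability p, else another index uniformly."""
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return true_index
    other = rng.randrange(k - 1)          # pick among the k-1 other indices
    return other if other < true_index else other + 1

def estimate_counts(reports, k, epsilon):
    """On server: debias noisy reports into estimates of the true counts."""
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1 - p) / (k - 1)                 # chance of reporting a given wrong index
    observed = [0] * k
    for r in reports:
        observed[r] += 1
    return [(count - n * q) / (p - q) for count in observed]

# Hypothetical toy data: three synthetic variants, 1,000 simulated devices.
rng = random.Random(0)
synthetic = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
devices = [[0.6, 0.8]] * 700 + [[1.0, 0.1]] * 200 + [[0.1, 1.0]] * 100
epsilon = 2.0                             # assumed privacy budget, not Apple's
reports = [randomized_response(nearest_synthetic(e, synthetic),
                               len(synthetic), epsilon, rng)
           for e in devices]
estimates = estimate_counts(reports, len(synthetic), epsilon)
best = estimates.index(max(estimates))    # most popular synthetic variant
```

In this sketch the server never sees an email or its embedding, only a noisy index per device, yet the aggregate still reveals which synthetic variant most resembles real traffic. A smaller epsilon adds more noise per report and so needs more devices for a stable estimate.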

"Synthetic data are created to mimic the format and important properties of user data, but do not contain any actual user generated content," Apple explains in [7]a post to its machine learning research site.

"When creating synthetic data, our goal is to produce synthetic sentences or emails that are similar enough in topic or style to the real thing to help improve our models for summarization, but without Apple collecting emails from the device," it says. "This process allows us to improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy."

Synthetic data is widely used in AI training but has [9]several disadvantages, including potential bias, incompleteness, inaccuracy, and reduced model performance.

At the same time, it's private – it's highly unlikely that a model trained on invented information will emit valid personal data in response to a prompt. One hopes the LLM training Apple's AI isn't leaking personal info it may have picked up during its own training into Cupertino's neural networks.

[10]Meta to feed Europe's public posts into AI brains again

[11]Dead or alive, Britain hands Schrödinger's industry £121M

[12]Ireland opens probe into Musk's X over Grok's AI data slurp

While Apple's approach has afforded customers a level of privacy only [13]grudgingly granted by rivals, it has also denied the iPhone maker training data that might have made Apple Intelligence more competitive. The biz was [14]sued last month for exaggerating its AI capabilities, and anecdotally, it appears [15]there's [16]room [17]for improvement.

Apple is already using this technique to improve text generation within email messages in its beta software. ®

[1] https://www.apple.com/legal/privacy/data/en/device-analytics/

[2] https://www.theregister.com/2025/04/15/meta_resume_ai_training_eu_user_posts/

[6] https://www.apple.com/privacy/docs/Differential_Privacy_Overview.pdf

[7] https://machinelearning.apple.com/research/differential-privacy-aggregate-trends

[9] https://arxiv.org/html/2401.01629v1

[10] https://www.theregister.com/2025/04/15/meta_resume_ai_training_eu_user_posts/

[11] https://www.theregister.com/2025/04/15/uk_quantum_funding/

[12] https://www.theregister.com/2025/04/14/ireland_investigation_into_x/

[13] https://www.theregister.com/2025/04/11/microsoft_windows_recall/

[14] https://www.theregister.com/2025/03/21/apple_hallucinated_siri_ai_features/

[15] https://www.reddit.com/r/ios/comments/1i5lq3s/apple_intelligence_cant_even_summarise_apple/

[16] https://www.reddit.com/r/iphone/comments/1hgdlhh/apple_intelligence_prioritizing_phishing_emails/

[17] https://www.reddit.com/r/AppleIntelligenceFail/

How do you do, fellow kids?

Philip Storry

On the one hand, kudos for caring about privacy. On the other hand, way to self-own.

Check out their suggested email: "Would you like to play tennis tomorrow at 11:30AM?"

Not only have I never sent or received such a message, but the few people I know that play tennis never have either. It's a ham-fisted, dry, asinine and tone-deaf attempt at mimicking human communication.

"Tennis tomorrow at 11:30?", or "Tennis tomorrow, usual time?" or "Still on for tennis tomorrow?" would be much more... human. WOPR in WarGames had a better script than the one that Apple is ascribing to its users.

Seriously, Apple, just make training an opt-in. People that want AI give up their data, and accept that the training has potential privacy implications. But they at least get it trained on data that reflects their circumstances and behaviour.

And people that don't want AI just don't opt in. For whatever reason.

But training on Apple's "FBI Agent pretending to be a cool criminal" text corpus won't get anyone anywhere. So just don't do it.

Re: How do you do, fellow kids?

MonkeyJuice

In Apple's favour here, say someone has opted in, and a friend who has not opted in emails them. Just the act of sending an email to a trusted address has now violated the friend's privacy expectations. AI is a giant sponge that soaks up everything around it indiscriminately.

Reality following fiction

Charlie Clark

"It's highly unlikely that a model trained on invented information will emit valid personal data in response to a prompt."

I suggest the author read Graham Greene's Our Man in Havana, in which a Mr Wormold, a vacuum cleaner salesman in Havana, having been recruited by British intelligence, proceeds to invent agents in order to collect their pay, including Sanchez, a pilot. Then he reads of the mysterious crash of a plane in the mountains: the pilot is a certain Mr Sanchez…

Doctor Syntax

Art imitating life again. Many internet users have been training themselves on made-up data for years.

From the BBC's PoV, does it mean that it will be bolloxing up summaries of its own invented news reports instead of the Beeb's?
