News: 1759320546


JetBrains wants to train AI models on your code snippets

(2025/10/01)


IDE and developer tools biz JetBrains believes training AI models on public datasets is insufficient, and is offering free product licenses to organizations that are willing to share detailed code-related data.

The data that will be collected includes code snippets, prompt text, AI responses, edit history, and terminal usage.

The new data-sharing dialog includes agreeing to model training on code snippets

"That sounds like a lot, and it is, but that's where the real value for improvements comes from," said the official post on a [1]new approach to data collection in JetBrains IDEs.

JetBrains argues that most AI coding models are trained on public code that does not reflect the "complex, real-world scenarios" of professional development, and insists that it needs data on real usage in order to deliver what professional developers need. A trial based on collecting internal coding data has been promising, the post states, but needs to be scaled to external users.

The company is offering a substantial incentive to organizations that are happy to hand over their data – free All Products Pack subscriptions for one year for employees, currently priced at $979.00 per user/year. There is a [2]waitlist and the offer is described as limited.

The changed data-sharing options are set to land in the 2025.2.4 versions of the JetBrains IDEs, including IntelliJ IDEA, PyCharm, Rider, RubyMine, and PhpStorm, which are expected in around two weeks' time. The new setting for sharing detailed code-related data states specifically that the data will be used for model training. In some cases, such as for non-commercial users, this data sharing will be enabled by default. The setting will be opt-in for those with commercial licenses, and off by default for centrally managed organization licenses.

Use of code for model training is uncomfortable, both because of the risk of accidentally sharing intellectual property and, in the worst case, because of the [6]regurgitation of code, as experienced in 2022 by developer Tim Davis.

JetBrains is the most popular tools vendor after Microsoft, and its IntelliJ IDEA platform is the basis of Google's Android Studio, though we note that the current Android Studio invites developers to share data with Google rather than JetBrains, from which we conclude that it is excluded from the forthcoming changes.

Although JetBrains offers its own AI coding agent, called Junie, the company also claims to offer a "multi-agent ecosystem" following the [8]integration of Claude Agent into its range of IDEs, built on Anthropic's Agent SDK and including support for the newly released Claude Sonnet 4.5 LLM (large language model). This raises the question of whether it makes sense for JetBrains to compete with the likes of Anthropic and OpenAI by training its own models, rather than focusing on its developer tools alone.

Junie was well received on first launch but users now complain that it is too expensive, following a new AI quota model introduced in August. "Was good but now, way more expensive than anything else on the market," [13]a user complained two days ago, while another protested that "unless you can stomach adding top-up credits every hour, this is not a fit for you."

The company [14]responded that "there's no scam here. We're aligning usage to real, public provider prices per token." The new plan is a sustainable foundation, said JetBrains head of marketing Ilya Petrov, adding that AI usage cannot be billed at a flat rate and is unpredictable, depending even on the exact phrasing of a prompt.

The All Products Pack licenses on offer include an AI Pro subscription, but this covers only 10 AI credits per month, with each credit worth $1.00, so even companies taking advantage of free licenses in return for data sharing are likely to face unpredictable additional costs for token usage. ®


[1] https://blog.jetbrains.com/blog/2025/09/30/detailed-data-sharing-for-better-ai/

[2] https://www.jetbrains.com/lp/data-collection-program-for-organizations/

[6] https://x.com/DocSparse/status/1581461734665367554

[8] https://blog.jetbrains.com/ai/2025/09/introducing-claude-agent-in-jetbrains-ides/

[13] https://plugins.jetbrains.com/plugin/26104-jetbrains-junie/reviews

[14] https://blog.jetbrains.com/ai/2025/09/faq-new-ai-quota/




CICO

elsergiovolador

So if someone cannot make a living from their code, they'll get the privilege of training AI with it?

No

Admiral Grace Hopper

See title.

Fonant

JetBrains think that people will PAY MONEY to have an LLM come up with some code that looks plausible, but is based on statistical analysis of an invisible mass of code from unknown third parties?

Please can the "AI" bubble burst sooner rather than later?

Glad I don't use Rider

Irongut

Or any JetBrains products.

The best response to this

cyberdemon

Would be to run two accounts: one full of AI-generated dross, which is willingly ingested in exchange for a free additional license (and happily proceeds to poison the resulting model). Then, with the free additional license, turn off all cloud features. No slurping of the real code.

Re: The best response to this

geoffbeaumont

I suspect that won't work - your free license will require slurping to be enabled. Which makes it pointless getting it if you aren't prepared to let those tools near your code.

I can't see why anyone would, unless they're a hobbyist playing around with code that's public domain anyway (or a fully open source developer - same, JetBrains can already access their code). If you have proprietary code you've presumably kept it proprietary for a reason - the last thing you want is an AI used by other developers trained to reproduce your code! And if you ever access anyone else's code under NDA or license restrictions (which, let's face it, affects most closed source developers) then you legally can't sign up to this without the agreement of those 3rd parties...

Rubbish in, rubbish out

trevorde

Judging by some of the codebases I've worked on, and code I've written, I definitely wouldn't want *anything* trained/based on them!

Everyone here is complaining about the wrong thing

tamegeek42

If you are a paying client, not only is data collection turned off by default, but you can also disable the AI features either on a per-project basis or with a global killswitch that surgically removes the AI assistant from the IDE. They expose this information upfront on the configuration page. So, for the use case of paying professional clients that everyone here is incensed about, it makes zero difference whatsoever.

The worrying bit lies in this phrase: "In some cases, such as for non-commercial users, this data sharing will be enabled by default". The non-commercial licenses are Community editions (e.g. PyCharm CE), early access program users, and free non-commercial licenses given to qualifying FOSS projects' contributors. The whole "if you're not paying for the product then YOU are the product" approach is very icky and looks very self-defeating to me. It turns people away from the product, and they get to train their models on code written by people who are not doing professional development. Ugh. Um. Okay, then.

If it smells it's chemistry, if it crawls it's biology, if it doesn't work it's physics.