Claude AI Finds Bugs In Microsoft CTO's 40-Year-Old Apple II Code (theregister.com)
- News link: https://it.slashdot.org/story/26/03/10/0521258/claude-ai-finds-bugs-in-microsoft-ctos-40-year-old-apple-ii-code
- Source link: https://www.theregister.com/2026/03/09/claude_legacy_code_vulns/?td=rt-3a
> AI can reverse engineer machine code and find vulnerabilities in ancient legacy architectures, says Microsoft Azure CTO Mark Russinovich, who [1]used his own Apple II code from 40 years ago as an example. Russinovich [2]wrote: "We are entering an era of automated, AI-accelerated vulnerability discovery that will be leveraged by both defenders and attackers."
>
> In May 1986, Russinovich wrote a utility called Enhancer for the Apple II personal computer. The utility, written in 6502 machine language, added the ability to use a variable or BASIC expression for the destination of a GOTO, GOSUB, or RESTORE command, whereas without modification Applesoft BASIC would only accept a line number. Russinovich had Claude Opus 4.6, released early last month, look over the code. It decompiled the machine language and found several security issues, including a case of "silent incorrect behavior" where, if the destination line was not found, the program would set the pointer to the following line or past the end of the program, instead of reporting an error. The fix would be to check the carry flag, which is set if the line is not found, and branch to an error.
>
> The existence of the vulnerability in Apple II type-in code has only amusement value, but the ability of AI to decompile embedded code and find vulnerabilities is a concern. "Billions of legacy microcontrollers exist globally, many likely running fragile or poorly audited firmware like this," said one comment to Russinovich's post.
[1] https://www.theregister.com/2026/03/09/claude_legacy_code_vulns/?td=rt-3a
[2] https://www.linkedin.com/posts/markrussinovich_opus-46s-security-audit-of-my-1986-code-activity-7436235669938614272-IV5f
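The "silent incorrect behavior" bug and its fix can be sketched in Python (a toy model, not Russinovich's actual 6502 code; the data layout and the error text are illustrative assumptions):

```python
# Toy model of the line-number search the article describes. A real
# Apple II stores Applesoft lines as a linked list in RAM; a Python
# list of (line_number, text) pairs is enough to show the bug.

def find_line_buggy(program, target):
    """Mimics the 'silent incorrect behavior': if the target line is
    missing, this lands on the following line (or past the end of the
    program) instead of reporting an error."""
    for i, (num, _) in enumerate(program):
        if num >= target:
            return i              # may be the *next* line, not target
    return len(program)           # past the end of the program

def find_line_fixed(program, target):
    """The fix: report 'not found' explicitly. (In the 6502 version
    this is where you would check the carry flag and branch to an
    error routine, per the article.)"""
    for i, (num, _) in enumerate(program):
        if num == target:
            return i
    raise LookupError("?UNDEF'D STATEMENT ERROR")

program = [(10, 'PRINT "HI"'), (30, 'END')]
print(find_line_buggy(program, 20))   # silently lands on line 30 (index 1)
```

A GOTO 20 against this program would quietly continue at line 30 in the buggy version; the fixed version reports an error instead.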
Security Theater (Score:3)
"It decompiled the machine language and found several security issues"
Security issues on an Apple II? It's difficult to imagine what kind of "security" they think is possible on an Apple II.
Re: (Score:2)
So, for the open-ended, general-purpose use of a platform without any concept of privilege separation, you are right, and that's realistically where the Apple II sits.
But what if you had a similarly loose platform but it's running a kiosk and that kiosk software is purportedly designed to keep the user on acceptable rails. Then finding a way to break that kiosk software might be significant.
So I'll grant that the concept *could* map to real-world concerns, given how wild-west a lot of embedded applications have been.
Good example of why it's wrong (Score:3)
> But what if you had a similarly loose platform but it's running a kiosk and that kiosk software is purportedly designed to keep the user on acceptable rails.
The word "similarly" is doing a lot of heavy lifting there.
Apple's computers ran on the 6502.
This was an insanely popular architecture, used in metric shit-tons of other hardware from roughly that era. There are insane amounts of resources about this architecture. It was usually programmed in assembly. There was a lot of patching of binaries back then. These CPUs have also been used in courses and training for a very long time, most of which are easy to come by. So there's an insane amount of material about it.
Re: (Score:2)
You have a fair point that the selection of a 40 year old 6502 application is interesting, and likely driven by the reality that the LLMs fall apart with vaguely modern application complexity.
It may however help if someone identifies a small digestible chunk as security relevant and sets it about the task of dealing with it.
And complexity (Score:2)
> the selection of a 40 year old 6502 application is interesting,
Not even the application, just a 120-byte binary patch.
> It may however help if someone identifies a small digestable chunk as security relevant and set it about the task of dealing withi t.
And provided that chunk doesn't have any weirdness that requires a seasoned, actually-human reverse engineer.
(Think segmented memory model on anything pre "_64" of the x86 family - the kind of madness that can kill Ghidra).
Also, if it's not from the 8-bit era or the very early 16-bit era, chances are high that this bit of machine code didn't start as hand-written assembler but as some higher-level compiled language (C most likely). It might be better to run Ghidra on it first.
Re: (Score:3)
There are two points to it:
1) It can find security issues in machine language.
2) It can even do this for the Apple II.
I am always confused why people don't understand proofs of concept. If you get Doom to run on your toaster, you are not looking for the best gaming platform; you are proving what you can do with the toaster hardware. If you find security bugs in Apple II binaries, you do not want to fix decades-old software; you want to show that your tool understands decades-old binaries. In practice you then apply your tool to binaries that actually matter.
Re: (Score:2)
> I am always confused why people don't understand proof of concepts.
I'm not sure what about my comment made you think I don't understand "proof of concepts".
My comment was strictly about the phrasing "security issues" within a system that has no login prompt, and no concept of security to begin with. It's an absolute trash way to present the findings, framing it as "security issues". It's nonsense. The Apple II never had any security to begin with, none at all, zero...
Proof of Concept likes simplified cases (Score:1)
> There are two points to it: 1) It can find security issues in machine language 2) It even can do this for Apple II. I am always confused why people don't understand proof of concepts.
Excellent point, but this proof of concept works *because* it is the Apple II: an extremely simple CPU and platform architecture. Proofs of concept often use simplified cases.
Re: (Score:1)
> Security issues on an Apple II? It's difficult to imagine what kind of "security" they think is possible on an Apple II.
Malware could upload your Apple Writer and VisiCalc files using a modem, with the OS, applications, and data files all being on the same disk. :-)
Re: (Score:2)
No amount of fixing bugs in any software, especially the one described in the article, is going to prevent any of that on an Apple II.
I just found their description of it as a "security issue" to be rather amusing.
Oh my god! (Score:5, Funny)
Why hasn't Apple released a security fix?! They've been sitting on this for DECADES!!!
Re: (Score:2)
Reminds me of an episode of Futurama where Fry runs into the room and breathlessly exclaims, "I got here as quickly as I could once I found out what happened a thousand years ago!"
Re: (Score:2)
Apple IIs are all highly secure, thanks to their built-in air-gap firewall!
Mustn't be very effective finding Windows vulns (Score:3)
If they are wasting time on the Apple II instead of fixing the hot mess that Windows 11 is.
Re: (Score:2)
No, it's busy making more of a mess of Windows 11.
Re: (Score:2)
I wonder if [1]Dr. Mark Russinovich [digiater.nl] himself would be interested if the AI could identify the differences between [2]Windows NT and Server [digiater.nl].
[1] https://www.digiater.nl/openvms/decus/vmslt97a/ntstuff/ntnodiff.html#idkern:~:text=was%20done%20by-,NT%20Internals%20expert,-Dr.%20Mark%20Russinovich
[2] https://www.digiater.nl/openvms/decus/vmslt97a/ntstuff/ntnodiff.html#idkern:~:text=NTS%20and%20NTW%3F-,Identical%20Kernels,-It%20turns%20out
Re: (Score:2)
just use copilot, i am sure it will be very helpful.
Re: (Score:2)
Is Windows 11 a hot mess due to security vulnerabilities, or is it a hot mess due to the enshittification of the platform for the benefit of Copilot? Such as removing WordPad, making Notepad a complex WordPad, making Paint less versatile, ...?
On the other hand, who put Claude up to finding bugs in that Apple II? Mark Russinovich?
Re: (Score:2)
Win11 is too complex for AI to fix anything in there. It can make things worse, though. "Code review" AI already starts failing on more complex teaching examples.
Hmmmmm. (Score:3)
Whereas, if he'd used the software engineering techniques that were well-known and well-described at that time, he'd not have included the bugs in the first place. Or, if he had, he'd have detected them in testing.
I do not find it reassuring that a chief technology officer is pleased that he wasn't clever enough to write or test code correctly. What I do find is that I fully understand how he can be a CTO in an organisation notorious for defective software and even more defective bugfix releases.
Re: (Score:2)
What a ridiculous take.
Russinovich is 59 years old this year (2026). 40 years ago he was 19 years old and in high school.
Are you really criticizing someone who wrote code with a bug (or really, incomplete error handling) as a teenager? That may be one of the most "terminally online" comments I've ever seen. Check out the guy's Wikipedia page. He's done some neat stuff.
Re: (Score:2)
Dude's got an impressive CV, no doubt, and using this to slam Microsoft is lame. He's written a ton of impressive code, literally using it as a CV to get a job at Microsoft (with Sysinternals, née Winternals).
Re: (Score:2)
This was some little program a guy wrote at 20 years old that doesn't have any *real* reason to test for security (if you could run his code, you could just run whatever code you wanted anyway; it was a single-user platform without any authentication or anything). Should that really say anything, one way or another, about his capabilities as a 60-year-old?
Re: (Score:2)
You know, you usually have some really interesting things to say. I have you on my friend list so your comments get a +6 so I see what you have to say regardless of what people moderating think about your comments. I've been reading your journal entries for literally decades.
But, if I may paraphrase Bill Gates here, "this is the dumbest fucking thing I've read since I've been on Slashdot." You're suggesting that someone who hacked the BASIC interpreter on the Apple ][ forty years ago should have been using formal software engineering techniques?
Re: (Score:2)
Indeed. Well said. The thing is a self-own, nothing else. Of course the AI fan idiots will not see it that way.
Re: (Score:2)
> I do not find it reassuring that a chief technology officer is pleased that he wasn't clever enough to write or test code correctly.
I was a shitty programmer once. So were you. So was every now-decent programmer, because being a shitty programmer and paying the price for making n00b mistakes is how one learns to become a good programmer. Nobody was born with the knowledge of how to apply all known best practices, and there's no shame in admitting it.
Not Copilot or OpenAI (Score:3)
Interesting he used Claude in this example. Very telling.
Re: (Score:2)
Not really. Claude is considered the best for coding and analysis. The others are not too bad either, but not quite as good as opus 4.6. So it's logical he'd use Claude for this personal experiment. If you think it's political you're adding that yourself. However it would be interesting to take the original apple ii buggy code and see if the other coding affects can find the same bugs.
Re: (Score:2)
Other coding agents. Still no AI contextual awareness in Google keyboard... Maybe that's a good thing.
Anyway if he was willing to post his original, unfixed code to GitHub I would be interested to run opencode on it with a number of different models.
Re: (Score:2)
I found his original code and I tried Opencode on it with OpenCode Zen Big Pickle, which is really a Chinese model called GLM. It did admirably. It disassembled the code and made some sort of sense out of it, but it definitely did not find the bugs.
On the other hand, Claude Opus failed too for me. It claimed there was a bug that would prevent the example usage code given in the article from even working at all, which is clearly false. It did work. So it missed the bugs that Russinovich found with his Claude instance.
Re: (Score:2)
So after Opus 4.6 gave up, I finally gave it the list of bugs that Russinovich has in his post. After it was pointed out, my instance of Opus confirmed the DORESTORE missing line-not-found check bug and explained why it was a problem. However, it disagreed with one of the other issues found by Russinovich's Opus instance. It said: "Token comparison logic bug - that other Opus instance was wrong here. The JMP $0314 goes to the CMP, not the LDA. The accumulator retains the token byte. It's correct."
Re: (Score:2)
A lot of these stories include the final "look at the magical thing the LLM output" while conveniently skipping the "boring lead-up" where they basically have to manually tell it what *not* to say before they get it to generate the thing they intended. And if even that fails, they just skip writing the post.
Re: (Score:2)
I work for a Microsoft shop and use Copilot a lot. When I have a hard question, I use Claude Opus, otherwise ChatGPT is fine.
Re: (Score:2)
I think you can't beat Claude Opus at such tasks with other models currently. But that comes at a price. Literally.
CVSS (Score:4, Funny)
So what's the CVSS score on this vulnerability?
Example vs Practical (Score:3)
I knew everyone would come in here to bash this example just because it is an old platform not in general availability anymore.
But take a step back and realize what that means. Less documentation. Less availability. Less general knowledge on how the platform works overall.
These tools can handle it. And yes, these tools are already being used on modern hardware too.
Also, most seem to be overlooking the "microcontroller" aspect of this: small microcontroller firmware runs our world. It is now becoming trivially easy to fully reverse engineer proprietary firmware on these things. Beyond that, these tools are also working considerably well now on x86 and ARM code for modern systems.
There is a certain level of "security via obscurity" in the closed-source world, and that's now being blown wide open. This is the REAL story. But y'all are getting hung up on "OMG it's an old Apple system."
Re: (Score:2)
> But take a step back and realize what that means. Less documentation. Less availability. Less general knowledge on how the platform works overall.
That's 100% backwards, except for the parts which are irrelevant, like availability. Being such an old processor there is more documentation, more commentary, it's very well known.
Re: (Score:2)
Think larger. What about binary obfuscation techniques? An LLM can read the machine language and collect facts without getting frustrated by all the reverse-engineering traps developers may have put in there, and slowly get to the core of what the thing does. Things may become quite interesting soon.
Re: (Score:2)
So I think the Azure CTO is getting his terminology mixed up.
Machine Code = already assembled code
Assembly = Human readable code.
I asked Copilot if any AI can currently decompile a program and the answer is no. AI can't take raw bytes and disassemble the code on its own; this requires reasoning, and AI only does pattern recognition.
Here is what probably happened:
The CTO gave his assembly source to Claude and Claude saw the bug; there was no disassembly of this code, not by AI anyway.
Also the summary doesn't make sense.
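For what it's worth, the machine-code-vs-assembly distinction above is easy to illustrate: machine code is the raw opcode bytes, and a disassembler maps them back to mnemonics. A toy sketch covering just four real 6502 opcodes (the table and output formatting here are my own, not from the article):

```python
# Minimal table of a few real 6502 opcodes: byte value -> (mnemonic,
# addressing mode, total instruction size in bytes).
OPCODES = {
    0xA9: ("LDA", "imm", 2),   # load accumulator, immediate
    0xC9: ("CMP", "imm", 2),   # compare accumulator, immediate
    0x4C: ("JMP", "abs", 3),   # jump, absolute (16-bit little-endian)
    0x60: ("RTS", "imp", 1),   # return from subroutine, implied
}

def disassemble(code):
    """Turn raw machine-code bytes into assembly mnemonics.
    Only handles the four opcodes in the toy table above."""
    out, pc = [], 0
    while pc < len(code):
        mnem, mode, size = OPCODES[code[pc]]
        if mode == "imm":
            out.append(f"{mnem} #${code[pc+1]:02X}")
        elif mode == "abs":
            addr = code[pc+1] | (code[pc+2] << 8)   # little-endian
            out.append(f"{mnem} ${addr:04X}")
        else:
            out.append(mnem)
        pc += size
    return out

print(disassemble(bytes([0xA9, 0x41, 0xC9, 0x41, 0x4C, 0x14, 0x03, 0x60])))
# -> ['LDA #$41', 'CMP #$41', 'JMP $0314', 'RTS']
```

The input is machine code; the output is assembly. Whether a model does this mapping with an internal representation or an external tool is exactly what the comments below are asking about.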
Re: (Score:2)
I was swayed by another comment in this discussion that points out that, for whatever reason, his example is an LLM analysis of a single routine manifested as 120 bytes of machine code. The choice to use something so utterly short is enough to perhaps recalibrate expectations for practical use. It did spot a couple of real issues but mostly buried the user in a list of "I know already" about how the general environment is not exactly credibly secure at all. 75% of the 'findings' were just "Hey, it's an Apple II."
Re: (Score:2)
> Less documentation. Less availability. Less general knowledge on how the platform works overall.
Yes, the little known 6502. What an [1]obscure device [wikipedia.org]. Only like five billion of them were produced, definitely not a lot of documentation or knowledge out there.
[1] https://en.wikipedia.org/wiki/MOS_Technology_6502
Did it actually decompile it? (Score:2)
I mean, did someone actually "see" the AI decompiling the code?
Or try to recompile whatever the AI claims was the result of the decompilation?
Or are we just accepting this AI's word that it decompiled the code and then found errors?
Re: (Score:2)
These are exactly the kind of questions that came to my mind too.
Russinovich's post offers scarce details on how it was done. I would be interested if the "AI decompiled" code was compared to actual disassembler output to verify accuracy (or if the model used some external disassembler tool for it? Shrug.)
Re: (Score:2)
Does it matter if it "decompiles" or reads the machine language without intermediate step (even though one might suspect some kind of decompiled representation in the latents then)?
The point is the thing had the machine code (I guess some hex representation of it?) and understood it and found a bug.
Re: (Score:2)
When I asked Claude Opus to disassemble the code and add comments, it did that, yes. I'd post it here (with comments) but it would trip the lame lameness filter. Ironic that posting code to slashdot is considered "junk."
It's kind of interesting to consider that the model is large enough to encode an assembler and a disassembler in its parameter matrix.
This is a good thing (Score:2)
AI companies have been approaching software in the wrong order.
The correct order would have been to design, test, and verify tools that could find bugs, edge cases, and security vulnerabilities first; then, once the bug-finder was mature, work on code generation.
Instead, what we got was "vibe coding" tools that allow the clueless to effortlessly create bloated, slow, inefficient, bug-ridden, insecure slop while the hypemongers proclaim "software engineering is dead."
firefox (Score:2)
Same story just broke about Firefox.
Legitimate (Score:3)
Automating this kind of tedious work that nobody wants to do is one of the most legitimate hopes for a coding LLM. It's not replacing people because nobody would pay to do this anyways, even if you had an intern with nothing to do. The real test is whether it can do it on a more complex codebase with a modern language and not just inundate the user with false positives. We're not there yet.
There are bugs to be found in ALL 40-year-old code (Score:2)
People long ago stopped looking for bugs in code that old. Many bugs back then weren't considered severe enough to worry about. A null reference was just an inconvenience, not a security threat. I mean, whatever you (the end user) did to get that null reference, stop doing that!
It's hard to imagine any old software, or for that matter, any software, that would hold up to this scrutiny.
Who cares? (Score:2)
This is beyond ridiculous. That code is historic and irrelevant. Nobody cares to look at it. If that is the great proof of performance they have for their thing, I can only conclude it is a toy.
Re: (Score:2)
> If that is the great proof of performance they have for their thing, I can only conclude it is a toy.
No need to conclude anything; try it for yourself. Take your best code, the code that you've been debugging and polishing for years, the code that you've shipped in a hundred releases already, the code that you've run through every static analyzer and runtime test harness you could get your hands on to try to ferret out any bugs, to the point where all of them returned "no further issues found". Dump that codebase into Claude Code (or whatever AI you think is appropriate) and ask it to scan the codebase for problems.
Fair enough, Apple II dev finding bug in AI Code (Score:1)
Fair enough, this former Apple II developer is finding bugs in AI generated code. :-)
How very relevant. (Score:2, Interesting)
This will be extremely useful for the once every couple years I bust out my old IIgs. Pity we can't run the AI on the Apple hardware to find these vulnerabilities in code we don't use at all in production anymore. These are critical issues that must be addressed!
Re: (Score:2)
Well the latter point may have more relevance, that a lot of embedded scenarios are like the Apple II scenario, never subjected to rigorous security review and largely banking on no one bothering to reverse engineer the closed source runtimes.
So this can shift the cost/benefit ratio toward going and looking at some of those embedded applications and finding ways to induce misbehavior. Depending on the scenario, the vendor is long gone or the design was never made to be field-upgradeable. So you end up with known vulnerabilities that can never be fixed.
Re: (Score:3)
There's a reason he cited the Apple II as an example. You might be alarmed to learn how much legacy code - decades old - is still running on equally aged hardware in production. Perhaps not on an Apple II, but a recent example I saw was a rack of PDP-9s still controlling machinery at an observatory.