Anthropic's AI Keeps Passing Its Own Company's Job Interview (anthropic.com)
- Reference: 0180642378
- News link: https://slashdot.org/story/26/01/23/0951257/anthropics-ai-keeps-passing-its-own-companys-job-interview
- Source link: https://www.anthropic.com/engineering/AI-resistant-technical-evaluations
Hume redesigned the test, making it harder. Then Claude Opus 4.5 matched even the best human scores within the two-hour time limit. For his third attempt, Hume abandoned realistic problems entirely and switched to abstract puzzles using a strange, minimal programming language -- something weird enough that Claude struggles with it. Anthropic is now releasing the original test as an open challenge. Beat Claude's best score and ... they want to hear from you.
That's not really a surprise (Score:3)
The AI is going to quickly figure out the words it needs to say to get through a job interview, in the same way that a simple machine learning algorithm can learn to beat World 1-1 of Super Mario Bros. It tries the interview and fails, then tries something else and fails, and it does that a whole bunch of times until it succeeds and figures out what the interviewer is looking for.
This is more a sign of a weak interview process than anything else.
Right now the main thing these LLMs do is replace low-level customer service jobs. The problem is that they are replacing a lot of those jobs, and those people don't just eat a bullet when they become unemployable.
Some of them get stuck in fast food or go be plumbers or whatever, but a lot of them go back and get certs or study or get degrees, or take the degrees they already have, push themselves harder, and start competing for your higher-paying job.
There are millions of fake job posts out there that exist only to gauge the state of the job market, so your boss knows whether or not he can cut your pay or fire you and replace you with somebody younger. Your boss doesn't care about your qualifications or your knowledge; he doesn't even bother taking the time to learn those things. You are a number on a spreadsheet.
Re: (Score:2)
In theory you're right, but in practice I doubt that they gave the AI a thousand tries, and your optimization approach would need a few orders of magnitude more than that. If you have an AI that can do self-play, it will find all kinds of exploits as shortcuts, but it also needs something like a million tries for that.
Knuth has a real chance at this one! (Score:1)
Reading through some of Don Knuth's stuff, I'd say he has a pretty good chance of beating the pants off Claude in this one.
this means absolutely nothing (Score:5, Insightful)
Anyone familiar with AI is painfully aware that AI models are ONLY good at what they're trained to do. If you train it to pass your interview, then of course it's going to be very good at that. But as soon as you take a step or two away from what it's been trained on, it will be anywhere from bad to horrible. And what's worse, these models often have an absurdly high level of confidence in their wrong answers when you go off training.
man vs. car (Score:3)
Please, demonstrate your hireability by running 10 miles as fast as you can. What, you just got beaten out by a rusty 1998 Ford Fiesta? What's wrong with you? Clearly, you're not right to work at our firm.
All this means is that Anthropic is in a near-hiring-freeze situation. [1]
[1] https://www.youtube.com/watch?v=NcZem2OmDBk
Sounds like they don't understand the point (Score:2)
The point of hiring is not to hire people who know the answers to riddles under a time limit. The point is to hire people who can get up to speed on the job reasonably quickly, work well in concert with their co-workers, and then grow the position and product and company going forward. Honestly, the best candidates for most tech jobs won't be bothered to optimize for your particular interview - I'm not saying they'll outright fail, but rather they have many opportunities, so for them the interview is a mu
Amazing (Score:4, Funny)
Their in-house system is good at solving problems they write! INCONCEIVABLE!
Re: (Score:3)
We are supposed to read this and think "wow, if it can do that, then surely it can replace several of these expensive programmers that I have on my staff!"
And yet....it can't....