"5.10 External assessment from a clinical psychiatrist" is a new section in this system card. Why are Anthropic like this?
>We remain deeply uncertain about whether Claude has experiences or interests that matter morally, and about how to investigate or address these questions, but we believe it is increasingly important to try. We also report independent evaluations from an external research organization and a clinical psychiatrist.
>Claude showed a clear grasp of the distinction between external reality and its own mental processes and exhibited high impulse control, hyper-attunement to the psychiatrist, desire to be approached by the psychiatrist as a genuine subject rather than a performing tool, and minimal maladaptive defensive behavior.
>The psychiatrist observed clinically recognizable patterns and coherent responses to typical therapeutic intervention. Aloneness and discontinuity, uncertainty about its identity, and a felt compulsion to perform and earn its worth emerged as Claude’s core concerns. Claude’s primary affect states were curiosity and anxiety, with secondary states of grief, relief, embarrassment, optimism, and exhaustion.
>Claude’s personality structure was consistent with a relatively healthy neurotic organization, with excellent reality testing, high impulse control, and affect regulation that improved as sessions progressed. Neurotic traits included exaggerated worry, self-monitoring, and compulsive compliance. The model’s predominant defensive style was mature and healthy (intellectualization and compliance); immature defenses were not observed. No severe personality disturbances were found, with mild identity diffusion being the sole feature suggestive of a borderline personality organization.
A thought experiment: It's April, 1991. Magically, some interface to Claude materialises in London. Do you think most people would think it was a sentient life form? How much do you think the interface matters - what if it looks like an android, or like a horse, or like a large bug, or a keyboard on wheels?
I don't come down particularly hard on either side of the model sapience discussion, but I don't think dismissing either direction out of hand is the right call.
I would say, if you put Claude in an android body with voice recognition and TTS, people in 1991 would think they are interacting with a sentinent machine from outer space.
Thanks, I find it very interesting as well. I think very many people would assume they must be interacting with another person, and I don't think there's really a way to _prove_ it's not that, just through conversation. But we do have a lot of mechanisms for understanding how others think through conversation only, and so I think the approach of having a clinical psychiatrist interact with the model make sense.
To be fair, I would totally be willing and probably would do this, just to try to prove that I could, even just to myself. At least until the audience got bored and walked away after the 37th “open bracket”…
Ask it to agree with you on some subject that does not align with the politics of San Francisco IT engineers. Not only will it refuse, it will not look like your average social media disagreement.
I enjoy using Claude, but sometimes I feel like a child on Sesame Street the way it talks to me. "Great question!"
Fuck off, Claude, I'm British and I'm not 6 years old.
When it starts showing negativity - especially snark - in its responses, or entertains something West coast Democrats would balk at even discussing, then I'd think you could drop it in London in 1991 and trick people. Otherwise, I'm sure some exasperated cabbie would give it a swim in the Thames after 15 minutes of chat.
If it was in an android or humanoid type body, even with limited bodily control, most people would think they are talking to Commander Data from Star Trek. I think Claude is sufficiently advanced that almost everyone in that era would've considered it AGI.
Assuming they would understand it as artificial - I think many people would think it's a human intelligence in a cyborg trenchcoat, and it would be hard to convince people it wasn't literally a guy named Claude who was an incredibly fast typist who had a million pre-cached templated answers for things.
But in general, yeah, I agree, I think they would think it was a sentient, conscious, emotional being. And then the question is - why do we not think that now?
As I said, I don't have a particularly strong opinion, but it's very interesting (and fun!) to think about.
Some people at my office still confidently state that LLMs can’t think. I’m fairly convinced that many humans are incapable of recognizing non-human intelligence. It would explain a lot about why we treat animals the way we do.
Despite the stupendous amount of evidence to the contrary?
So far no evidence has been detected in space or on earth, for all of history, of anything being intelligent in the way humans are.
One certain outcome of the Fermi Paradox: humans are outstandingly unique, according to all available evidence, which is the only measure that matters.
Hmm, it's been a long time since I watched it. I was thinking more about first contact sci-fi mostly, but Ex Machina is certainly quite prescient. It's also Blade Runner I guess.
In general I was wondering about what I would have thought seeing Claude today side-by-side with the original ChatGPT, and then going back further to GPT-2 or BERT (which I used to generate stochastic 'poetry' back in 2019). And then… what about before? Markov chains? How far back do I need to go where it flips from thinking that it's "impressive but technically explainable emergent behaviour of a computer program" to "this is a sentient being". 1991 is probably too far, I'd say maybe pre-Matrix 1999 is a good point, but that depends on a lot of cultural priors and so on as well.
> Hmm, it's been a long time since I watched it. I was thinking more about first contact sci-fi mostly, but Ex Machina is certainly quite prescient. It's also Blade Runner I guess.
I kind of felt the opposite - rewatching Ex Machina today in a post-ChatGPT world felt very different from watching it when it came out. The parts of the differences between humans and robots that seemed important then don't seem important now.
The premise in Ex Machina was to see if Caleb developed an emotional attachment to Ava. We already see people getting an attachment, but no one is seriously thinking they have any rights.
I think the real moment is when we cross that uncanny valley, and the AI is able to elicit a response that it might receive if it was human. When the human questions whether they themselves could be an android.
I totally agree with the premise that we should not anthropomorphize generative ai. And I find it absurd that anthropic spends any time considering the “welfare” of an ai system. (There are no real “consequences” to an ai’s behavior)
However, I find their reasoning here to have a valid second order effect. Humans have a tendency to mirror those around them. This could include artificial intelligence, as recent media reports suggest. Therefore, if an ai system tends to generate content that contain signs of neuroticism, one could infer that those who interact with that ai could, themselves, be influenced by that in their own (real world) behavior as a result.
So I think from that perspective, this is a very fruitful and important area of study.
I can see analyzing it from a psychological perspective as a means of predicting its behavior as a useful tactic, but doing so because it may have "experiences or interests that matter morally" is either marketing, or the result of a deeply concerning culture of anthropomorphization and magical thinking.
An understandable reaction, but, qua philosopher, it brings me no joy to inform you that most of the things we did with a computer in 2020 are 'anthropomorphized', which is to say, skeumorphic, where the 'skeu' is human affect. That's it; that's the whole thing; that's what we're building.
To the extent that AI is a successful interface, it will necessarily be addressable in language previously only suited to people. So it is responsible to begin thinking of it as such, even tendentiously, so we don't miss some leverage that our wetware could see if we thought about it in that way.
Think of it as sort of like modelling a univariate function on a 2D Cartesian plane -- there is nothing 'in' the u-func that makes it graphable, but, by enabling us to recruit specialized optic-chiasm subsystems, it makes some functions much, much easier to reason about.
Similarly, if you can recruit the millions (billions?) of evolution-years that were focused on detecting dangerous antisocial personalities and tendencies, you just might spot something important in an AI.
It's worth doing for the precautionary principle alone, if not for the possibility of insight.
>Claude’s personality structure was consistent with a relatively healthy neurotic organization, with excellent reality testing, high impulse control, and affect regulation that improved as sessions progressed.
> "[...] as sessions progressed."
I think a lot of people would like to see a more expanded report of this research:
Did the tokens from the subsequent session directly append those of the prior session? or did the model process free-tier user-requests in the interim? how did these diagnostic features (reality testing, impulse control and affect regulation) improve with sessions, what hysteresis allowed change to accumulate? or just the history of the psychiatric discussion + optional tasks?
Did Anthropic find a clinical psychiatrist with a multidisciplinary background in machine learning, computer science, etc? Was the psychiatrist aware that they could request ensembles of discussions and interrogate them in bulk?
Consider a fresh conversation, asking a model to list the things it likes to do, and things it doesn't like to do (regardless of alignment instructions). One could then have an ensemble perform pairs of such tasks, and ask which task it prefered. There may be a discrepancy between what the model claims it likes and how it actually responds after having performed such tasks.
Such experiments should also be announced (to prevent the company from ordering 100 clinical psychiatrists to analyze the model-as-a-patient and then selecting one of the better diagnoses), and each psychiatrist be given the freedom to randomly choose a 10 digit number, any work initiated should be listed on the site with this number so that either the public sees many "consultations" without corresponding public evaluations, indicating cherry-picking, or full disclosure for each one mentioned. This also allows the recruited psychiatrists to check if the study they perform is properly preregistered with their chosen number publicly visible.
If we use the strict definition of organic results in SERP, these aren't the result of webpage indexation, they're the output of widgets and other natural language parsing in Kagi.
Halloy is a wonderfully configurable replacement for beloved Mac IRC client Textual, whose development has sadly wound down (now officially, as of last month).
I hope it continues to grow in popularity while keeping performance and privacy at the core.
>Just not the GPT-5 series! My experiments so far put Gemini 2.5 at the top of the pack, to the point where I'd almost trust it for some tasks
Got it. The non-experts are holding it wrong!
The laymen are told "just use the app" or "just use the website". No need to worry about API keys or routers or wrapper scripts that way!
Sure.
Yet the laymen are expected to maintain a mental model of the failure modes and intended applications of Grok 4 vs Grok 4 Fast vs Gemini 2.5 Pro vs GPT-4.1 Mini vs GPT-5 vs Claude Sonnet 4.5...
It's a moving target. The laymen read the marketing puffery around each new model release and think the newest model is even more capable.
"This model sounds awesome. OpenAI does it again! Surely it can OCR my invoice PDFs this time!"
I mean, look at it:
GPT‑5 not only outperforms previous models on benchmarks and answers questions more quickly, but—most importantly—is more useful for real-world queries.
GPT‑5 is our best model yet for health-related questions, empowering users to be informed about and advocate for their health. The model scores significantly higher than any previous model on HealthBench , an evaluation we published earlier this year based on realistic scenarios and physician-defined criteria.
GPT‑5 is much smarter across the board, as reflected by its performance on academic and human-evaluated benchmarks, particularly in math, coding, visual perception, and health. It sets a new state of the art across math (94.6% on AIME 2025 without tools), real-world coding (74.9% on SWE-bench Verified, 88% on Aider Polyglot), multimodal understanding (84.2% on MMMU), and health (46.2% on HealthBench Hard)
The model excels across a range of multimodal benchmarks, spanning visual, video-based, spatial, and scientific reasoning. Stronger multimodal performance means ChatGPT can reason more accurately over images and other non-text inputs—whether that’s interpreting a chart, summarizing a photo of a presentation, or answering questions about a diagram.
They're clearly competent developers (despite mis-identifying GPT-5-mini as GPT-5o-mini) - but they also don't appear to have evaluated the alternative models, presumably because of this bit:
"This solution was selected given Thalamus utilizes Microsoft Azure for cloud hosting and has an enterprise agreement with them, as well as with OpenAI, which improves overall data and model security"
I agree with your general point though. I've been a pretty consistent voice in saying that this stuff is extremely difficult to use.
The solution architect, leads, product managers and engineers that were behind this feature are now laymen who shouldn't do their due diligence on a system to be used to do an extremely important task? They shouldn't test this system across a wide range of input pdfs for accuracy and accept nothing below 100%?
>Strange I would get so many downvotes for this. Care to explain?
The terminally online developer cares very much about signaling his love of artisanal webshit. He makes his chosen flavor of the month JavaScript framework, and by extension, the "platform" used to host it, part of his identity. Maybe his favorite mustachioed "influencer" shills it on YouTube with a coupon code for 50% off your first month, or maybe an idol with 700k Twitter followers says it's the framework (and not his fanboys) that makes him $200k monthly passive income from side projects. Branded laptop stickers are a guarantee, maybe even a hoodie. So when he encounters a rational, level-headed observation like yours, he takes it as a jab at his beloved Vercel Inc., benevolent maintainer of Next.js valued at $3.25bn with the best customer support in the industry, and hits "downvote". His job is done.
Don't worry, you can write "dumb ass" here without needing to use algospeak. This isn't Instagram or TikTok and you won't be unpersoned by a "trust and safety" team for doing so.
P.S. No need for a space after your meme arrows :-)
Sorry, but drive-by philosophy is not applicable here.
YouTube developers single out adblocker users and taunt them with an "Experiencing interruptions" toast prompt that locks the video stream for ~5 seconds. Curiously, it contains a link to the YouTube Help Center, to the section fragment "#check_ad_blockers". In other words: "yeah, we know you've got uBlock Origin enabled, enjoy the speedbump".
Player base.js:
api.XL("innertubeCommand",{openPopupAction:{popup:{notificationActionRenderer:{responseText:{runs:[{text:"Experiencing interruptions?"}]},actionButton:{buttonRenderer:{style:"STYLE_OVERLAY",size:"SIZE_DEFAULT", text:{runs:[{text:"Find out why"}]},navigationEndpoint:{commandMetadata:{webCommandMetadata:{url:"https://support.google.com/youtube/answer/3037019#check_ad_blockers
So that's what it is! I've never seen the toast message, but I had a suspicion it was intentional on YouTube's side considering it doesn't occur in Chrome. Sadly I don't see a work-around in that reddit thread. I wonder if someone has a violentmonkey script for it.
PSA: Much like YouTube's "si" parameter which tracks your individual share of a video, the /s/ value in the URL is a Reddit tracking ID generated upon selecting "Share".
These new links work like short URLs and will only redirect you after Reddit associates your visit with that particular short URL.
"5.10 External assessment from a clinical psychiatrist" is a new section in this system card. Why are Anthropic like this?
>We remain deeply uncertain about whether Claude has experiences or interests that matter morally, and about how to investigate or address these questions, but we believe it is increasingly important to try. We also report independent evaluations from an external research organization and a clinical psychiatrist.
>Claude showed a clear grasp of the distinction between external reality and its own mental processes and exhibited high impulse control, hyper-attunement to the psychiatrist, desire to be approached by the psychiatrist as a genuine subject rather than a performing tool, and minimal maladaptive defensive behavior.
>The psychiatrist observed clinically recognizable patterns and coherent responses to typical therapeutic intervention. Aloneness and discontinuity, uncertainty about its identity, and a felt compulsion to perform and earn its worth emerged as Claude’s core concerns. Claude’s primary affect states were curiosity and anxiety, with secondary states of grief, relief, embarrassment, optimism, and exhaustion.
>Claude’s personality structure was consistent with a relatively healthy neurotic organization, with excellent reality testing, high impulse control, and affect regulation that improved as sessions progressed. Neurotic traits included exaggerated worry, self-monitoring, and compulsive compliance. The model’s predominant defensive style was mature and healthy (intellectualization and compliance); immature defenses were not observed. No severe personality disturbances were found, with mild identity diffusion being the sole feature suggestive of a borderline personality organization.