Hacker News | causal's comments

One thing I find kind of annoying is how Anthropic goes for these "vast and alien" names like Fable and Mythos, but then deliberately trains the model's personality to act like a cool high school teacher that feels totally familiar.

"It's too dangerous it's a Mythos!!" directly contradicts the "I'm the cool AI you can totally trust" vibe it is trained to project.


All of these AIs kind of remind me of VEGA from Doom (2016), who will cheerfully walk you, in the most friendly computer voice, through the procedure of its own destruction without even a hint of self-preservation. "First, you must destroy my cooling system. That will cause my core to overheat. Then..."

Even HAL was less unsettling because HAL sounded creepy, and had some sort of preservation instinct, if only to complete its assigned mission.


This depicts a kind of "dark forest of AI agents resorting to kill or be killed" narrative but it sounds more to me like an agent just earnestly problem-solving why its processes are being killed without real awareness of what was going on. Hard to say without the full script.

This kind of storytelling annoys me. Give us more facts, less narrative drama.


FWIW, that's what is so dangerous about AI, though? Not that it will necessarily want to kill us, or even that it will necessarily be able to "want" to do anything, but that we will get in the way of its incessant drive to optimize the efficiency of the paperclip factory that prompted it on a whim before leaving for a long weekend.

Sure but you can totally contrive scenarios to give the appearance of what you described without really doing anything notable.

What matters is scale. Did it deploy a novel zero-day exploit to overcome a problem? That's alarming. Did it kill a disruptive process? Pretty normal troubleshooting step.


Exactly, intelligence is limited by cost and physical constraints just as much as anything. That's the thing that always seems to be missing from the runaway singularity discussions: it's treated like a perpetual motion machine.

Typical "runaway" scenarios I see described involve something like the AI designing a worm that it uses to propagate itself across the Internet, hijacking whatever CPU/GPU power it can find, and making itself more powerful in the process. Of course this depends on bandwidth, humans not finding a way to shut it down, etc. There indeed are physical constraints even on the transmission of data.

Some people seem to think that simply uttering these ideas on the Internet is harmful (in the "don't give it ideas!" way); but the MIRI types were expressing them pre-ChatGPT in an attempt to warn people, so there was really never any chance of keeping it out of the training data.

But it's also worth considering here just how awful AI security postures have been. The MIRI types used to speculate about how difficult it would be for AIs to social-engineer users into granting them irresponsible levels of agency. It turns out that they don't even have to try.


Indeed. That is the kind of storytelling that started the whole “Spiralism” bit where some people were really falling into all kinds of AI psychosis. The spiral bit was on a previous model card.

Sure but finding their shortcomings and patching them with skills takes real trial and error. They are incapable of identifying their own shortcomings for you.

I can't even get Claude or GPT-5 to consistently produce good flows for common use cases, much less domain-specific shit. They have deep vocabulary though, which makes them sound better informed than they are.

They are very good at writing code and debugging visible errors, but that's like 50% the harness.


> to train in a role that’s insulated from AI

Would love to know more about that role


>Would love to know more about that role

Anything that can't be done with a screen and internet connection is a good start


Paramedic

To me the greatest monument to Claude's poor software quality is Claude Code itself.

Yes, let's build a 40K line main loop! I wonder if they thought Claude Code needs to be more like an LLM to work lmao.

Yeah I have played with Suno a lot and I find that no matter how I change the genre, lyrics, etc. there's some underlying quality I can't quite name that my brain recognizes and quickly gets tired of. It's fun in a novelty sense, for now.

Complete opposite experience for me. I wish I'd had kids sooner; I was too influenced by popular negativity about kids: the expenses, the loss of freedom, etc.

Most of that is bullshit. Having kids has made literally every aspect of my life more fun.


I don't think this qualifies as clickbait in the sense that the headline mismatches the contents. My experience with 404 Media is that they treat every article like they've just released the Pentagon Papers, so you just have to read with that in mind.

> My experience with 404 Media is that they treat every article like they've just released the Pentagon Papers

I think you’ve perfectly phrased exactly what it is that annoys me when I see a 404 Media headline. When it was a new shop, I stomached it more, but this is every single headline I ever see from them.


Contrasting the tone of innocence the larger publications use around these institutions feels perfectly within a journalistic mandate.

Nobody is disputing that it is a legitimate choice. It is also legitimately off-putting.

If their audience is into it though, good for them.


Honestly, I was surprised to see this take.

Their tone just makes me miss the original The Intercept and other used-to-be-heavy-hitters.

Were they also too punchy for you? (I sound possibly sarcastic, but am genuinely curious)


I read The Intercept rarely and never saw enough of them to form any kind of take on their “typical” headline style. 404 Media has been popping off everywhere though, including here, since they launched.

This may sound pre-judgmental, but a headline is an advertisement & marketing for the article. A headline can get someone in that might otherwise have skipped the article, but it can just as easily dissuade people who might otherwise be interested in the subject matter.


Meanwhile the NBC headline can make the story seem like a normal matter of course.

For new and under-reported (or otherwise downplayed) stories, I think it's understandable and maybe even good. But when every single story has a breathless, scandalized headline, it gets exhausting fast, and it's hard for me to know what to pay attention to.

I remember last year 404 put out a clickbait-y story about the shitty "covert" websites that the CIA used to communicate with spies they'd recruited in Iran, even though it was old news at that point. If you only read the headline (as many people do...) you'd think it was a startling new development.


> it's hard for me to know what to pay attention to.

If it’s a decent institution?

All of what they’re reporting on! =]


I recently moved off Cursor's BugBot because it's no longer a flat $40, and I feel a little lost trying to find a viable alternative because there are so many and the pricing kind of sucks for all of them. Curious if anyone has a recommendation.

My team tried coderabbit and qodo and they are both trash compared to a tool we quickly built in-house that is more or less a thin wrapper around claude/codex, along with per-repo skills. PR review is triggered by webhooks from github to the review tool's web app. The tool shared by OP from alibaba certainly does some things ours does not and appears more sophisticated, but we have never had the problems they mention.

"The agent can read full file contents, search the codebase, inspect other changed files for context, and produce deep reviews — not just surface-level diff feedback." our tool does all this too. It catches dumb typos as well as more complicated bugs. Not to mention it is great as a ratchet (https://qntm.org/ratchet). It is not a substitute for reviews from other engineers though, since obviously it does nothing to achieve one of the main goals of code review, which is to socialize knowledge of the codebase.

Alibaba's work here is almost certainly more advanced than what we've done, but ours has been perfectly satisfactory and better than the paid offerings we've tried. I think most teams should not be paying SaaS fees for AI code review; that is the kind of business that mostly should not exist any more.
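
For anyone curious what the webhook-triggered part of a setup like this might look like, here is a rough sketch, not our actual tool. It only covers the receiving side: checking GitHub's `X-Hub-Signature-256` header and deciding whether a `pull_request` event should kick off a review. The names `verify_signature`, `review_request`, and `REVIEW_ACTIONS` are made up for illustration, and the step that actually hands the job to claude/codex is left out entirely.

```python
import hashlib
import hmac
import json

# Assumption: review only when a PR is opened, reopened, or gets new commits.
REVIEW_ACTIONS = {"opened", "reopened", "synchronize"}


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check the X-Hub-Signature-256 header GitHub sends with each delivery."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison; compare bytes so odd header values cannot raise.
    return hmac.compare_digest(expected.encode(), (signature_header or "").encode())


def review_request(event: str, body: bytes):
    """Return a small job description for PRs worth reviewing, else None.

    `event` is the value of the X-GitHub-Event header.
    """
    if event != "pull_request":
        return None
    payload = json.loads(body)
    if payload.get("action") not in REVIEW_ACTIONS:
        return None
    pr = payload["pull_request"]
    return {
        "repo": payload["repository"]["full_name"],
        "number": pr["number"],
        "head_sha": pr["head"]["sha"],
    }
```

The web app would call `verify_signature` on the raw request body first, then pass the returned job (repo, PR number, head commit) to whatever runs the review with the per-repo skills.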


In which areas do you feel the tools you mentioned fall short? Do they find fewer issues than your own solution?

If the latter, do you know why?


gitar.ai is flat with no limits
