Hacker News | Sol-'s comments

I don't know. Practically, LLMs are already better conversation partners on any topic compared to the average human I have access to. This also holds in reverse, of course - if someone wants me to explain something, usually they'd be better off asking an LLM.

I think in domains like math and software engineering, they are less constrained by training data anyway. They can synthetically generate and validate programs. To what extent that scales into novel insights is a different matter, but I think they dream of the AlphaGo Zero moment, at least in verifiable domains.

How can it ever play against itself on novel software tasks? First it has to come up with the task. Then it can write tests, but it also needs to verify that the tests are correct; a mixture of experts can come to wrong conclusions, etc.

In addition to the managed interface for agent configuration and so on, is the novelty that all the agents run on Anthropic's infra? Sort of like Claude Code on the Web? If so, interesting that they move up the stack, from just a provider of an intelligence API to more complex deployed products.

I don't want to be overly cynical and am in general in favor of the contrarian attitude of simply taking people at their word, but I wonder if their current struggles with compute resources make it easier for them to choose to not deploy Mythos widely. I can imagine their safety argument is real, but regardless, they might not have the resources to profitably deploy it. (Though on the other hand, you could argue that they could always simply charge more.)

I would not have believed your argument 3 months ago, but I strongly suspect Anthropic actively engages in model quality throttling due to their compute constraints. Their recent deal for multiple gigawatts' worth of data center capacity might help them correct their approach.

For what it's worth, Anthropic explicitly denies that: "To state it plainly: We never reduce model quality due to demand, time of day, or server load."

See also https://marginlab.ai/trackers/claude-code/

It's very interesting to me how widespread this perception is. Maybe it's as simple as LLM productivity degrading over time within a project, as slop compounds.

Or, more recently, since they added a 1M context window, maybe people are more reckless with context usage.


It has nothing to do with the context window. Reasoning brought measured approaches grounded with actual tool calls. All of that short-circuits into a quick fix approach that is unlike Opus-4.5 or 4.6. Sonnet-4.5 used to do that. My context window is always < 200K.

That still leaves open the possibility that they reduce model quality due to profit. ;p

Posted this a while ago:

>Models are not "degrading". They're not being "secretly quantized". And no one is swapping out your 1.2T frontier behemoth for a cheap 120B toy and hoping you wouldn't notice!

>It's just that humans are completely full of shit, and can't be trusted to measure LLM performance objectively!

>Every time you use an LLM, you learn its capability profile better. You start using it more aggressively at what it's "good" at, until you find the limits and expose the flaws. You start paying attention to the more subtle issues you overlooked at first. Your honeymoon period wears off and you see that "the model got dumber". It didn't. You got better at pushing it to its limits, exposing the ways in which it was always dumb.

>Now, will the likes of Anthropic just "API error: overloaded" you on any day of the week that ends in Y? Will they reduce your usage quotas and hope that you don't notice because they never gave you a number anyway? Oh, definitely. But that "they're making the models WORSE" bullshit lives in people's heads way more than in any reality.


Inference is where they make the money they spend on training, so this feels unlikely. Perhaps this does not hold true for Mythos, though.

Maybe countries could tackle such problems twofold:

- first, implement a nationwide social freezing program, where women in their 20s are offered free egg freezing. Such a large-scale program would probably also improve the tech and might make egg collection less invasive.

- combined with this program, let the women who freeze their eggs opt in to an egg donation program, where some of their eggs can be used by women with fertility problems

But as with many things fertility, seems that modern states simply do not have the capacity to seriously try anything. Who knows why that is.


They might also look to Israel to see what they're doing that's working so much better than other OECD countries - see my other comment in this post.

But Israel's advantage seems to be partly cultural and I don't see any time-limited elected government willing to expend that much effort to change their nation's culture.


Have you seen the number of injections you need for IVF?

Now you're suggesting every young and healthy woman should get these injections and have eggs scraped out of her ovaries?

This honestly feels so backwards. Create a broken society and then fix it in post with med tech.


From what I've read, the immediate effect will likely be worse for CO2 emissions, because the alternative to (liquefied) gas is often coal power. The various inputs needed for global manufacturing are also affected, so maybe even renewable tech gets more expensive.

I'm not saying that the dependence on the middle east was good, but I think it's good to keep in mind that this was a pretty stable equilibrium even with the various questionable countries involved until the US initiated a global supply shock without a good reason.


There are short term and long term effects. Overall these are good changes.

There are a couple of points to make here. The lead time for new coal/gas plants is years; if a plant isn't already planned, it is unlikely to come online this decade. The supply chains simply can't handle building more turbines, and it takes years to fix that. That investment is also super risky in itself.

Another point is that the cheapest and fastest way to add new capacity to grids is via renewables. That's why we see record-breaking new capacity coming online on a regular basis.

There is indeed a short term increase in emissions from electricity plants because the fastest way to bring more capacity online is to use existing underused plants. A lot of gas and coal plants are no longer running full time because they are too expensive to operate. But they haven't been decommissioned either. Some gas plants actually are used as peaker plants. Most older coal plants take too long to warm up for this. So, yes short term the expensive but quick way to provide extra power is via these plants. But of course, as soon as something more affordable comes online, these things go back to being utilized less. There are many tens/hundreds of GW of renewables and batteries being deployed in the next few years.

Data centers add to all this pressure. That's a good thing long term, because these too will want to reduce their OpEx by cutting as much dependence on gas/coal as possible.

A final point to make is that despite all these increased emissions, there are also decreased emissions from electrification. Even if the power for an EV comes from an efficient gas/coal plant, it's actually better than the alternative of burning petrol in a combustion engine, so emissions go down. Same for heat pumps: with a COP of 3-4, they deliver 3-4 units of heat per unit of electricity, so even if that electricity comes from a gas plant operating at 40-50% efficiency, less gas gets burned overall than with a boiler.
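The heat pump arithmetic above can be checked back-of-envelope. This is a sketch using the comment's assumed figures (COP ~3.5, gas plant efficiency ~45%); the 90% boiler efficiency is my own assumption, and the function names are illustrative:

```python
# Back-of-envelope check: heat delivered per unit of gas energy,
# comparing a gas plant + heat pump against burning gas in a boiler.
# All figures are rough assumptions, not measured data.

def heat_via_heat_pump(plant_eff=0.45, cop=3.5):
    """Gas -> electricity at the plant, electricity -> heat via the heat pump."""
    return plant_eff * cop

def heat_via_boiler(boiler_eff=0.90):
    """Gas burned directly for heat in a condensing boiler."""
    return boiler_eff

hp = heat_via_heat_pump()   # 0.45 * 3.5 = 1.575 units of heat per unit of gas
boiler = heat_via_boiler()  # 0.90 units of heat per unit of gas
print(f"heat pump route: {hp:.2f}, boiler: {boiler:.2f}, ratio: {hp / boiler:.2f}x")
```

Under these assumptions the heat pump route still comes out well ahead of the boiler even with all the conversion losses at the plant, which is the comment's point.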

So these are all good effects, even if the reason is a bit sad. This crisis is unnecessary, but I like that it is helping to kill fossil fuel companies faster. Over the long term, this erodes confidence in the market as a whole and drives decision makers to do exactly what the article suggests: cut the dependency on fossil fuels as fast as possible. It's already resulting in measurable reductions in oil/gas imports in some countries.


> An expensive AI which simply takes your job or forces you to work harder

But this implies higher productivity, no? This must mean more outputs that should benefit someone, unless the jobs that are being automated had little value to begin with. Seems paradoxical.


> AI as it is being developed is likely to centralize it

The access to AI is centralized, but the ability to generate code and customized tools on demand for whatever personal project you have certainly democratizes software.

And even though open source models are a year behind, they address your remaining criticism about the AI being centralized.


I don't use it myself, but I feel like the way Grok is integrated into Twitter is a pretty good thing for discussions, as it is certainly a more objective and rational voice than most human participants. I think it's good that people tag @grok if they don't understand something or want an opinion, even if it looks pretty silly to see "@grok is this true" repeated multiple times in replies.

That said, Musk's attempts at misaligning the thing and making it prefer his opinions of course destroy any trust. It's surprising that it's seemingly as good and helpful as it is despite the corruption attempts.

I also don't quite get how the business model is supposed to work out if its main use case is to serve Twitter. I know they provide API access like all the other model providers, but with how distrusted Musk is and how sensitive a topic reliable model behavior is, they seem to be sabotaging themselves. Which company wants it to go mechahitler on them?


I disagree, I find that the grok replies are terrible product UX. Not only do they clog up the replies of every popular post, they're also constrained to extremely short answers with no sources. The community notes system, while also flawed in its own ways, is at least not nearly as disruptive and usually provides a link.

Trying to make social media a source of truthful information is always an uphill battle and doubly so for X.


I like that you can ask Grok to search the social graph and comments. Hacker News also has a semantic search engine (https://hackersearch.net/); Reddit has none, which is a pity.


I’m really, really uninterested in reading AI content that other people have generated. If I’m on Twitter, I’m looking for what humans have to say.


I'm afraid that ship has long sailed. The humans left on Twitter are all just copy pasting AI slop now...


Grok is a bot that:

1) sometimes goes mechahitler

2) was trained to be biased against empathy and understanding (because woke).

3) is customized to spout Elon's opinions as fact.

Claiming it is "objective and rational" seems like a misjudgement to me. If it really is more objective and rational than the average xitter poster, that says more about that platform than it does about Grok.


I guess I was mostly arguing that the integration of something like Grok into Twitter is a net positive for online discussion, as anyone has a fact checker and explainer at hand now to defuse irrational online arguments.

Also, I think you overrate Musk's success in fiddling with the model. As I wrote, I don't like his attempts to tune it to his tastes either, but if you look at the outputs people get from Grok, it seems mostly fine except in the specific scenarios that Musk has focused his misalignment efforts on.

Of course something like Claude being integrated into Twitter would likely be better.


He doesn't have to fiddle with the model because he gets to inject his own opinion into the context MitM style.

But I get what you're saying now, a fact checker available to query during an online discussion would be helpful. Assuming the checkerbot was actually independent/neutral and backed responses with sources. Definitely not assumptions you can make with grok.


It was also producing CSAM on demand for a few months.


It still is, you just need to pay.


You’re right. But it appears they may have failed with 2) and 3) because I frequently see Grok spit out content that doesn’t agree with the creators’ narrative.


From what I heard it was designed to prefer truth over political correctness. I don't use Grok or Twitter though so I cannot comment on whether that aim was achieved (or even seriously attempted).

I will however note that when I asked ChatGPT for an LLM prompt for truthfulness, it added "never use warm or encouraging language."

It would appear that empathy and truth are in conflict — or at least the machine thinks so!


> 1) sometimes goes mechahitler

That "MechaHitler" episode lasted less than a day.

> 2) was trained to be biased against empathy and understanding (because woke).

No, it was trained and instructed to be truthful, even if the truth is deemed politically incorrect.

> 3) is customized to spout Elon's opinions as fact.

Certainly a nugget of truth there.

> Claiming it is "objective and rational" seems like a misjudgement to me.

I do believe it's generally objective, simply because despite how much Elon tries to push it to the right, it still dunks on right-wingers all the time: they summon Grok to back up a bullshit story, and Grok debunks it instead.


> Grok is integrated into Twitter is a pretty good thing for discussions, as it is certainly a more objective and rational voice than most human participants

Hard agree.


respectfully, I do not find Mecha Hitler to be particularly free of bias.


> there's always an infinite supply of new work that could be done

I definitely buy this for the software sector or the economy as a whole, but for an individual company? Seems one would be bottlenecked by various factors quickly.

Perhaps better to let people go so that they can be productive elsewhere?


There are always bugs that can be fixed, optimizations that can be done, features that someone wants to build but hasn't had the budget for. There are always improvements to be made to deployment, ways of reducing memory usage, ways of reducing ongoing expenses, etc.

I have worked for a bunch of companies, and even relatively new and young companies have all these things pile up pretty quickly.


Jira takes a measurable amount of time to make bulk-changes to a single ticket, which is insane. If they’re going to fix anything, fix that.


Have you tried looking for a job recently? The job market is cooked and it's not getting better any time soon. The supply of candidates is way up. Salaries are going down. Even mediocre jobs show 100+ applicants on LinkedIn.


> Perhaps better to let people go so that they can be productive elsewhere?

True. Joining thousands of other unemployed developers sending applications into a job posting for a nonexistent role online is very productive. Probably good for the economy too now that I think about it.

