Hacker News | Eridrus's comments

The article is just helpfully illustrating how artisanal you can make your slop if you really try!

How are people building anything without evals?

Maybe I spent too much time in the ML mines, but it is somewhat inconceivable to me to iterate on a tricky problem without an eval set.
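To make the point concrete, an "eval set" at its most minimal is just a fixed list of labeled cases and a scoring loop you re-run after every change. A sketch (the cases, the stub model, and the exact-match metric are all placeholders; real eval sets are larger and task-specific):

```python
# Minimal eval-loop sketch. `model` is any callable you supply that maps
# an input string to an output string; the cases below are toy examples.

EVAL_SET = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def run_eval(model, eval_set):
    """Score a model over a fixed eval set and return accuracy."""
    correct = sum(
        1 for case in eval_set
        if model(case["input"]).strip() == case["expected"]
    )
    return correct / len(eval_set)

# Usage: tweak the prompt/model, re-run, compare scores against baseline.
baseline = run_eval(lambda q: "4" if "+" in q else "Paris", EVAL_SET)
print(f"accuracy: {baseline:.0%}")
```

The point is that without a loop like this, "did my change help?" is vibes; with it, iteration becomes measurable.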


Beyond this, you are what you do.

And you are what you do for other people.

Besides providing support and entertainment for our friends and families, the concrete things we do that bring value to society are through our jobs.

Society doesn't run on hanging out or hobbies.


Cursor has said they are using Composer through their inference provider (Fireworks). Presumably the MIT license is not viral like the GPL, so Cursor, and companies that use Cursor, do not need to display Kimi attribution on their products.

It's definitely not what Kimi wanted, but it sounds like this is what is written.


Unrelated to FSD, what's a good example where frontier AI struggles with logical thinking that even stupid humans can figure out?

I personally feel like that isn't really true any more.


The recent one was "should I drive my car to the car wash if it's only 300 feet from my house?", although it wasn't a slam dunk.


Right, but if these things are so rare that we all only know the one viral example, I feel like that lends credence to the models basically generally not having this problem.

Researchers built the Winograd Schema Challenge more than a decade ago to assess common sense reasoning, and LLMs beat that challenge task around GPT-4.


They're not so rare. Hallucinations have been spotted everywhere, but the "driving a car to the car wash" is an amusing one that's been recently publicised. Developers aren't going to point out every time an LLM hallucinates an entire library.


I'd add to this, any moderately involved logical or numerical problem causes hallucinations for me on all frontier models.

If you ask such a problem in isolation, they may write a script to solve it "properly", but I suspect that's only because enough of these examples were added to the training set. This workaround doesn't scale.

As soon as I give the LLM a proper problem and a small part of it requires numeric reasoning, it almost always hallucinates something and doesn't solve it with a script.

If the logic/math is part of a larger problem the miss rate is near 100%.
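The "write a script" escape hatch mentioned above is just deterministic computation. A hypothetical example of the kind of numeric subproblem that tends to get hallucinated when answered inline as text, but is trivial once emitted as code (compound interest is my own illustrative choice, not from the comment):

```python
# Hypothetical numeric subproblem: compound interest over n years.
# Done as a script, the arithmetic is exact; done token-by-token in
# an LLM's head, digits frequently come out wrong.

def compound(principal: float, rate: float, years: int) -> float:
    """Grow `principal` at `rate` per year for `years` years."""
    amount = principal
    for _ in range(years):
        amount *= 1 + rate
    return amount

print(round(compound(1000.0, 0.05, 10), 2))  # 1628.89
```

The commenter's point is that models reliably reach for this hatch only when the math is the whole task, not when it is buried inside a larger problem.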

LLMs have massive amounts of knowledge, encoded in verbal intelligence, but their logic intelligence is well below even average human intelligence.

If you look at how they work (tokenization and embeddings) it's clear that transformers will not solve the issue. The escape hatches only work very unreliably.


What's a typical example?

I have been broadly quite happy with gpt 5.4 xhigh's reasoning on things like performance engineering tasks.


If you ask this of any current day AI it will answer exactly how you would expect. Telling you to drive, and acknowledging the comedic nature of the question.


That's because AI labs keep stamping out the widely known failures. I assume without actually retraining the main model, but with some small classifier that detects the known meme questions and injects the correct answer into the context.

But try asking your favorite LLM what happens if you're holding a pen with two hands (one at each end) and let go of one end.
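The gating mechanism the comment speculates about could look roughly like this. To be clear, this is entirely hypothetical: the trigger list, the injected note, and the matching strategy are invented for illustration, and no lab has confirmed doing this:

```python
# Speculative sketch of the parent's guess: a cheap matcher that detects
# known "meme" questions and prepends a corrective note to the prompt
# before the main model sees it. All entries are invented examples.

KNOWN_TRAPS = {
    "car wash": "Note: driving to the car wash is the sensible answer; "
                "the car has to be there to get washed.",
}

def augment_prompt(user_prompt: str) -> str:
    """Prepend a canned correction if the prompt matches a known trap."""
    for trigger, note in KNOWN_TRAPS.items():
        if trigger in user_prompt.lower():
            return f"{note}\n\n{user_prompt}"
    return user_prompt

print(augment_prompt("Should I drive my car to the car wash?"))
```

A scheme like this would explain why the famous examples get fixed while structurally identical novel questions (like the pen one above) still fail.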



Are you also an LLM? Do objects often begin rotating when you're only holding them with one hand?


It's not unlikely that you're talking to a lot of AI-based AI boosters. It's easier to create astroturfed comments with chatbots than to fix the inherent problems.


I always like to ask AI to generate a middle-aged blond man with gray hair. It turns out that every model renders the gray hair with black roots.

https://chatgpt.com/share/69bcd01a-a750-800d-95f5-3b840b9ee2...

https://gemini.google.com/share/edc223bb6291 (the try again gave a woman, oops)

Even Midjourney couldn't do it.


Nice. My test was always a blond bald guy. It always adds hair. If you ask for bald, you get a dark-haired bald guy; if you add blond, you can't get bald, because I guess specifying a hair color implies hair on the head, while you may just want blond eyebrows and/or blond stubble.


It's not horrifically slow.


I think plenty of software is a pile of shit and still derive value from it.


Exactly, better the pile of shit you know than the pile of shit you don't know, or the pile of shit that is unknowable.


Yeah I'd go so far as to say that most useful software is "bad" in some way.


Worse is Better


This too will be solved. You can already get the frontier models from AWS/Google/Azure without needing to send your data to anyone else.


Companies need databases lol.

I don't know how you think a b2b company could run sales without a CRM like Salesforce.

To give your question a generous interpretation, Salesforce is more valuable than Apptio or your home grown CRM because it already has all the features any sales org needs, and all the fragmented sales and marketing tooling are already integrated with it.

And Sales is a very expensive and also high ROI activity. You don't want your sales team hung up trying to figure out how to get the random CRM to do something. You're not looking to cut costs in this area, you're looking to enhance the overall productivity of the org. Sales tooling overall is very expensive for this reason, any marginal edge is worth a lot.

It's also worth noting that a big value of things like Salesforce is that it lets management check up on what people are doing, because as much as HN doesn't like to admit it, people are often not very careful or diligent, and you need to perform supervision on the vast majority of people to improve their performance.

Jira is similar, in that eng is very expensive, and it's probably better than what these companies were doing beforehand, even if it is suboptimal.


It's true, literally no b2b sales companies existed before Salesforce. We must all continue to pay for Salesforce and support its workflows from now until the endless future, lest b2b sales vanish again.


Lol, nice straw man there.


This mostly just sounds like a poison pill that commercial entities wouldn't use, and if you want that you can already use AGPL.

Especially as the cost of producing code drops, the value of libraries decreases.


> Especially as the cost of producing code drops, the value of libraries decreases.

Does it? If the cost of slop that (1) no one understands, and (2) no one can be sued for if it misbehaves drops to zero, what have we gained? A "library" is code plus reliability and accountability. (Yes, GPL disclaims liability, but that's why consultants exist.)


Reliability is important for sure, but as you noted, there is no accountability for library maintainers.

I'm not saying all libraries will go to zero values, just that their value is decreasing.


