Hacker News | wahnfrieden's comments

It’s game over for a very long time

Copilot is a bad harness that perverts the productivity of models like GPT 5.5.

Have you considered running models like GPT 5.5 inside their agent harness (Codex)?

I see the value in that, but there are a few reasons it isn't on the immediate roadmap. Mainly, it shifts the focus from measuring the model to measuring the harness. The agentic benchmark section you see on the site is comparable to how an agent would perform using an open harness like Pi. But the latest tool-using models are pretty well adapted to any harness, so I think that's less of a factor in overall model performance.

It's fresh on my mind after reading this from a Codex team member re: the performance difference between Pi and Codex app server usage: https://x.com/pashmerepat/status/2046865863979172039

Well that couldn't be vaguer if he tried. Basically saying, our stuff is better, no reasons given.

Yeah that's why I'm advocating for measuring it in this thread. Some of these models are trained specifically for their official harnesses

Have you measured whether “no bugs, make no mistakes” improves results? Or is the very thought of it too absurd to you to evaluate?

I haven't tried it myself, but I would assume that this sort of instruction in CLAUDE.md would indeed make it a bit more careful, to the detriment of its development velocity, which for my use case would be bad. I generally prefer for it to experiment in many directions rapidly, and to do extensive testing only once we have an approach that solves the problem well.

When I was younger I was sold on the idea of data-driven decisions. Everything needs to be measured, otherwise you are just biased, and bias is bad. Nowadays I still rely on data and measurements, but I also have experience and taste to judge things. To answer your question: the latter.

The first result is on Windows, which has had more problems with Codex (or at least it did up until a few months ago). The second is someone who asked Codex to delete all files that were unrelated to the project files.

Huh?

Are weekends off un-american too because it came from worker movements?

Re: replies that one day off has been around much longer. Yes that’s what changed - the change was for 2 days off.


Saturdays off came from Exodus 20:8-11, about 1400 BC.

Yes I know it was that bad for that long. The worker movement was to expand that to two days.

Saturdays are communist. Sundays are far-right.

What do I have to be to get Fridays too?

Be French, and get divorced?

Muslim?

You can get one Good Friday a year if you live in a country that treats it as a bank holiday, or is Catholic enough that it's effectively a day off, even if not an official one.

You can get extra Fridays off if you move to a country with bank holidays that tend to land on Fridays, which is correlated with a history of either communism or organized religion (much like the weekend).

But if you want every Friday off, your best bet is to embrace hyper-capitalism and worm your way into money so you can have a four-day work week.

(Easier to achieve than the legendary four-hour work week anyway.)

TL;DR: the more opposing ideologies you can simultaneously hold, the more days off in a week you're morally entitled to :).


I like that Anthropic rushed 4.7 out to get a couple days of coverage before 5.5 hit

Everything since that launch to this release has been a PR disaster for Anthropic.

I'd argue that the disaster started mid-4.6, when they started juggling rate limits while hitting uptime problems. It's great that we have some healthy competition, and I'm waiting for the next move from DeepMind.

Correct. Anthropic has been on a disaster train since January and they can't seem to get off it.

And yet there was an outage last night

And they're having an outage right now.

You aren’t getting the 5.4 experience for code if you’re not using it in the Codex harness
