wahnfrieden · 2026-04-25T05:46:44 1777096004

It’s game over for a very long time

wahnfrieden · 2026-04-25T04:50:43 1777092643

Copilot is a bad harness that perverts the productivity of models like GPT 5.5.

wahnfrieden · 2026-04-25T04:47:35 1777092455

Have you considered running models like GPT 5.5 inside their agent harness (Codex)?

gertlabs · 2026-04-25T07:03:34 1777100614

I see the value in that, but there are a few reasons that isn't on the immediate roadmap -- mainly, it shifts focus from measuring the model to measuring the harness. The agentic benchmark section you see on the site is comparable to how an agent would perform using an open harness like Pi. But the latest tool-using models are pretty well adapted to any harness, so I think that's less of a factor in overall model performance.

wahnfrieden · 2026-04-25T07:59:46 1777103986

Just fresh on my mind after reading this from a Codex team member re: the performance difference between Pi and Codex app server usage: https://x.com/pashmerepat/status/2046865863979172039

ZeroGravitas · 2026-04-25T10:36:55 1777113415

Well that couldn't be vaguer if he tried. Basically saying, our stuff is better, no reasons given.

wahnfrieden · 2026-04-25T19:40:01 1777146001

Yeah that's why I'm advocating for measuring it in this thread. Some of these models are trained specifically for their official harnesses

wahnfrieden · 2026-04-24T22:21:24 1777069284

Have you measured whether “no bugs, make no mistakes” improves results? Or is the very thought of it too absurd to you to evaluate?

falcor84 · 2026-04-24T23:02:34 1777071754

I haven't tried it myself, but I would assume that this sort of instruction in CLAUDE.md would indeed make it a bit more careful, to the detriment of its development velocity, which for my use case would be bad. I generally prefer for it to experiment in many directions rapidly, and to do extensive testing only once we have an approach that solves the problem well.

tkiolp4 · 2026-04-24T22:27:48 1777069668

When I was younger I was sold on the idea of data-driven decisions. Everything needs to be measured, otherwise you are just biased, and bias is bad. Nowadays I still rely on data and measurements, but I also have experience and taste to judge things. Answering your question, the latter.

wahnfrieden · 2026-04-24T04:44:34 1777005874

First result is on Windows, which has had more problems with Codex (or at least did until a few months ago). Second is someone who asked Codex to delete all files that were unrelated to the project files.

wahnfrieden · 2026-04-23T19:34:21 1776972861

wahnfrieden · 2026-04-23T19:32:21 1776972741

Are weekends off un-American too because they came from worker movements?

Re: replies that one day off has been around much longer: yes, that’s what changed. The change was to 2 days off.

BurningFrog · 2026-04-23T19:44:23 1776973463

Saturdays off came from Exodus 20:8-11, about 1400 BC.

wahnfrieden · 2026-04-23T23:24:07 1776986647

Yes I know it was that bad for that long. The worker movement was to expand that to two days.

TeMPOraL · 2026-04-23T19:45:06 1776973506

Saturdays are communist. Sundays are far-right.

mrbombastic · 2026-04-23T20:21:30 1776975690

What do I have to be to get Fridays too?

mr_toad · 2026-04-23T20:50:31 1776977431

Be French, and get divorced?

selimthegrim · 2026-04-23T20:47:11 1776977231

Muslim?

TeMPOraL · 2026-04-23T21:36:33 1776980193

You can get one Good Friday a year if you live in a country that treats it as a bank holiday, or is Catholic enough that it's effectively a day off, even if not an official one.

You can get extra Fridays off if you move to a country with bank holidays that tend to land on Fridays, which is correlated with a history of either communism or organized religion (much like the weekend).

But, if you want every Friday off, your best bet is to embrace hyper-capitalism and worm your way into money so you can have a four-day work week.

(Easier to achieve than the legendary four-hour work week anyway.)

TL;DR: the more opposing ideologies you can simultaneously hold, the more days off in a week you're morally entitled to :).

wahnfrieden · 2026-04-23T18:29:54 1776968994

I like that Anthropic rushed 4.7 out to get a couple days of coverage before 5.5 hit

spprashant · 2026-04-23T19:00:55 1776970855

Everything since that launch to this release has been a PR disaster for Anthropic.

dandaka · 2026-04-23T19:20:30 1776972030

I can argue that the disaster started mid-4.6, when they started juggling rate limits while hitting uptime problems. Great that we have some healthy competition; waiting for the next move from DeepMind.

gck1 · 2026-04-23T21:22:47 1776979367

Correct. Anthropic has been on a disaster train since January and they can't seem to get off it.

wahnfrieden · 2026-04-23T18:27:06 1776968826

And yet there was an outage last night

lawgimenez · 2026-04-23T19:13:49 1776971629

And they're having an outage right now.

wahnfrieden · 2026-04-23T18:21:01 1776968461

You aren’t getting the 5.4 experience for code if you’re not using it in the Codex harness
