Hacker News | ivanech's comments

Hmm in my experience (I've done a lot of head-to-heads), Opus 4.6 is a weaker reviewer than GPT 5.4 xhigh. 5.4 xhigh gives very deep, very high-signal reviews and catches serious bugs much more reliably. I think it's possible you're observing Opus 4.6's higher baseline acceptance rate instead of GPT 5.4's higher implementation quality bar.

This is also my experience using both via Augment Code. Never understood what my colleagues see in Claude Opus, GPT plans/deep dives are miles ahead of what Opus produces - code comprehension, code architecture is unmatched really. I do use Sonnet for implementation/iteration speed after seeding context with GPT.

I agree. Opus, forget the plan mode - even when using superpowers skill, leaves a lot of stuff dangling after so many review rounds.

Along with claude max, I have a chatgpt pro plan and I find it a life-saver to catch all the silliness opus spits out.


I agree, I use codex 5.4 xhigh as my reviewer and it catches major issues with Opus 4.6 implementation plans. I'm pretty close to switching to codex because of how inconsistent claude code has become.

Maybe it's all just anecdotal then. Everyone is having different experiences.

Maybe we're being A/B tested.


The experience one has with this stuff is heavily influenced by the overall load and uptime of Anthropic's inference infra itself. The publicly reported availability of the service is one 9, and that says nothing of QoS SLO numbers, which I would guess are lower. It is impossible to have a consistent CX under these conditions.

This feels similar to March 2020 when COVID was in Seattle. “It’s in the US but maybe it’s just a one-off.” We’ll see, I guess.


Agreed. Now that one more company has announced a big (40%!) headcount cut, other CEOs will feel like it is ok to do so now too (someone else stuck their neck out first, safe to pile on, "every one else is doing it", etc).

I expect to start hearing about more big RIFs soon. :/


this was really delightful. The Easter eggs in particular made it feel like someone was actually on the other side


I find Opus 4.5 very, very strong at matching the prevailing conventions/idioms/abstractions in a large, established codebase. But I guess I'm quite sensitive to this kind of thing so I explicitly ask Opus 4.5 to read adjacent code which is perhaps why it does it so well. All it takes is a sentence or two, though.


I don’t know what I’m doing wrong. Today I tried to get it to upgrade Nx, yarn and some resolutions in a typescript monorepo with about 20 apps at work (Opus 4.5 through Kiro) and it just…couldn’t do it. It hit some snags with some of the configuration changes required by the upgrade and resorted to trying to make unwanted changes to get it to build correctly. I would have thought that’s something it could hit out of the park. I finally gave up and just looked at the docs and some stack overflow and fixed it myself. I had to correct it a few times about correct config params too. It kept imagining config options that weren’t valid.


> ask Opus 4.5 to read adjacent code which is perhaps why it does it so well. All it takes is a sentence or two, though.

People keep telling me that an LLM is not intelligence, it's simply spitting out statistically relevant tokens. But surely it takes intelligence to understand (and actually execute!) the request to "read adjacent code".


I used to agree with this stance, but lately I'm more in the "LLMs are just fancy autocomplete" camp. They can just autocomplete increasingly more things, and when they can't, they fail in ways that an intelligent being just wouldn't. Rather, they just output a wrong or useless autocompletion.


They're not an intelligence equivalent to humans' and thus have noticeably different failure modes. But humans fail in ways that they don't (e.g., being unable to match an LLM's breadth and depth of knowledge).

But the question I'm really asking is... isn't it more than a sheer statistical "trick" if an LLM can actually be instructed to "read surrounding code", understand the request, and demonstrably include it in its operation? You can't do that unless you actually understand what "surrounding code" is, and more importantly have a way to comply with the request...


In a sense humans are fancy autocomplete, too.


You know that language had to emerge at some point? LLMs can only do anything because they have been fed on human data. Humans actually had to collectively come up with languages /without/ anything to copy since there was a time before language.


I actually don't disagree with this sentiment. The difference is we've optimised for autocompleting our way out of situations we currently don't have enough information to solve, and LLMs have gone the opposite direction of over-indexing on too much "autocomplete the thing based on current knowledge".

At this point I don't doubt that whatever human intelligence is, it's a computable function.


I believe the new grad DOGE employees were GS-15s. So yes, it seems likely that they plan to hire at GS-14 or GS-15.


Nothing like putting in a multi decade civil service career and coming in one day to find a 20-something installed over you whose primary qualification was being hired at a "friendly" tech company and making the right kind of joke around the CEO.

... although that seems depressingly like it would also be the experience with new administrators being installed in executive agencies every 4 years, except they're slightly older.

Man, if only there were some way to retain talent in the face of political leadership transitions... https://en.wikipedia.org/wiki/Pendleton_Civil_Service_Reform...


That’s the life of a civil servant though

By function a GS will ALWAYS be subordinate to a political appointee and there’s nothing they can do about it

I posted elsewhere that I left a govt career as a military officer precisely because of this reality. It’s like an old boring joke now that politicians are corrupt and worthless.

I will tell you from the inside that not only is it true but it’s 10 times to 100 times worse than you think it is.

I have multiple stories of operational systems, functions, whatever you wanna call them, that were working exceptionally well, had good backing and good funding, and were completely wiped out because whoever became the deputy under secretary for that budget line decided they didn’t want to do it anymore and completely shelved decades’ worth of work. Like literally I remember having to unplug a server that was running life-critical beacons for POWs because they weren’t being used enough.

As if that weren’t enough, that same development problem then shifted over to some new hot organization that is in the politician’s jurisdiction, and then they start over from scratch with none of the learning from the previous admin.

There is no positive system that can be affected by the United States government

It does not exist. Such systems cannot functionally or structurally exist, because the government of the United States is not and has never been built on supporting citizens or the global community. It is built, and has always been built, to support wealthy politicians, and that’s all.

I’m not aware of how every other country works, but the ones I’ve seen the inside of are the same.

Going into the government for the “mission” is probably the most intentionally ignorant thing somebody could do given the plethora of easily accessible data proving exactly this


Somehow this country has managed to do big and bold things when it is needed. Those great systems that were dismantled got built at one point, so it is theoretically possible to do good. Furthermore, other countries seem to do a better job at serving their citizens, so it's not like effective government is impossible (look at how the EU at least gets some things done that benefit their citizens, even though most of it is a mess).

There has got to be some pathway to get back to that.


All those things were reactions to either disasters or radical growth.

The only way to make people act is to create a situation they can’t avoid


> a GS will ALWAYS be subordinate to a political appointee

It’s worth being specific about what is meant by “political appointee” here. That term has specific legal meaning in the context of federal staffing, and (as I understand it, not a lawyer) is not the same thing as “GS employee who was hired as part of an administration’s political agenda”.


Cause a “political” GS is not a thing hence why they have either congressional appointment or alternative pathway to political appointment


> Nothing like putting in a multi decade civil service career and coming in one day to find a 20-something installed over you

GS grade does not correspond directly to manager/managee relationships at plenty of federal agencies. Someone getting hired at a higher GS grade is not automatically “over you” in the formal reporting hierarchy. That’s not to say this never happens (GS grade matching org chart level is the case more often than not, I’d guess), but it’s not a given.

Now, if your issue is that agencies sometimes offer high (by the standards of current employees) GS grades to attract talented hires, then I agree that is a problem! The solution to that is to improve government pay scales and fix fed hiring more generally: https://www.eatingpolicy.com/p/dear-mr-kupor-please-fix-fede...

Until that is done, (good) policies like the Pendleton Act cannot help that much.


Not being dismissive of your experience (or that of a civil servant with 10s of years of experience). I have a deep respect for that kind of work and folks who give up more lucrative opportunities in order to serve their country and fellow citizens.

> whose primary qualification was being hired at a "friendly" tech company and making the right kind of joke around the CEO

That’s being awfully dismissive of these individuals’ skill sets. Nobody gets the job by making the right kind of jokes around the CEO. Nobody. Getting in the door takes hard work, talent and some amount of luck.


For DOGE specifically? Would be interested to hear of those DOGE employees who truly deserved to be GS-15s due to their extensive experience in both tech and government.


> extensive experience in both tech and government.

The USDS (group that was renamed to a part of DOGE) has previously hired with an emphasis on non government experience: http://govciomedia.com/usds-developing-innovative-approach-t...


The USDS and DOGE had completely different mandates. Non government experience makes sense when you’re trying to learn the lessons of industry to improve gov website accessibility, performance and ux.

On the other hand, trying to slash spending with no understanding of the agencies you’re working at (let alone any life experience, for a lot of these folks) is a very different mandate.


because, at the time, slick landers, and general good UI/UX was completely missing from government tech workflows.


It still takes around 5-15 years to get to the upper end of the pay scale, currently $195K.


Not true

I was hired in under HQE accession in 2019 and made SES 4 equivalent with zero civilian time in service.


https://www.opm.gov/policy-data-oversight/pay-leave/pay-admi...

Doesn't sound like you're talking about General Schedule.


Correct and there’s no legal requirement to use the GS.

This new force could easily and legally acquire and pay through other schedules - happens all the time.


tried replicating w/ a slightly different system prompt w/ sonnet-4.5 and got some different results, esp w/ progressive to conservative questions. Prompting seems pretty load-bearing here


AI tools have been so good for me for making home-cooked software. As a new-ish parent, it’s so much easier to do stuff. I don’t need to go into extra-deep focus mode to learn how to center a div for the hundredth time, I can spend that precious focus time on the problems that matter / the core motivation.


I had a good laugh at the image showing that 2018 had a “quality focus”

I think Jonathan Blow gave his “Preventing the Collapse of Civilization” talk (much stronger treatment of the subject matter) around that time, also about how software quality was only going down


> Gold has reached all time highs

this is true

> US debt (ie T-bills) selling at all time lows

this is not true unless you’re doing some kind of adjustment. For T-bills, us03m yields were much higher 30-40 years ago.

> US equities are at all time highs

this is true

> USD falling day over day, month over month, year over year

if falling means inflation, yes, in a banal way. If falling means relative to other currencies, that’s the last 9 months or so. Previously the USD was quite strong

> US debt is falling in value because no one wants to buy it

this appears to be the hinge of the argument? It is not true. 10y yields have been down / flat since beginning of 2025 (i.e., price up). also tsy auctions remain well-subscribed / within historical range


I would not debase myself by causing the starvation of children.

