Hacker News | giwook's comments

Do you mind elaborating on your experience here?

Just curious as I've often heard that Claude was superior for planning/architecture work while ChatGPT was superior for actual implementation and finding bugs.


Claude makes more detailed plans that seem better if you just skim them, but when you actually analyze them, they usually contain a lot of errors.

It compensates for most of them during implementation if you make it use TDD, either via Superpowers et al. or by just telling it to.

GPT 5.4 makes simpler plans (compared to Superpowers, a plugin from the official Claude plugin marketplace, not the plan mode), but it fills in the details better while implementing.

Plan mode in Claude Code has gotten much better in the last few months, but the model cannot compensate for the missing details during implementation.

So my workflow has been:

Have Claude plan with superpowers:brainstorm, review the spec, make updates, give the spec to GPT, usually to witness grave errors found by GPT, the spec gets updated, another manual review, (many iterations later) the final spec is written, write the plan, GPT finds mind-boggling errors, (many iterations later) a Claude agent swarm implements, GPT finds even more errors, I find errors, fix fix fix, manual code review and red tests from me, tests get fixed, (many iterations later) finally something usable with stylistic issues at most (human opinion)!
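That back-and-forth is basically a fixed-point loop: draft, cross-review, fold the findings back in, and repeat until the reviewer comes up empty. Here is a runnable sketch of just the control flow; `plan` and `cross_review` are hypothetical stubs standing in for whatever Claude/GPT tooling you actually use (the stub reviewer simply surfaces fewer findings each round so the loop terminates):

```python
# Hypothetical sketch of the spec/cross-review loop described above.
# plan() and cross_review() are placeholders for real model calls
# (e.g. Claude drafting, a second model reviewing); stubbed here so
# the control flow itself is runnable.

def plan(feedback):
    """Draft or revise a spec, folding in all reviewer feedback so far."""
    return {"spec": "v%d" % (len(feedback) + 1)}

def cross_review(spec, max_findings=3, _state={"rounds": 0}):
    """Second model reviews the spec and returns a list of findings.
    Stubbed to find fewer errors each round until none remain."""
    _state["rounds"] += 1
    return ["finding"] * max(max_findings - _state["rounds"], 0)

def iterate_spec(max_rounds=10):
    feedback = []
    for _ in range(max_rounds):
        draft = plan(feedback)
        findings = cross_review(draft["spec"])
        if not findings:           # reviewer finds nothing: spec is final
            return draft
        feedback.extend(findings)  # fold findings into the next draft
    raise RuntimeError("spec did not converge")

final = iterate_spec()
print(final["spec"])
```

In practice each call is a real model invocation and "reviewer finds nothing" is the exit condition; the `max_rounds` cap is what keeps you from iterating forever on a spec that never settles.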

This happens with the most complex features, the ones that would be a nightmare to implement even for the most experienced programmers, of course. For basic things, most SOTA models can one-shot it anyway.


Interesting. Have you ever had Claude re-review its plan after having it draft the original plan? Or do you give it to GPT right away to review?

Just curious as I'm trying to branch out from using Claude for everything, and I've been following a somewhat similar workflow to yours, except just having Claude review and re-review its plan (sometimes using different roles, e.g. system architect vs SWE vs QA eng) and it will similarly identify issues that it missed originally.

But now I'm curious to try this while weaving in more GPT.


I use both GH Copilot and CC extensively, and Copilot does seem more economical, though I wonder how long that will last, as I imagine GitHub has also been subsidizing LLM usage heavily.

FWIW, it feels like GH Copilot is a cheaper version of OpenRouter, but with trade-offs like being locked into VSCode and the Microsoft ecosystem overall. I already use VSCode, though, so beyond that I don't see much downside to using GH Copilot.


You're not locked into VSCode. There are plugins for other IDEs, and a 'copilot' CLI tool very similar to Claude Code's CLI.

I also wouldn’t say you’re locked into Microsoft’s ecosystem. At work we just have skills that allow for interaction with Bitbucket and other internal tooling. You’re not forced to use GitHub at all.



I'm hopeful, because Microsoft already has a partnership with OpenAI and owns much of it, so they can get OpenAI's models at cost to host on Azure, which they already do, and pass the savings on to the user. This is also why Opus is 3x as expensive in Copilot: Microsoft has to buy API usage from Anthropic directly.

I don’t think it’s API costs. Their Sonnet 4.6 is just 1x premium request which matches the 1x cost of the various GPT Codex models.

Sonnet is the weaker model, though, so it's expected to be cheaper; the right comparison is Opus vs GPT. That Anthropic's weaker model costs the same per request as OpenAI's best model is what I mean by Microsoft flexing their partnership.

You could use something like [OpenCode](https://opencode.ai), which supports integration with Copilot.

> but with trade-offs like being locked into VSCode and the Microsoft ecosystem overall

You can use GH Copilot with most JetBrains IDEs.


Just to clarify, one does not get access to the pro model on the Pro plan?

The $20 Plus plan still exists, and does not give access to the pro model.

The $200 Pro plan still exists, and does give access to the pro model.

What is new is a $100 Pro plan that does give access to the pro model, with lower usage limits than the $200 Pro plan.


This is still worse than Anthropic's, right? Because you get access to their top model even at the $20 price point.

It's not worse; Anthropic simply has no model equivalent to GPT 5.4 Pro (unless you count Mythos). Google does, though: Gemini 3.1 Deep Think.

GPT 5.4 Pro is extremely slow but thorough, so it's not meant for the usual agentic work, but rather for research or for solving hard bugs/math problems when you provide it all the context.


I'm genuinely asking: when you say Gemini 3.1 DT is an equivalent model to GPT 5.4 Pro, is there a specific benchmark/comparison you're referring to, or is this more anecdotal?

And do you mean to say that you don't really use GPT 5.4 Pro unless it's for a hard bug? Curious which models you use for system design/architecture/planning vs execution of a plan/design.

TIA! I'm still trying to figure out an optimal system for leveraging all of the LLMs available to us as I've just been throwing 100% of my work at Claude Code in recent months but would like to branch out.


Pro and DT are equivalent models because:

- internally, both use the same best-of-N architecture

- neither is available in a code harness like Codex, only in the UI (GPT also has an API)

- GPT-5.4 Pro is extremely expensive: $30.00 input vs $180.00 output

- both DT and Pro are really good at solving math problems
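Assuming those prices follow the usual per-million-token convention (the comment doesn't state the unit, so that's an assumption), a single Pro call can be costed out quickly:

```python
# Rough cost estimate for a single GPT-5.4 Pro call, taking the quoted
# $30 input / $180 output prices as per-million-token rates (an
# assumption; the unit isn't stated above).

def query_cost(input_tokens, output_tokens,
               in_price_per_m=30.00, out_price_per_m=180.00):
    return (input_tokens / 1_000_000) * in_price_per_m \
         + (output_tokens / 1_000_000) * out_price_per_m

# e.g. a 50k-token context with a 20k-token reasoning-heavy answer:
cost = query_cost(50_000, 20_000)
print(f"${cost:.2f}")
```

Even a modest 50k-in / 20k-out query lands around five dollars, which is why this class of model gets pitched at hard one-off problems rather than agentic loops.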


So, reading the tea leaves, they're either losing subscribers on the $200 plan, or they're not following the hockey-stick growth path they thought they were... maybe?

Edit: I wonder if compute constraints are actually the impetus here


Nope, it's just that a lot of people (especially those using Codex) asked us for a medium-sized $100 plan. $20 felt too restrictive and $200 felt like a big jump.

Pricing strategy is always a bit of an art, without a perfect optimum for everyone:

- pay-per-token makes every query feel stressful

- a single plan overcharges light users and annoyingly blocks heavy users

- a zillion plans are confusing / annoying to navigate and change

This change mostly just adds a medium-sized plan for people doing medium-sized amounts of work. People were asking for this, and we're happy to deliver.

(I work at OpenAI.)


Did you modify the Plus plan's usage limits recently, or as part of this introduction? Given that Pro plan limits are multiples of Plus (5x/20x), and given reports of reduced Plus usage, clarification would be appreciated.

Transparency on this sort of thing is the best way to address negative company sentiment.


I'm honestly not sure, as I don't work on it. My understanding from afar is:

- There was a 2x promotion in March that ended on April 2, so limits have felt tighter since then

- We sometimes reset rate limits after bugs or milestones or because Tibo feels generous, which can make some days feel different than others (they are typically announced here: https://x.com/thsottiaux)

- Recently Plus was tweaked to have a smaller 5h limit but an increased weekly limit

- Lastly, as part of the new Pro launch, the $100 & $200 Pro tiers are getting a 2x promotion, meaning they are temporarily 10x/40x instead of 5x/20x

I've asked our team to clarify the pricing page. Agree it's not clear.
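The limit arithmetic in that list, with everything expressed as a multiple of the Plus baseline, works out as follows (a sketch using only the numbers quoted in this thread):

```python
# Relative usage limits as multiples of the Plus plan's baseline.
# Tier multipliers and the temporary 2x launch promotion are taken
# from the comment above.

PLUS_BASELINE = 1
TIERS = {"pro_100": 5, "pro_200": 20}  # normal multiples of Plus
PROMO_MULTIPLIER = 2                   # temporary 2x promotion on Pro tiers

effective = {name: mult * PROMO_MULTIPLIER for name, mult in TIERS.items()}
print(effective)  # the temporary 10x / 40x figures mentioned above
```

So while the promotion runs, the $100 tier behaves like 10x Plus and the $200 tier like 40x Plus, reverting to 5x/20x when it ends.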


Thanks for the response. I tried to phrase my postulations as just that; I didn't intend to be accusatory.

You like the job? How’s the day-to-day go? Yanking tickets or more organic?


All good, I interpreted it as postulation and not accusation. :)

I do like the job! Much more organic than yanking tickets, though I'm on the model training side of things, rather than product side. Always a balance between short-term sprints patching bad behaviors for the next model vs long-term investments in infra and science that make future work easier. Sometimes the negative press gets to me a bit (it's a very different feeling than 2022 or 2023), but my goal is just to make the most useful product I can for people. It's been wild how much Codex has already changed my day-to-day work, I'm so curious to see what it looks like in 2030 or 2040.


Plenty of people wanted to spend more than $20 but less than $200 for a plan. It's long overdue IMO.

The Plus plan doesn't get the pro model, which is (AFAICT) the same 5.4 model but it thinks a lot more.

You're trying to make words mean what we all think they mean. Stop foisting your Textualism upon us!

LOL telepathy!

It's actually via quantum entanglement.


And then some vibe code reviewing.

> My prima facie view on Altman has been that he presents as sincere.

That is how pathological liars present.


In what kind of situation would I choose to use the word "presents" in that context without being aware of that fact?

I am also aware that sincere people present that way.

I don't believe there is any rational way to consider the appearance of innocence as evidence of guilt.


There is a ton of evidence out there that points to guilt. No one implied the appearance of innocence was evidence of guilt (as much as I admire the creativity in your interpretation, Mr. Self-Described Altman Apologist).

Quoting selectively the way you did, with the response you provided, made my interpretation reasonable.

What other point could you have been making? You made no reference to any other evidence.

> as much as I admire the creativity in your interpretation, Mr. Self-Described Altman Apologist

I am unsure if this is deliberate irony, or poor comprehension.


Use the multitude of search tools within your grasp. It is difficult to avoid the evidence.

It may be more of a mental block than anything else.


There may be a reason why Altman gets talked about so much. This article in particular surfaces real information and new perspectives we've not heard at this level of detail before, on some pretty significant topics that will impact you, me, and pretty much everyone we know, not only today but well into the future.

You have a point in that Anthropic deserves some coverage too and that there are interesting perspectives that we've not heard of on that front either.

But just because that's true doesn't mean this article isn't very much relevant and needed.

Because it is.


The New Yorker has given Anthropic plenty of coverage in issues earlier this year.

Any plans to tackle any of the other folks who might be mentioned in the same sentence as Altman, like Dario Amodei?

[flagged]


I think the comment was out of legitimate interest rather than weighing one against the other


Huh? It's a genuine question. The article is great and the writer did a fantastic job.

Please try to give people the benefit of the doubt though I know it's hard in today's society.


tl;dr

No, he cannot.


Pretty sure it's still gone, and you should be using effort level for this now.

No, ultrathink is back, and it's the same thing as high effort for the message in which it's included.

Right, but wasn't high effort the default before? So ultrathink is gone in all but name.
