
It's going to be expensive to serve (also not generally available), considering they said it's the largest model they've ever trained.

I suspect it's going to be used to train/distill lighter models. The exciting part for me is the improvement in those lighter models.


It seems inevitable that costs will come down over time. Expensive models today will be cheap models in a few years.

What's interesting is that scaling appears to continue to pay off. Gwern was right - as always.

Seems like HN is doing something to combat this, considering how many [dead] comments I see in every post (which you can enable by setting `showdead` in your user profile).

I've only recently enabled it so I don't know how frequent dead comments were before the LLM era.


Fair enough. I actually noticed that right after I posted this comment.

To be fair, I've been here for like 15 years and have had show dead on for most of it, and although the quality of them has certainly gotten lower, I'm not convinced that they are more frequent.

Really uncharitable take. I did stupid things at 14, and had more unrestricted internet access too.

> absent parent more concerned with his business than his son

I don't know how you came to this conclusion from the post.


This is a big exaggeration. Codex is probably one of the top two LLM programming tools, along with Claude Code. GPT-5.4 models are strong, unlike the initial GPT-5 ones, which were comparatively bad, and can hold up against Opus 4.6. In my experience, they are better at analytical work.

I cannot really see how they are "far behind," or how some plugin for Claude Code is a "last desperate bid." The tools are close enough to each other that I regularly use Codex one month and Claude Code the next without much disruption, just to try out any new models or features that might be available.

I do not have much visibility into the non-code applications, so maybe it is stickier there.

If/when the AI bubble pops and takes OpenAI down with it, I would not expect Anthropic to come out unscathed either.


They were years ahead. They managed to spawn their own competitors (Anthropic is OpenAI refugees) by alienating employees with behavior so dishonest and immoral compared to their own founding principles and even their legal documents. They experienced a coup where the primary technical visionary of the company was forced out in favor of someone who is, comparatively, a nontechnical dummy. That was the beginning of years of stagnation, during which they burned tens and then hundreds of billions of dollars while their competitors caught up and then passed them by.

OpenAI is floundering and can't sustain their own burn rate. Their competitors are thriving. This is a market and technology that OpenAI largely created and just a few years in they are behind, losing unprecedented amounts of money, and have no clear path to catch up.

Let's be totally clear: they were 3 years ahead 3 years ago, and now they are behind. They are literally standing still.


> They were years ahead.

Considering how fast competitors caught up to them, I'm not convinced that OpenAI was ever years ahead. LLMs and transformers were known technology; OpenAI just happened to productize them before anyone else did (ChatGPT). That is not an advantage measured in years. Google, for example, could have caught up pretty easily (they invented the transformer architecture); I think it mostly came down to mismanagement that they flopped so hard with Bard. The biggest cost was high-quality data, and Google certainly had that, plus a budget for huge training runs. I really don't think OpenAI had any special sauce that put them years ahead.

One confounder here is that LLM scaling has started to hit diminishing returns recently, with no GPT-3 -> GPT-4/o1 jumps in recent times, making it easier to catch up to the SOTA.

That schism within the OpenAI leadership was ugly, and Sam Altman does seem a bit snaky to me. But I have no illusions about any company in this space, including Anthropic. None of these companies are moral, given the data these models are trained on.

> their competitors caught up and then passed them by

The different models are more capable in different aspects, but they are close enough that they leapfrog each other every few months.

> OpenAI is floundering and can't sustain their own burn rate. Their competitors are thriving.

Google is thriving, sure, but because of its existing ads business, not because of Gemini. I would not say that about Anthropic; they seem to be struggling to provide enough compute (see the recent usage limit changes). It's hard to know what's happening funding-wise inside these companies, so saying that their competitors are thriving is a stretch. And again, if the AI bubble pops, Anthropic is going to hurt along with OpenAI, just to an unclear extent.


Their competitors caught up after about 3 years, though. Gemini 2.5 was more or less awful vs even GPT 3/4. Models have more than one measure of quality, so they don't form a clean total order, but Gemini 2.5 was awful. Gemini 3.1 is better than GPT 5.3, competitive with 5.4, and preceded it by months.

Gemini CLI has been broken for the past 2-3 days, with no response from Google. Really embarrassing for a multi-trillion dollar company. At this point Codex is the only reliable CLI app, out of the big three.

https://www.reddit.com/r/GeminiCLI/comments/1s49pag/this_is_...


This morning I hit 100% 5hr usage on a task that took ~10% in the past. Looks like they are still testing the limits, but it seems over-tuned to me.

Also not great that they communicate this now, since people have been complaining about sudden and strange usage spikes for a few days with no response from Anthropic.


>> More free time?

> Yes! Time we can reclaim from the mundane chores of life to do with as we choose! How could you not want that?

We already had a huge productivity boom these past decades, but wages flat-lined and the vast majority of the profits and surplus went to the top. Housing, education, and healthcare became less affordable, not more. History points against your simple view.

I'm not convinced that AI breaks that pattern. If anything, the concentration is worse this time: the capital required is huge, the technology is controlled by a handful of companies, and most applications are about replacing labor. That last part further erodes workers' already meager bargaining power.

We need serious systemic change to get to the world you're envisioning, one where that congealed wealth starts flowing again.


The majority of Ask/Debug mode can be reproduced using skills. For copying code references, if you're using VS Code, you can look at plugins like [1], or even make your own.

Cursor's auto mode is flaky because you don't know which model they're routing you to, and it could be a smaller, worse model.

It's hard to see why paying a middleman for access to models would be cheaper than going directly to the model providers. I was a heavy Cursor user, and I've completely switched to Codex CLI or Claude Code. I don't have to deal with an older, potentially buggier version of VS Code, and I also have the option of not using VS Code at all.

One nice thing about Cursor is its code and documentation embedding. I don't know how much code embedding really helps, but documentation embedding is useful.

[1] https://marketplace.visualstudio.com/items?itemName=ezforo.c...


From [1] (2022 numbers), the median creator earned around 50 Robux per year, which is ~19 cents with the current DevEx rate, and the average was 13,500 Robux.

Out of ~7.5 million creators in 2022, only 11,000 qualified for cashing out.
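As a back-of-envelope check on those figures (the DevEx rate below is an assumption, roughly $0.0038 per Robux; the actual rate is on Roblox's DevEx page):

```python
# Rough Robux-to-USD conversion for the 2022 creator-earnings figures.
# DEVEX_RATE is an assumed exchange rate, not an official number.
DEVEX_RATE = 0.0038  # USD per Robux (assumption)

median_robux = 50       # median creator earnings per year
average_robux = 13_500  # average creator earnings per year

median_usd = median_robux * DEVEX_RATE
average_usd = average_robux * DEVEX_RATE

print(f"median: ${median_usd:.2f}, average: ${average_usd:.2f}")
# → median: $0.19, average: $51.30
```

The gap between the ~$0.19 median and the ~$51 average is the skew: a handful of hits pull the average up while the typical creator earns essentially nothing.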

The distribution is brutal: realistically, you have to stick with it for years before getting a hit, if ever. And the stats probably look worse in the LLM era. You definitely have to like doing it as a hobby.

One caveat is that the creator total likely includes a lot of casual experimentation. If many users make one or two games and then stop (I can see most kids doing this), the 7.5 million figure may overstate how many people are seriously trying to make money from it.

[1] https://about.roblox.com/newsroom/2023/07/vision-roblox-econ...


From what I've read online, it's not necessarily an unquantized version; it seems to use longer reasoning traces and to run multiple reasoning traces at once. Probably overkill for most tasks.

