Hacker News | achrono's comments

How do we know that today's frontier models are merely scaled up versions of that? Genuine question, since the labs have narrowed what they share over the years to now almost nothing, in terms of how the model was trained and how it works under the hood.

We know the architecture of the open-weights models for sure, since llama.cpp has to understand the architecture it builds in order to plug the weights into it and run them. It's always possible that the latest closed model is doing something architecturally different from the open-weights ones we know about, but judging by how close large open-weight models such as DeepSeek are to SOTA performance, this seems unlikely. When OpenAI first came out with their near-mythical "Strawberry" (aka "o1") thinking model, there was all sorts of speculation that they had made some sort of architectural breakthrough, but then DeepSeek replicated the capability and published how they did it, proving that it was just better training, not any architectural change.

There have been minor changes to the architecture over the years, but these are basically all efficiency tweaks, such as various types of attention (some pioneered in the open by DeepSeek) that scale better to long context lengths, and the confusingly named "mixture of experts" architecture. What's more notable, really, is how little the architecture has changed. The capability gains have been coming from better training and better data.
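To see why "mixture of experts" reads as an efficiency tweak rather than a new kind of model, here is a minimal sketch of top-k routing for a single token. It assumes a plain dot-product gate, k=2 and toy experts that just scale their input; real MoE layers work on batched tensors with learned weights and load-balancing losses, none of which is shown. The names `moe_forward`, `gate_weights` and `experts` are hypothetical. The point is that the model holds all n experts' parameters, but each token only pays for k expert evaluations.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# Experts are assumed to map a vector to a vector of the same length.
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token: Vector, gate_weights: List[Vector],
                experts: List[Expert], k: int = 2) -> Vector:
    """Route one token through its top-k experts and blend their outputs."""
    if not 1 <= k <= len(experts) or len(gate_weights) != len(experts):
        raise ValueError("need 1 <= k <= n_experts and one gate vector per expert")
    # Gate score per expert: dot product of the token with that expert's gate vector.
    scores = [sum(t * w for t, w in zip(token, g)) for g in gate_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalise the gate scores over the selected experts only.
    mix = softmax([scores[i] for i in top])
    out = [0.0] * len(token)
    for weight, i in zip(mix, top):
        y = experts[i](token)  # one expert call; k calls per token in total
        out = [o + weight * v for o, v in zip(out, y)]
    return out


# Four toy "experts" that just scale their input.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
gates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
print(moe_forward([1.0, 0.5], gates, experts, k=2))  # only experts 0 and 1 run
```

Doubling the number of experts here doubles the parameters but leaves per-token compute at k expert calls, which is the sense in which the comment calls MoE an efficiency change.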


DeepSeek research:

- V3 https://arxiv.org/abs/2412.19437

- V2 https://arxiv.org/abs/2405.04434

- R1 https://arxiv.org/abs/2501.12948 (RL applied to ML models was well-known beforehand, but they show it in the open, at scale, on big models)

Then, there's the incentive analysis. If you can see that these models empirically get better with scale, why would you swap out the main architecture? Those events will be pretty rare. I'm not saying there's no one cooking up a new architecture, just that it's a pretty rare event. And it would have to come from researchers who are happy not to publish their findings, which is not really what a sizable portion of elite researchers (obviously not all) are incentivized to do.

Of course, it's a bit of a verbal compression to say simply 'scaled up'. They are recognisably scaled-up transformers, and most new models come with a few tricks, but we're at the point where those tricks are usually not an architectural rewrite; they are added to solve an explicit problem, like hallucination, rather than for big new capability gains.


> If you can see that these models empirically get better with scale, why would you swap the main architecture? Those events will be pretty rare

Cf. "The Hardware Lottery": https://arxiv.org/abs/2009.06489


There are thousands of people working in top level labs. Somebody would leak it

No, they are clearly not just scaled-up versions of GPT-2; there are different LLM architectures, like mixture of experts, that appeared relatively recently. I am not an expert though, far from it.

MoE and such are basically performance enhancements; they don't make the model smarter.

Separately trained experts can surpass performance in their activated regime, and that DOES result in a smarter model. The Claude system cards talk about this, and there is e.g. https://openreview.net/forum?id=iydmH9boLb to read...

Performance enhancements are huge though.

If you can make the existing model faster, you can spend the inference budget you save on making your model bigger, which then makes it smarter.

A lot of how smart the models can be comes down to budget. If you can make your existing thing cheaper, you can instead make it bigger for the same price.
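A back-of-the-envelope version of the budget argument, under a loudly simplified assumption: serving cost is linear in the number of parameters actually evaluated per token. Real costs also depend on memory bandwidth, batch size and context length, and the diminishing-returns caveat raised in the thread still applies; this only shows what you can afford, not how much smarter it gets. `affordable_params` and its arguments are hypothetical.

```python
def affordable_params(budget: float, cost_per_param: float,
                      speedup: float = 1.0, active_fraction: float = 1.0) -> float:
    """Total parameters you can serve for a fixed budget.

    Assumes (hypothetically) that cost is linear in the parameters evaluated
    per token, that a `speedup` divides that cost, and that only
    `active_fraction` of the parameters run per token (1.0 for a dense model,
    smaller for a mixture of experts).
    """
    if min(budget, cost_per_param, speedup) <= 0 or not 0 < active_fraction <= 1:
        raise ValueError("inputs must be positive and 0 < active_fraction <= 1")
    return budget * speedup / (cost_per_param * active_fraction)


base = affordable_params(100.0, 0.5)                           # dense baseline
faster = affordable_params(100.0, 0.5, speedup=2.0)            # 2x faster kernels
sparse = affordable_params(100.0, 0.5, active_fraction=0.25)   # 1 in 4 params active
print(base, faster, sparse)
```

Under this model a 2x speedup buys a 2x larger model at the same price, and running a quarter of the parameters per token buys a 4x larger one.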


Not really “smarter” though? It’s just a big probability engine.

(Not trying to flame-bait or anything. I just wouldn't describe an LLM as exhibiting intelligence. It is great at making connections based on probability but doesn't have a semantic understanding of what it is doing.)


> to then make your model bigger, which then makes it smarter

There's diminishing returns and at some point making a model bigger makes it dumber.


Performance enhancements are what allow you to train a bigger model.

Across 52 professional domains, current frontier models degrade 25% of document content after just 20 interactions!


It's unexpected that LLM reuse turns out to be like saving a .jpg screenshot over and over.
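To put the .jpg analogy in numbers: assume (this is not stated with the figure above) that each interaction loses the same fraction of the document, i.e. geometric decay like generational loss from re-saving a JPEG. Then the 25%-after-20-interactions figure implies a per-interaction retention r of roughly 98.6%:

```latex
R(n) = r^{n}, \qquad R(20) = 1 - 0.25 = 0.75
\quad\Longrightarrow\quad
r = 0.75^{1/20} = e^{\ln(0.75)/20} \approx e^{-0.0144} \approx 0.9857
```

That is only about 1.4% lost per step, which is why the compounding is easy to miss. Under the same assumption, r^{40} = 0.75^2 ≈ 56% of the content would remain after 40 interactions.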


Towards maximizing the sum of individual happiness, power, beauty and knowledge. Maybe a few other attributes in there, but these are the bare minimum that no civilization would deny for itself.

The question of course is 'how'. For the last few centuries, the answer has been technology.


From the article:

> According to the 115-page complaint, Baig discovered through internal security testing that WhatsApp engineers could “move or steal user data” including contact information, IP addresses and profile photos “without detection or audit trail”.

That isn't really the breach you're making it out to be. Profile photos, unless made private/contacts only, are already publicly visible, and so is "contact information".

Of course these are useful to intelligence services, but this doesn't mean that Baig found they don't have true end-to-end encryption.


I love even more how it's a .md file from well before Markdown even existed.


I bet the fact that it's a Git repo must feel straight-up otherworldly, then.

It's just a nice touch.


No, Skynet went to the past and gave git and md to Microsoft, which then proceeded to create the doc format using md as a starting point :P


Way before git was even released in 2005.


Really want to know what these "more interesting bits" are that GPT-5-thinking and other models of this calibre cannot do. Unless of course you choose to do them even though these models can in fact do them, in which case, please do share regardless.


In my case, talking to customers and figuring out what problems they are trying to solve or what opportunities they are wanting to pursue


Other than banks & ticketing, there is a whole host of things that do in fact need an app.

* Mobile payments

* Navigation

* All manner of IoT devices

* Wearables!

* Digital versions of ID (Mobile Passport Control)

etc.

So no, you can't just use the web.


But, and I hesitate to point this out because I find that people think it is somehow minimal entry stakes, one does not need any of those things.


You wouldn't get very far without WeChat and Alipay in China. Last time a good friend of mine was there, many merchants simply refused to accept cash. The few that did made it known how much they were inconvenienced by it.

Same for basically every interaction with locals, for accessing government services, or even just using the public transportation.

It's pretty similar for locals AFAIK.

And before anyone replies that he didn't have to travel there: no, he did, unless he was willing to look for another job (which are very scarce here; you hold on to a good job for dear life).


I think this just further demonstrates the truth behind the truly small & scrappy teams culture at OpenAI that an ex-employee recently shared [1].

Even with the way the presenters talk, you can sort of see that OAI prioritizes speed above most other things, and a naive observer might think they are testing things a million different ways before releasing, but actually, they're not.

If we draw up a 2x2 of Danger (High/Low) versus Publicity (High/Low), it seems to me that OpenAI sure has a lot of hits in the Low-Danger High-Publicity quadrant, but probably also a good number in the High-Danger Low-Publicity quadrant, extrapolating purely from the sheer capability of these models and the continuing ability of researchers like Pliny to crack through their safeguards.

[1] https://calv.info/openai-reflections


Key highlights in addition to the model quality itself:

* real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say “think hard about this” in the prompt)

* router is continuously trained on real signals, including when users switch models, preference rates for responses, and measured correctness, improving over time.
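To make the two router bullets concrete, here is a hedged, hand-written stand-in; it is not the trained router described above. It replaces the learned model with three rules (explicit intent, tool needs, and prompt length as a crude proxy for complexity) and replaces continuous training with a single threshold nudged whenever a user switches from the fast model to the thinking one. The model names, trigger phrases and thresholds are all hypothetical.

```python
from dataclasses import dataclass

FAST_MODEL = "fast-model"          # hypothetical model names
THINKING_MODEL = "thinking-model"

# Phrases treated as explicit intent to think harder (illustrative, not exhaustive).
THINK_PHRASES = ("think hard", "think carefully", "step by step")


@dataclass
class Router:
    # Prompts longer than this many words count as "complex" (crude proxy).
    complexity_threshold: int = 60
    # How much to lower the threshold each time a user escalates manually.
    step: int = 5
    min_threshold: int = 10

    def route(self, prompt: str, needs_tools: bool = False) -> str:
        text = prompt.lower()
        # 1. Explicit intent in the prompt wins.
        if any(p in text for p in THINK_PHRASES):
            return THINKING_MODEL
        # 2. In this sketch, tool use always goes to the more capable model.
        if needs_tools:
            return THINKING_MODEL
        # 3. Otherwise fall back to the length-based complexity proxy.
        if len(text.split()) > self.complexity_threshold:
            return THINKING_MODEL
        return FAST_MODEL

    def record_switch(self, routed_to: str, user_chose: str) -> None:
        """Crude online update: when users escalate from the fast model,
        send more prompts to the thinking model in future."""
        if routed_to == FAST_MODEL and user_chose == THINKING_MODEL:
            self.complexity_threshold = max(self.min_threshold,
                                            self.complexity_threshold - self.step)


router = Router()
print(router.route("What's 2+2?"))                         # fast-model
print(router.route("Please think hard about this proof"))  # thinking-model
```

A real system would replace `route` with a learned classifier and also feed it preference rates and measured correctness; the threshold nudge only illustrates the shape of that feedback loop.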


> Toronto, where immigrants from the subcontinent grow up in enclaves surrounded by other immigrants.

Citation please, because this is sweeping. Two questions to consider:

1. Are these enclaves representative of the subcontinent, or of a few over-represented communities that are actually a small fraction of the Indian subcontinental population?

2. Of all the people from the Indian subcontinent here, how many live in enclaves versus otherwise?


>Citation please

Brampton, Thorncliffe, Scarborough, etc. There's no shortage of immigrant-heavy neighborhoods.

Heck, I'm in Durham and the demographics are changing rapidly.

