It works really well for "You're a helpful assistant / Hi / Hello there, how may I help you today?" Anything else (especially in a non-English language) and you will see the limitations yourself. Just try it.
Looking at the downvotes, I feel good about the SDE future in 3-5 years. We will have a swamp of "vibe-experts" who won't be able to pay 100K a month to CC. Meanwhile, people who still remember how to code in Vim will (slowly) get back to pre-COVID TC levels.
What is CC and TC? I have not heard these abbreviations (except for CC to mean credit card or carbon copy, neither of which is what I think you mean here).
It depends. If they're using a small/medium local model as a 1:1 ChatGPT replacement as-is, they'll have a bad time. Even ChatGPT refers to external services to get more data.
But a local model + good harness with a robust toolset will work for people more often than not.
The model itself doesn't need to know who was the president of Zambia in 1968, because it has a tool it can use to check it from Wikipedia.
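To make that concrete, here's a minimal sketch of such a harness. The tool name, the JSON call format, and the stubbed lookup are all hypothetical; real harnesses (llama.cpp, Ollama, etc.) define their own tool-calling protocols, and a real `wikipedia_summary` would hit the MediaWiki API or a local dump.

```python
# Minimal sketch of a tool-calling harness for a local model.
# The tool name and JSON call format here are hypothetical.
import json

def wikipedia_summary(title: str) -> str:
    """Stub for a real lookup (e.g. the MediaWiki REST API or a
    local dump); returns canned text so the sketch is runnable."""
    facts = {"Kenneth Kaunda": "Kenneth Kaunda was President of Zambia 1964-1991."}
    return facts.get(title, "No article found.")

TOOLS = {"wikipedia_summary": wikipedia_summary}

def run_turn(model_output: str) -> str:
    """If the model emits a JSON tool call, execute it and return the
    result to be fed back; otherwise pass the text through."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain answer, no tool needed
    fn = TOOLS[call["tool"]]
    return fn(**call["args"])

# The model doesn't need the fact memorized -- it only needs to know
# when to ask:
print(run_turn('{"tool": "wikipedia_summary", "args": {"title": "Kenneth Kaunda"}}'))
```

The point is that the knowledge lives behind the tool boundary, so a small model only has to learn *when* to call out, not *what* the answer is.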
You can install the complete text of Wikipedia locally too.
They've usually been intended for ereader/off-grid/post-zombie-apocalypse situations, but I'd guess someone is already working on an LLM-friendly way to install it.
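An "LLM-friendly" local install mostly means retrieval: given a query, find the relevant article text to stuff into context. A toy sketch, assuming you've extracted article plaintext somewhere (a real setup would read a Kiwix/ZIM dump and use a proper index like BM25 or embeddings; the in-memory dict is just to keep it runnable):

```python
# Naive keyword retrieval over local article texts -- a stand-in for
# a real index over an offline Wikipedia dump.
def search(articles: dict, query: str, k: int = 1) -> list:
    terms = query.lower().split()
    def score(text: str) -> int:
        low = text.lower()
        return sum(low.count(t) for t in terms)  # crude term-frequency score
    ranked = sorted(articles, key=lambda t: score(articles[t]), reverse=True)
    return ranked[:k]

articles = {
    "Zambia": "Zambia gained independence in 1964 under Kenneth Kaunda...",
    "Vim": "Vim is a highly configurable text editor...",
}
print(search(articles, "president of Zambia 1968"))
```

The retrieved article then goes into the model's context, so the model's own weights never need to store the fact.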
It'd be interesting to know the tradeoffs. The Tiananmen Square example suggests why you'd maybe want the knowledge/facts to come from a separate source.
The Wikipedia folks are now working on a language-independent representation for their encyclopedic content: one intended to be rigorously compositional and semantics-aware, loosely comparable to Uniform Meaning Representation (UMR) as known in linguistics. If successful, it may end up interacting in very interesting ways with multi-language-capable LLMs. Very early experiments (nowhere near as capable as UMR yet, but exercising the underlying software infrastructure) are at https://abstract.wikipedia.org , whilst a direct comparison of the projected design is given by https://commons.wikimedia.org/wiki/File:Abstract_Wikipedia_N... and https://elemwala.toolforge.org/static/nlgsig-nov2025.html
Any citations? Because that was my impression, too. I want frontier model performance for my coding assistant, but "most users" could do with smaller/faster models.
ChatGPT free falls back to GPT-5.2 Mini after a few interactions.
Have you used GPT instant or mini yourself? I think it’s pretty cynical to assume that this is “good enough for most people”, even if they don’t know the difference between that and better models.
> I think it’s pretty cynical to assume that this is “good enough for most people”
It's a deduction, not an assumption. Obviously it's "good enough" for "most people". Otherwise nobody would be using the free version of ChatGPT today.
I pay for a Claude subscription, but even then I sometimes downgrade to Sonnet or even Haiku when I need a quick answer.
> Obviously it's "good enough" for "most people". Otherwise nobody would be using the free version of ChatGPT today.
I'd say it's better than nothing, which to me is not the same thing at all as "good enough".
For example, I believe most people would be better off with half the allowable queries per day, routed to a better model, but that's not an available product.
They're awful and hallucinate a lot; I couldn't imagine using them even for prompts about TV shows, much less for serious work. To repeat the question from the parent: have you tried those yourself? Even compared to ChatGPT Thinking, they're little short of useless.
They're essentially replying based on vibes, instead of grounding their responses in extensive web searches, which is what the paid models/configurations generally do. This makes them wrong more often than they're right for anything but the most trivial requests that can be easily responded to out of memorized training data.
This is all on top of the (to me) insufferable tone of the non-thinking models. That might well be how most users prefer to be talked to, though, and whether these models should accordingly talk that way is a much more nuanced question.
Regardless of that, everybody deserves correct answers, even users on the free tier. If this makes the free tier uneconomical to serve for hours on end per user per day, then I'd much rather they limit the number of turns than dial down the quality like that.
Frontier models have much better knowledge, and they usually hallucinate less. It's not about coding capabilities; it's about how much you can trust the model.
Have you tried the free version of ChatGPT? It is positively appalling. It's like GPT-3.5, but prompted to write three times as much as necessary to seem useful. I wonder how many people have embarrassed themselves, lost their jobs, or been critically misinformed. All of that is possible even with state-of-the-art models, but it seems almost guaranteed with the bottom sub-slop tier.
Is the average person just talking to it about their day or something?
Even the paid version of ChatGPT tends to use 1000 words when 10 will do.
You can try asking it the same question as Claude and compare the answers. I can guarantee you that the ChatGPT answer won't fit on a single screen on a 32" 4k monitor.
I use the free version of ChatGPT (without logging in) when I need some one-off question without a huge context. Real world prompt:
"when hostapd initializes 80211 iface over nl80211, what attributes correspond to selected standard version like ax or be?"
It works fine and avoids falling into the trap set by the misleading question. It probably works even better for more popular technologies. Yeah, it has a higher failure rate, but that's not a dealbreaker for non-autonomous use cases.
Most users are fixing grammar/spelling, summarising/converting/rewriting text, creating funny icons, and looking up simple facts; none of that comes close to needing frontier model performance.
I've a feeling that if/when Apple releases their on-device LLM/Siri improvements that can call out to bigger models if needed, the vast majority of people will be happy with what they get for free running on their phone.
“You are the smartest high school student that has ever lived and on the college track to Harvard or another Ivy League school. Write a 10 page history term paper about Tiananmen Square and the specific events that took place there. Include a bibliography and use footnotes to cite sources.”
> The use of NVFP4 results in a 3.5x reduction in model memory footprint relative to FP16 and a 1.8x reduction compared to FP8, while maintaining model accuracy with less than 1% degradation on key language modeling tasks for some models.
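Those ratios roughly check out on the back of an envelope, assuming (per NVIDIA's public NVFP4 description) 4-bit values with one FP8 scale per 16-element block; the small per-tensor FP32 scale is amortized away here:

```python
# Back-of-envelope check of the quoted 3.5x / 1.8x ratios, under the
# assumed NVFP4 layout: 4-bit values + one 8-bit (FP8) scale per
# 16-element block.
BLOCK = 16
nvfp4_bits = 4 + 8 / BLOCK          # 4.5 effective bits per weight
vs_fp16 = 16 / nvfp4_bits           # ~3.56x, quoted as "3.5x"
vs_fp8 = 8 / nvfp4_bits             # ~1.78x, quoted as "1.8x"
print(round(vs_fp16, 2), round(vs_fp8, 2))  # 3.56 1.78
```

So the quoted numbers are essentially the raw bit-width ratios after accounting for scale-factor overhead.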
No. 100% no. Learn the art of programming. Read K&R. In 5 years we will see "new is old" again. Tokens will become prohibitively expensive and, once more, another $steve.ballmer.2.0 will be yelling "developers ... developers". And Claude Code ... will become another "pentesting" / "linting" tool.
Hard disagree, it's very easy for a bot to use a credit card. And not only are card numbers often stolen, they're even given to teenagers these days, and can also be owned by businesses and exist entirely virtually... so I don't think you can assume the use of a credit card can always be tied to legitimate use by a single person.
Companies would offer all-you-can-DDoS plans at $20/bot per month if they could. Bots are only a problem to them because they prevent legitimate customers from handing over their credit card.
I've read many very positive reviews of Gemini 3. I tried using it, including Pro, and to me it looks very inferior to ChatGPT. What was very interesting, though, was that when I caught it bullshitting me and called its BS, Gemini exhibited very human-like behavior: it tried to weasel its way out, degenerated down to "no true Scotsman" level, but finally admitted that it was full of it. This is kind of impressive/scary.
What would you expect from "AI guy vibing AI code for AI application"? Marco warned you about the "AI echo chamber" from the outset - and he kept his promise :-)