Do you really think developers are going through the hellish pain of dealing with Google and Apple for no reason? Real world users prefer and expect apps as opposed to web versions for many product categories.
Kimi K2.5 (as an example) is an open model with 1T params. I don't see a reason it has to be local for most use cases; the fact that it's open is what's important.
That is just idealism. Being "open" doesn't get you any advantage in the real world. You're not going to meaningfully compete in the new economy using "lesser" models. The economy does not care about principles or ethics. No one is going to build a long-term business that provides actual value on open models. They can try. They can hype. And they can swindle and grift and scalp some profit before they become irrelevant. But it will not last.
Why? Because whatever was built with an open model can be sneezed into existence by a frontier model run via a first-party API, using the best-practice configurations the providers publish in usage guides that no one seems to know exist.
The difference between the best frontier model (gpt-5.4-xhigh or opus 4.6) and the best open model is vast.
But that is only obvious when your use case is actually pushing the frontier.
If you're building a CRUD app, or the modern equivalent of a TODO app, even a lemon can produce that nowadays, so you will assume open has caught up to closed because your use case never required frontier intelligence.
A model with open weights gives you a huge advantage in the real world.
You can run it on your own hardware, with perfectly predictable costs and predictable quality. You don't have to worry about how many tokens you use, whether your subscription limits will be hit at the most inconvenient moment (forcing you to wait until they reset), whether the token price will go up or your limits will shrink, or whether your AI provider will quietly swap the model for a worse one, and so on.
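To make the cost-predictability point concrete, here is a back-of-the-envelope comparison. Every number below is an illustrative assumption I made up (hypothetical API price, workload, hardware cost, and power bill), not real pricing:

```python
# Back-of-the-envelope: metered API cost vs. amortized local hardware.
# ALL numbers are hypothetical assumptions for illustration only.
api_price_per_mtok = 15.00      # assumed $ per 1M output tokens
tokens_per_month = 200_000_000  # assumed heavy-use monthly workload

hw_cost = 30_000                # assumed GPU server price
months = 36                     # amortization period (3 years)
power_per_month = 250           # assumed electricity cost, $/month

api_monthly = api_price_per_mtok * tokens_per_month / 1_000_000
local_monthly = hw_cost / months + power_per_month

print(f"API:   ${api_monthly:,.0f}/month (scales with token volume)")
print(f"Local: ${local_monthly:,.0f}/month (fixed regardless of usage)")
```

The point isn't the specific numbers; it's that the local line is flat and under your control, while the API line moves with your usage and with whatever the provider decides to charge next quarter.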
Moreover, no matter how good a "frontier model" may be, it can still produce worse results than a lesser model when the programmer who manages it does not also have "frontier intelligence". Freed from the constraints of a paid API, you may be able to use an AI coding assistant in much more efficient ways, exactly as when time-shared access to powerful mainframes was replaced by the unconstrained use of personal computers.
When I was very young, I lived through the transition from using a mainframe remotely to using my own computer. I certainly do not want to return to that straitjacket style of work.
The vision has been that the open and/or small models, while 8-16 months behind, would eventually reach sufficient capabilities. In this vision, not only do we have freedom of compute, we also get less electricity usage. I suspect long-term the frontier mega models will mainly be used for distillation, like we see from Gemini 3 to Gemma 4.
Billions of USD in debt, a business model bleeding cash with no profit in sight, a high-competition environment, a sub-par product, free-to-use offline models taking off, potential regulatory issues, some investors pulling out of their commitments... tricky.
But let's not cry for the founders, they managed to get away with tons of money. The problem is for the fools holding the bag.
How is it a subpar product? I've been very happy with GPT 5.4 and the Codex CLI tooling, as well as ChatGPT web. I'd say product is one of their strengths.
I don't use anywhere near $1000/mo of inference. But yes, the question of what to do when prices go up a lot does concern me. However, with respect to product alone, Codex is still very good.
Yeah, you guys have to pay attention to the state of the overall economy. We are in the credit-crunch phase of a recession. The funny money has run out and infinite loans are no longer available. These companies have to find a way to pay their debt now.
"Notably, increases in codebase size are a major determinant of increases in static analysis warnings and code complexity, and absorb most variance in the two outcome variables. However, even with strong controls for codebase size dynamics, the adoption of Cursor still has a significant effect on code complexity, leading to a 9% baseline increase on average compared to projects in similar dynamics but not using Cursor."
They're measuring development speed through lines of code. To show that's true they'd need to first show that AI and humans use the same number of lines to solve the same problem. That hasn't been my experience at all. AI is incredibly verbose.
Then there's the question of whether LoC is a reliable proxy for velocity at all. The common belief among developers is that it's not.
This is actually one thing I have found LLMs surprisingly useful for.
I give them a code base which has one or two orders of magnitude of bloat, and ask them to strip it away iteratively. What I'm left with usually does the same thing.
At this point the code base becomes small enough to navigate and study. Then I use it for reference and build my own solution.
Uh huh... but the data in Andrej's visualizer shows the software development growth outlook at 15% (much faster than average)
Over the past year (where Opus has supposedly changed the game), we're seeing ~10% more job postings for software developers compared to this time last year [1,2]
A huge amount of our work is not easily verifiable, therefore it's extremely hard to actually train an LLM to be better at it. It doesn't magically get better across the board.
AI HAS WON. SURF OR DROWN. YOU DONT KNOW WHATS COMING!!!?!?!
Stop with this doomer drivel. It's sick. It's not based in reality and all it does is stress innocent people out for no reason.
This is fantasy completely disconnected from reality.
Have you ever tried writing tests for spaghetti code? It's hell compared to testing good code. LLMs require a very strong test harness or they're going to break things.
Have you tried reading and understanding spaghetti code? How do you verify it does what you want, and none of what you don't want?
Many code design techniques were created to make things easy for humans to understand. That understanding needs to be there whether you're modifying it yourself or reviewing the code.
Developers are struggling because they know what happens when you have 100k lines of slop.
If things keep accelerating in this direction, we're going to wake up to a world of pain in three years, and AI isn't going to get us out of it.
I've found much more utility, even pre-AI, in a good suite of integration tests than in unit tests. For instance, if you're building a test harness for an API, it doesn't matter whether you even have access to the code if you're writing tests against the API surface itself.
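A minimal sketch of what I mean by testing the API surface: the test knows only the HTTP contract, not the implementation. Here a throwaway stdlib server stands in for the real service, and the `/health` endpoint and its JSON shape are hypothetical stand-ins, not any particular API:

```python
# Black-box integration test against an API surface: nothing below
# inspects the implementation, only the HTTP contract.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    """Stand-in for the real service (could be any black box)."""
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

# Bind to port 0 so the OS picks a free port.
server = HTTPServer(("127.0.0.1", 0), Handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The actual test: assertions only on the observable contract.
with urllib.request.urlopen(f"http://127.0.0.1:{port}/health") as resp:
    assert resp.status == 200
    assert json.loads(resp.read())["status"] == "ok"

server.shutdown()
print("contract test passed")
```

Swap the in-process server for a real deployed endpoint and the test body doesn't change, which is exactly why this style survives an AI (or anyone else) rewriting the internals.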
I do too, but it comes from a bang-for-your-buck and not a test coverage standpoint. Test coverage goes up in importance as you lean more on AI to do the implementation IMO.