This was due to Claude Code, the agent harness. 4.6 was trained to use tools and operate in an agent environment. This is different from there being a huge bump in the underlying model's intelligence.
The takeaway here I think is that the "breakthrough" already happened and we can't extrapolate further out from it.
Everyone wouldn't starve in a few months. There is more than enough food and I have faith it'd be given out. The starvation we see today in a world where most genuinely have a chance to get out of it is nothing like a world in which people can't earn an income.
The government only has as much power as they are given and can defend, and the only way I could see that happening is via automated weapons controlled by a few- which at this point aren't enough to stop everyone. What army is going to purge their own people? Most humans aren't psychopaths.
I think it'd end in a painful transition period of "take care of the people in a just system or we'll destroy your infrastructure".
> The government only has as much power as they are given and can defend, and the only way I could see that happening is via automated weapons controlled by a few- which at this point aren't enough to stop everyone. What army is going to purge their own people? Most humans aren't psychopaths.
I think you're right for the immediate future.
I suspect that while we're still employing large numbers of humans to fight wars and to maintain peace on the streets, it would be difficult for a government to implement deeply harmful policies without risking a credible revolt.
However, we should remember the military is probably one of the first places human labour will be largely mechanised.
Similarly, maintaining order in the future will probably be less about recruiting human police officers and more about surveillance and data. Although I suppose the good news there is that the US is somewhat of an outlier in resisting this trend.
But regardless, the trend is ultimately the same... If we are assuming that AI and robotics will reach a point where most humans are unable to find productive work, therefore we will need UBI, then we should also assume that the need for humans in the military and police will be limited. Or to put it another way, either UBI isn't needed and this isn't a problem, or it is and this is a problem.
I also don't think democracy would collapse immediately either way, but I'd be pretty confident that in a world where fewer than 10% of people are in employment and 99%+ of the wealth is being created by the government or a handful of companies, it would be extremely hard to avoid corruption over the span of decades. Arguably, increasing wealth concentration in the US is already corrupting democratic processes today, and this can only worsen as AI continues to exacerbate the trend.
Of course it's what they're going for. If they could do it they'd replace all human labor - unfortunately it's looking like SWE might be the easiest of the bunch.
The weirdest thing to me is how many working SWEs are actively supporting them in the mission.
The day I start freaking out about my job is the day when my non-engineer friend turned vibe coder understands how or why the thing that AI wrote works. Or why something doesn't work exactly the way he envisioned, and what it takes to get it there.
If it can replace SWEs, then there's no reason why it can't replace say, a lawyer, or any other job for that matter. If it can't, then SWE is fine. If it can - well, we're all fucked either way.
> If it can replace SWEs, then there's no reason why it can't replace say, a lawyer
SWE is unique in that for part of the job it's possible to set up automated verification for correct output - so you can train a model to be better at it. I don't think that exists in law or even most other work.
What is the automated verification of correct output and who defines that?
But before verification, what IS correct output?
I understand the SWE process is unique in that there are some automations that verify some inputs and outputs, but this reasoning falls into the same fallacies that we had before the AI era. The first one that comes to mind is that 100% code coverage in tests means that software is perfect.
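To make the coverage point concrete, here is a minimal sketch. The names `average`, `test_average`, and `crashes_on_empty` are made up for illustration. The one test runs every line of the function, so a line-coverage tool would report 100%, yet the function still fails on an input the test never tried.

```python
def average(values):
    # Bug: an empty list raises ZeroDivisionError.
    return sum(values) / len(values)


def test_average():
    # This single test executes every line of average(), so line coverage
    # for that function is 100%.
    assert average([2, 4, 6]) == 4


def crashes_on_empty():
    # True if the fully "covered" function still fails on an untested input.
    try:
        average([])
    except ZeroDivisionError:
        return True
    return False


if __name__ == "__main__":
    test_average()
    print("test passed; crashes on empty input:", crashes_on_empty())
```

The test passing tells you the lines ran, not that the behaviour is right for every input someone cares about.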
Right, and that's why it's only part of the job. The benchmarks they're currently doing consist of the AI being handed a detailed spec + tests to make pass, which isn't really what developing a feature looks like.
Going from fuzzy under-defined spec to something well defined isn't solved.
Going from well defined spec to verification criteria also isn't.
Once those are in place though, we get https://vinext.io - which from what I understand they largely vibe-coded by using NextJS's test suite.
> First one that comes to mind is that 100% code coverage in tests means that software is perfect
I agree... but I'm also not sure software needs to be perfect.
Agree. Anthropic in particular have been quite clear about what they are trying to do. Every blog post about every new model almost dismisses every use case other than coding; everything else seems almost a footnote in their communication.
Pre-training is not a good term if you are trying to compare it to LLM pre-training. Closer would be the model's architecture and learning algorithms, which have been designed through decades of PhD research, and my point on that is that the differences are still much greater than the similarities.
The difference here is that everyone else in this product category is also sprinting full steam ahead trying to get as many users as they can.
If they DIDN'T heavily vibe-code it they might fall behind. Speed of implementation in the short term might beat out the long-term maintenance and iteration they'd get from quality code.
> If they DIDN'T heavily vibe-code it they might fall behind
For you and me, sure - sprint as fast as we can using whatever means we can find. But when you have infinite money, hiring a solid team of traditional/acoustic/human devs is a negligible cost in money and time.
Especially if you give those devs enough agency that they can build on the product in interesting and novel ways that the AI isn’t going to suggest.
Everything is becoming slop now, and it almost always shows. I get why when you’re resource constrained. I don’t get why when you’re not.
Code quality never really mattered to users of the software. You can have the most <whatever metric you care about> code and still have zero users, or have high frustration among the users you do have.
Code quality only matters for maintainability, i.e. to developers. IMO it's a very subjective metric.
99.999999% of products can't get away with what Anthropic is able to - this is a one in a billion disruptive product with minimal competition, and its success so far is mostly due to Claude the model, not the agent harness.