They (and other AI players) have been using WAU over DAU for all their metrics, and many have questioned why. But if you look at other data sources on AI adoption, the reason is clear: even though 56% of Americans now "regularly" use GenAI on a weekly basis, a much smaller share (10-14%) uses it on a daily basis. Here's one source, but others show similar numbers: https://www.genaiadoptiontracker.com/
56% is much more impressive than 14%.
This may look bad until you consider that all of them are already desperately strapped for compute. I think the lower DAU is due to a combination of that and people still figuring out how to use AI.
>> I switched to Codex and found it extremely inferior for my use case.
Yeah, 100% the case for me. I sometimes use it to do adversarial reviews on code that Opus wrote, but what it comes back with is total garbage more often than not. It just fabricates reasons why the code it's reviewing needs improvement.
I feel like a licensing process for software engineers would:
A) test lots of skills that are common but not universal. I'm thinking JavaScript trivia here: I don't write any JavaScript in my professional capacity as a software engineer, but there are many people who think Software Engineer == JavaScript Programmer
B) shine too much of a light on the fact that this industry is full of people who demand high salaries but can't program their way out of a paper bag
First day of my college freshman year, I purchased a Targus backpack for the heavy-ass gaming laptop I had back then. I still use it decades later. It has carried stuff inside it for tens of thousands of miles, seen lots of abuse, weathered all sorts of conditions, and is still in really good shape. Not a single tear. Every compartment, every feature works the way it did on the day of purchase. I'm honestly amazed every time I use it.
Pretty much the same story here. I'm on my 2nd Targus TXL617 (holds a 17" laptop) in 18(?) years, and I use it for everything every day. I've even used it for a short 3-day international business trip. I've used it at a dusty shooting range in the desert. It's been dropped, dragged, kicked, bitten (friend's dog), etc., and it still holds up.
I think Targus stopped making this model (maybe due to the trend towards smaller laptops). Hopefully this one will keep working for a while before I need to find a good replacement.
What if it's just bad use? I have a nice Targus model, and I actually bought a second one because the zippers on the first were opening. To my surprise, within a few weeks I got my first instance of the zippers opening on the new one too. Of course, I was overstuffing it a bit at the top. Then I remembered that I had removed some guards that turn this model from something like a regular backpack into a clamshell. Maybe that is a very important feature I have to use, or the zippers will fail when you carry a basketball in there...
What it means is that it's easy to shit on other people's work. It's much harder to give constructive criticism, especially from what looks like a throwaway account.
It’s not “other people’s work” because Steve didn’t do any work. He vibe-coded hundreds of thousands of lines that don’t do what they’re supposed to, along with many thousands of lines of documentation that are inaccurate at best and aspirational at worst. He wrote some blog posts and got them picked up by vapid outlets, which had nothing to add but boosted his exposure.
Case in point: no one talks about beads or gastown on HN, because it’s crap that no one uses. Even *claw and that dumb fad get more mileage. Meanwhile, CC vs Codex is an ever-ongoing battle, and Anthropic employees announce policy changes in “Tell HN” posts that stay on the front page for days.
>> Imagine a co-worker who generated reams of code with security hazards, forcing you to review every line with a fine-toothed comb. One who enthusiastically agreed with your suggestions, then did the exact opposite. A colleague who sabotaged your work, deleted your home directory, and then issued a detailed, polite apology for it. One who promised over and over again that they had delivered key objectives when they had, in fact, done nothing useful. An intern who cheerfully agreed to run the tests before committing, then kept committing failing garbage anyway. A senior engineer who quietly deleted the test suite, then happily reported that all tests passed.
>> You would fire these people, right?
Okay, now imagine a different colleague. One who writes a solid first draft of any boilerplate task in seconds, freeing you to focus on architecture instead of plumbing. A dev who never gets defensive when you rewrite their code, never pushes back out of ego, and never says "that's not my job." A pair programmer who's available at 3 AM on a Sunday when prod is down and you need to think out loud. One who remembers every API you've forgotten, every flag in every CLI tool, every syntax quirk in a language you use twice a year, or even every day.
You'd want that person on your team, right? In fact, you would probably give them a promotion.
Here's the thing: the original argument describes real failure modes, but then commits a subtle sleight of hand. It personifies the tool as a colleague with agency, then condemns it for lacking the judgment that agency implies. But you don't fire a table saw because it doesn't know when to stop cutting, right? You learn where to put your hands.
Every flaw in that list is, at the end of the day, a flaw in the workflow, not the tool. Code with security hazards? That's what reviews are for. And AI-generated code gets reviewed at far higher rates than the human code people have been quietly rubber-stamping for decades. Commits failing tests? Then your CI pipeline should be the gate, not a promise. Deleted your home directory? Then it shouldn't have had the permissions to do that in the first place. In fact, the whole "deleted my home directory" shit is the same thing as "our intern deleted the prod database". We all know that the response to the latter is "why did they have permission to prod in the first place??" AI is the same way, but for some god damn reason people apply totally different standards to it.
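To make that concrete, here's a minimal sketch (Python; the sandbox path and test command are hypothetical, not from any particular tool) of the two guardrails above: the agent only ever touches a confined working tree, and the merge gate trusts the test run's exit code rather than the agent's word:

    import pathlib
    import subprocess

    # Hypothetical setup: the only tree the agent is allowed to touch.
    SANDBOX = pathlib.Path("/tmp/agent-worktree").resolve()

    def check_path(requested: str) -> pathlib.Path:
        """Refuse any agent-requested path that escapes the sandbox
        (e.g. '../../home/me')."""
        resolved = (SANDBOX / requested).resolve()
        if not resolved.is_relative_to(SANDBOX):  # Python 3.9+
            raise PermissionError(f"{resolved} is outside {SANDBOX}")
        return resolved

    def gate_before_merge() -> bool:
        """The CI gate: believe the test runner's exit code,
        not the agent's claim that 'all tests pass'."""
        result = subprocess.run(["pytest", "-q"], cwd=SANDBOX)
        return result.returncode == 0

The saw doesn't get to decide whether the blade guard is on; the workflow does.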
> But you don't fire a table saw because it doesn't know when to stop cutting, right?
If I purchased a table saw and that saw irregularly and unpredictably jumped past its safeties (as we have plenty of evidence that LLMs [0] do), then I would [1] immediately stop using it, return it for a refund, alert the store that they're selling wildly unsafe equipment, and alert the relevant regulators that a manufacturer is producing and selling wildly unsafe equipment.
[0] ...whether "agentic" or not...
[1] ...after discovering that yes, this is not a defective unit, but this model of saw working as designed...
> But that's the thing: the table saw has safeties. Someone put them there.
Did you notice that I mentioned that this hypothetical table saw has poorly designed, entirely inadequate safeties? Something like Opus treating the data it presents to the user as commands to execute [0] is definitely [1] a sign of solid, well-designed safety mechanisms.
You might choose to retort, "Well, that's because the user isn't running the tool in the mode that makes it wait for confirmation before doing anything of consequence!" In reply, I would point in the general direction of the half-squillion studies indicating that a system whose safety requires an operator to remain vigilant when presented with a large volume of irregularly-presented decision points (nearly all of which can be safely answered with "Yes, do it.") does not make for a safe system. [2] It, in fact, makes for a system that's designed [3] to be unsafe.
You might also choose to retort, "That's never happened to me, or anyone I know about." Intermittent failures of built-in safeties that happen under unpredictable circumstances are far, far worse than predictable failures that happen under known ones. I hope you understand why.
[2] I would also, somewhat wryly, note that "An AI agent that does all of your scutwork, but whose every decision you have to carefully scrutinize, because it will irregularly plan to do something irreversibly destructive to something you care about" is not at all the picture that "AI" boosters paint of these tools.
Just to drive home the point that these things have poorly designed, entirely inadequate safeties: here [0] is a report from three weeks ago of the then-latest version of Claude Code being commanded to enter the "Don't modify anything" mode, reporting to the user that it was in the "Don't modify anything" mode, and then proceeding to modify things as if it were not actually in the "Don't modify anything" mode.
I'm sure that if I dug around, I would find hundreds of reports of these tools [1] jumping over their safeties to do things that are unexpected and not infrequently hazardous. I expect such reports to continue, because "building robust, effective, and reliable safeties" has very, very clearly not been a significant priority for the major LLM companies. But I've more than proven my point, so I'll leave the small pile of evidence at this.
The biggest challenge for us is PRs that need to be coordinated across multiple repos: API + client, for example. It doesn't sound like stacked PRs solve that problem, right? The description specifically states single repo.