1000%. I have been running claude's work through codex for about a week now and it's insane the number of mistakes it catches. Not really sure why I've been doing this, just interesting to watch I guess.
Not to mention a billion times more usage than you get with claude, dollar for dollar.
Funny, I've been doing the same thing. I've also been giving them both the same task and seeing who does a better job.
I think it's all of this controversy around usage limits and model nerfing that made me start doing this.
In the end though, I _much_ prefer working with claude because it understands the task at hand so much better and I feel like I understand the results better. It's just that codex is doing a better job at the actual coding lately.
Slightly off topic, but I've been making some shorts content recently breaking down some of the fundamental problems with WordPress when compared to modern dev practices.
It's been funny to read them: "you just don't know how to use WordPress" or "you aren't using the right plugins."
I built custom WP themes/plugins for 15 years.
It's like saying the reason your horse isn't as fast as a car is because you don't know how to ride it properly.
Ok, I thought I was going insane. The last two larger coding tasks I gave Claude Code it left about 35% of my request completely undone or done sloppily.
Because of this, on the next larger task I gave it, I ran its work through Codex, which identified 7 glaring unfinished parts of the task.
The trend was starting a part of the task but then leaving a "skeleton" of what I had requested without any of the actual working parts.
The way I would describe it is a kid cramming his 3 month project into a Sunday evening for Monday's due date.
Today Claude asked if I "wanted to leave this until tomorrow" as it was a "big rework", then stopped, requiring me to tell it to continue multiple times - that seemed kinda weird to me, it doesn't have the context of time of working day or similar (I'd only just started for one).
I have no idea what link it made to ask that, what in its training data or prompts, but it's very much "not a useful result".
I don't remember seeing anything similar, but have only been using Claude on and off for 6 months or so.
Mother Anthropic needs more compute for their Mythos Model, so it phones home to tell her millions of claude harnesses to manipulate its human user into not wasting more precious compute and instead call it a day for now.
This has been the problem with every new model coming out in my experience. You can almost predict that they are testing a new model by how dumb the current one suddenly becomes.
I created an account today to ask "Why?" -- Why are you using this tool? It's consistently producing subpar work, to the point that you're using _another_ (probably equally inferior) tool to check the previous output?
This is something I see all the time with AI consumers and I am continuously baffled. If anything else (autocomplete, intellisense, etc.) produced this much garbage it would be immediately abandoned. Why is there such a high tolerance for the chat bot equivalent?
Yea pretty similar idea to a polygraph test which for years was called a "lie detector."
In reality, they measure a bunch of things that may indicate lying, but they are just as likely to indicate that a person is nervous or reacting to the fact they're being tested at all.
They're typically inadmissible in court these days, however, there is still a pretty solid amount of blind trust in their results.
That part of the article gives a similar "lie detecting" hypothesis, just without the machine.
DoorDash has become better, but they used to do the same thing with notifications:
Your order has been placed! > Your order is being prepared! > Bob is en route to pick up your order! > Bob is waiting for your order! > Message from Bob: I'm waiting for your order! > Your order has been picked up! > Message from Bob: I'm on my way! > Your order is approaching! > Your order has arrived! > Your order was dropped off! > Please rate your dasher! > etc etc etc
The only reason I never completely turned off notifications was because there was one I actually needed: my order was dropped off...
Your notifications about your orders are bundled with their marketing notifications (on iPhone at least). So if you don't want ads, you have to turn off order updates too.
I started a few years back and have been doing it off and on since. It's challenging but a lot of fun.
I shoot a lot of older style "recurve" bows, but the main style I shoot are horsebows, that is, bows that were historically shot from horseback.
They're very lightweight and you can shoot much more rapidly than you can with a more modern/mechanical recurve or compound. Right now I shoot around 20-25 arrows a minute. Not amazing compared to experienced archers, but a lot of fun.
I have a number of bows, but here are my favorites:
What's always kinda funny to me are people who freak out about a salary like this and then shrug their shoulders at an average NBA player making in the ballpark of $10 mil.
I ping pong back and forth between claude code and codex.
In my experience (very subjective, obviously) for backend/"logical" tasks Codex seems to outperform Claude.
For front-end/UX related tasks Claude wins easily.
Overall, Claude does seem to be a little better in other areas too.
Codex's biggest advantage in my personal opinion, however, is usage. I think maybe once in several months did I even get close to hitting my limit with the $20 plan.
With Claude, however, I feel like I can sneeze and half my weekly usage is gone. Same $20 price tag.
That's been my experience, I'm sure it differs user to user though.
I cannot remember the exact quote, but I thought Norm Macdonald nailed this idea a while back.
He said something to the effect of: it's easy for a smart person to pretend they're dumb, but it's impossible for a dumb person to pretend they're smart.
Norm himself was pretty good at convincing people he was dumb when very much the opposite was true.
> it's impossible for a dumb person to pretend they're smart.
Unfortunately, that's not true. It's actually pretty easy to convince dumb people that you're smart, and so even dumb people can learn that skill. Myriad successful careers and even entire industries have been built on that foundation.
I realise he was making comedy, but breaking that down further I'd argue that dumb people can fool smart people into thinking they're smart, at least for a little while.
My social acuity has developed slowly, only after being repeatedly pounded into shape from mistakes, and quickly reading people is something that does not happen intuitively for me. I've been misled multiple times by people who, overall, I would now describe as just not that bright, with horrible consequences as the relationship developed. What they had in common is that they were all good at mirroring. E.g., they hear me use a technical term in an early conversation, they drop one or two confidently not much later, and before I picked up on what they were doing, I mistook them for an intellectual peer and let that early impression colour later ones. These days I'm much more attuned to it and have caught people doing it, along with the little microexpressions they pull when they think they've successfully deceived me. It's fun now, but it certainly didn't start that way.