Hacker News | chandureddyvari's comments

I’m currently cobbling together subagents with hooks; workflows look very promising for doing things more predictably.

Is this the equivalent of DAGs for subagents inside Claude Code? Can I pause and resume/retry workflows? How stateful are they?

I'd really appreciate it if someone from Claude Code could shed more light on the above. I’m trying to see if I can get LangGraph-equivalent DAGs here.
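For readers mapping this onto LangGraph-style workflows, here is a minimal sketch of what a "pausable, resumable, retryable DAG" means in plain Python. It is not how Claude Code workflows are implemented (that is exactly what the question asks); the function `run_dag`, the JSON checkpoint file, and the retry policy are all hypothetical, and each task callable stands in for a subagent invocation.

```python
import json
import os
from graphlib import TopologicalSorter


def run_dag(tasks, deps, checkpoint_path, max_retries=2):
    """Run tasks in dependency order, checkpointing each result to disk.

    tasks: name -> callable(inputs: dict) -> JSON-serialisable result
           (hypothetical stand-in for a subagent call)
    deps:  name -> set of upstream task names
    Completed tasks found in the checkpoint are skipped, so re-running
    after a crash or a manual stop resumes where it left off.
    """
    state = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    # Every task appears in the graph, even ones with no dependencies.
    graph = {name: deps.get(name, set()) for name in tasks}
    for name in TopologicalSorter(graph).static_order():
        if name in state:
            continue  # already finished in a previous run
        inputs = {d: state[d] for d in graph[name]}
        for attempt in range(max_retries + 1):
            try:
                state[name] = tasks[name](inputs)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up; earlier results stay checkpointed
        # Write to a temp file, then atomically replace the checkpoint.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, checkpoint_path)
    return state
```

Stopping the process between tasks and re-running with the same checkpoint path skips the finished nodes. One design caveat of this sketch: deleting a node's entry from the checkpoint reruns only that node, not its downstream dependents.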


Wasn't Mythos a step-change improvement?

What’s your favourite harness? Are there any benchmarks for harnesses, like SWE-bench Verified for LLMs?


There seem to be more and more harness benchmarks out there; pretty interesting read:

https://neuralnoise.com/2026/harness-bench-wip/


You can check my profile for which one I like most :) I do think there have been efforts to benchmark different harnesses.

Personally, I'm not going to choose one harness or another based on +/- a few percentage points in a benchmark. I'm going to use the one that I find most ergonomic, that isn't too bloated, etc. The models are the primary lever, not the harness.


I was talking to a YC founder; his biggest fear is waking up the next morning to a new Claude launch that makes his startup obsolete.

Other startup founders have shared a similar sentiment; check X for all the VCs talking about moats against the big labs.


or the word 'canonical'


Or they're a Prolog programmer.


For a long time I wondered how SV startups got such pretty landing pages (here’s a comment I left 2 years back: https://news.ycombinator.com/item?id=37421273). I wanted one for my side projects but couldn’t afford an agency, and the templates online were boring. Creating the page was only half the problem. I also needed somewhere to collect emails for the waitlist.

After AI happened, I built an app (promptfunnels) to scratch my own itch and generate funnels (fancy name for landing pages with a purpose).

Then came the harder part: marketing it. Coming from a tech background, I knew nothing about marketing, so I started reading and came across the $100M Leads book. I realized codifying those principles together with funnels and marketing automation had a real market. My family, friends, and acquaintances became the first customers. A friend joined me as cofounder and we both quit our jobs to do this full time.

As we talked to other startup founders, they kept describing a tangential problem they called GTM. At the core it was the same thing we were solving: marketing for non-marketers. So we pivoted to RevMozi (https://revmozi.com/), which helps non-marketers do both inbound and outbound GTM.

We’re dogfooding the product and coming out of beta next month.

Wish us luck.


Your pricing and "what it does" navbar links aren't working


> how SV startups got such pretty landing pages

Umm where? They are indistinguishable from each other. Not pretty.


Some of them are nonexistent today. Check the parent thread; it has some good recommendations (for 2023) for both functional and pretty websites. At that time, if I recall, the Linear landing page was all the rage, and there were many copycats.


I've had good success with hooks in Claude Code. Personally, I feel this problem was common with humans as well: we added tools like Husky to git commits so our peers would push code that was linted, type-checked, etc.

I feel hooks are an integral part of your coding harness; they're the only deterministic way to control coding agents.
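As a concrete illustration of the Husky analogy, here is a minimal sketch of a lint hook script. It assumes Claude Code's hook contract as I understand it from its docs (a JSON payload on stdin, the edited path under `tool_input.file_path` for Edit/Write, and exit code 2 from a PostToolUse hook feeding stderr back to the model); verify against the current hooks documentation. The `ruff`/`eslint` mapping and the names `lint_command_for` and `handle` are hypothetical choices.

```python
import json
import subprocess
import sys


def lint_command_for(path):
    """Map a file path to a lint command (hypothetical choices; adjust per repo)."""
    if path.endswith(".py"):
        return ["ruff", "check", path]
    if path.endswith((".ts", ".tsx")):
        return ["npx", "eslint", path]
    return None  # file types we don't lint


def handle(payload, run=subprocess.run):
    """Return (exit_code, message) for one hook payload.

    Assumes the edited file is under tool_input.file_path, as for the
    Edit/Write tools. `run` is injectable so the logic can be tested.
    """
    path = payload.get("tool_input", {}).get("file_path", "")
    cmd = lint_command_for(path)
    if cmd is None:
        return 0, ""
    result = run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Exit code 2 is assumed to surface stderr back to the model.
        return 2, f"Lint failed for {path}:\n{result.stdout}{result.stderr}"
    return 0, ""


def main():
    if sys.stdin is None or sys.stdin.isatty():
        return 0  # not invoked as a hook; nothing to read
    raw = sys.stdin.read()
    if not raw.strip():
        return 0
    code, message = handle(json.loads(raw))
    if message:
        print(message, file=sys.stderr)
    return code


if __name__ == "__main__":
    exit_code = main()
    if exit_code:
        sys.exit(exit_code)
```

It would be registered as a `PostToolUse` command hook with an `Edit|Write` matcher in the project's Claude Code settings; the exact settings schema should be checked in the docs rather than taken from this sketch.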


I fully agree. I also started with Husky before expanding further and creating my own hooks. I can’t imagine using agents today without them; it would require a lot of babysitting.


Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect. Then 30 minutes later I hit session limits. Three sessions like that in a day, and suddenly 25% of the weekly limit is gone.

I ended up buying the $100 Codex plan. So far it has been much more generous with usage and more accurate than Claude for the kind of work I do.

That said, Codex has its own issues. Its personality can be a bit off-putting for my taste. I had to add extra instructions in Agents.md just to make it less snarky. I was annoyed enough that I explicitly told it not to use the word “canonical.”

On UI/UX taste, I still think current Codex is behind the Jan/Feb era of Claude Code. Claude used to have much better finesse there. But for backend logic, hard debugging, and complex problem-solving, Codex has been clearly better for me. These days I use Impeccable Skillset inside Codex to compensate for the weaker UI taste, but it still does not quite match the polish and instinct Claude Code used to have.

I used to be a huge Claude Code advocate. At this point, I cannot recommend it in good conscience.

My advice now is simple: try the $20 plans for Codex and Cursor, and see which one best matches your workflow and vibes.


I had a weird experience at work last week where Claude was just thinking forever about tasks and not actually doing anything. It was unusable. The next day it was fine again.


That happens to me all the time. My current working theory is that when their servers are hammered, there's a queueing system that's invisible to end users.


The way Claude/Codex behave is entirely consistent with how every vibe coded project (of mine) has ended up so far. I bet those guys have no idea what's going on and are taking guesses because no one understands the thing they've made.


I was having this issue yesterday. The same prompt would send it into a loop where it appeared to be doing nothing for 30+ minutes until I cancelled it. It would show 400 tokens used and that's it.

I tested on a previous version (2.1.68) and it still ran into this never-ending loop, BUT at least the token count kept steadily increasing.

So we are seeing (1) some sort of model degradation, is my guess (why it can't break out of a thinking loop on some problems), and (2) a clear drop in thinking-token UI transparency.


Ya, I've had this experience more than a few times recently. I've heard people claim they serve quantized models under high load, but it happens in Cursor as well, so I don't think it's specific to Anthropic's subscription. It could be that the context window has just gotten into a state that confuses the model... but that wouldn't explain why it appears to be temporary...

My best guess is this is the result of the companies running "experiments" to test changes. Or it's just all in my head :)


The Cursor one is back to Claude 4, or 3.5+ at best. It struggles to do things it did effortlessly a few weeks ago.

It’s not under load either; it’s just fully downgraded. It feels more like they’re dialing in what they can get away with, and pushing it very far.


These days Cursor feels more capable and reliable than Claude Code (at least for my workflow). For personal projects, I use Cursor for planning and verification, but run Claude Code just for implementation to save $.


Set MAX_THINKING_TOKENS to 0. Claude's thinking hardly does anything and just wastes tokens; it actually often performs worse with thinking than without.


Not the guy you're responding to, but when this happens the token counter is frozen at some low value (e.g. 1k-10k) as well, so it's not thinking in circles but rather not thinking (or doing anything, for that matter) at all.


I was having this issue yesterday. The same prompt would send it into a loop where it appeared to be doing nothing for 30+ minutes until I cancelled it. It would show 400 tokens used and that's it. I tested on a previous version (2.1.68) and it still ran into this never-ending loop, BUT at least the token count kept steadily increasing.

So we are seeing (1) some sort of model degradation, is my guess (why it can't break out of a thinking loop on some problems), and (2) a clear drop in thinking-token UI transparency.

When I left it running overnight, it finally sent a message saying it had exceeded the 64000 output token limit.


This exact thing is happening to me since yesterday. It comes back to life when I throw the whole session away.


This happened to me as well! It was especially infuriating because I had just barely upgraded to the $200 per month plan because I exhausted my weekly quota. Then the entire next day was a complete bust because of this issue. I want my money back!


What day was it?


Thursday starting mid to late morning, and ended Friday night (US timezone).


Same day then. It was happening for me roughly between 9am and 5pm BST.


I've been using the Codex Business subscription (about 30€) for several months. Even there, they've cut back on the quota. A few months back it was hard for me to reach the limit; now it's easier.

Still, in comparison with Claude Code, the quota of Codex is a much better deal. However, they should not make it worse...


OpenAI had a promotion that gave everyone double their rate limits until April 2nd.


The promotion has been extended until May 31st for the $100 and $200 subs.

At the same time, they’ve been giving out a ton of additional quota resets seemingly every other week (and committed to an additional reset for every million additional users until they hit 10mil on codex).

So they’ve really set a high bar for people’s expectations on their quota limits.

Once they drop the 2x promotion for good and stop the frequent resets, there are going to be a lot of complaints.


I have the exact opposite experience. I can run claude forever, my codex quota was done by Wednesday morning.


> Claude has gotten noticeably worse for me too. It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect.

This is what I'm working on proving now.

It is more that there is a confidence score while thinking. Opus will quit if it is too high and will grind on if the confidence score is close to the real answer. Haiku handles this well too.

If you give Sonnet a hard task, it won't quit when it should.

Nonetheless, that issue has been fixed with Opus.

I'll try to show that using Opus on medium-to-hard tasks is consistently the same price as, or cheaper than, running them with Haiku and Sonnet, while easier tasks (the known busywork) are cheaper to run with Haiku.
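The cost argument can be made concrete with a small expected-cost calculation. This is a sketch under stated assumptions, not a measurement: it assumes you retry a task until it succeeds, that attempts are independent with a fixed success rate (so expected attempts are 1 / success_rate), and the token counts, prices, and success rates below are made-up illustrative numbers, not real model pricing or benchmark results.

```python
def expected_cost(tokens_per_attempt, price_per_mtok, success_rate):
    """Expected spend to get one successful result, retrying until success.

    Assumes independent attempts with a fixed success probability, so the
    expected number of attempts is 1 / success_rate (geometric distribution).
    price_per_mtok is in any currency unit per million tokens.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = tokens_per_attempt * price_per_mtok / 1_000_000
    return cost_per_attempt / success_rate


# Hypothetical numbers only: a pricier model that needs fewer tokens and
# fewer retries versus a cheaper model that flails on a hard task.
big = expected_cost(tokens_per_attempt=100_000, price_per_mtok=5.0, success_rate=0.9)
small = expected_cost(tokens_per_attempt=300_000, price_per_mtok=1.0, success_rate=0.3)
print(f"big model: {big:.2f}, small model: {small:.2f}")  # big model: 0.56, small model: 1.00
```

With these invented inputs the more expensive model wins because it burns fewer tokens and retries less; on easy tasks where both succeed on the first try, the per-token price dominates and the cheaper model wins.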


> This is what I'm working on proving now.

Stella Laurenzo, AMD’s director of AI, filed a detailed GitHub issue on April 2 documenting that Claude Code reads code three times less before editing it, rewrites entire files twice as often, and abandons tasks mid-way at rates that were previously zero. Her analysis of nearly 7,000 sessions puts precise numbers on how Anthropic’s coding tool has degraded since early March.

https://github.com/anthropics/claude-code/issues/42796


I’ve been wondering whether this is less a “model got worse” problem and more a constraints problem.

If the files and task boundary are already known, letting the system keep exploring/replanning inside one accumulating context seems like the wrong default. It makes the agent pay over and over to rediscover what it should already know, and every extra turn increases the chance it wanders into a different approach.


> Claude has gotten noticeably worse for me too.

My experience is limited to CC, Gemini CLI, and Codex (not Aider yet), trying different combinations of models.

But, from my experience, CC puts everything else to shame.

How does Cursor compare? Has anyone found an Aider combination that works as well?


Is Aider even considered a thing anymore?

It was pretty much the first CLI agent and had a benchmark that was the go-to at the start of LLM coding. Now the benchmark doesn't get updated, and Aider never gets a mention in discussions of CLI tools, until now.


Aider is dead because it's pre-function-calling-era tech.


By the way, what are you using it for? I bought Max and Pro plans for Claude and Codex, developed a few apps with them, and after the initial excitement ("Wow, I can get results 10x faster!") I felt the net result was negative for me. I didn't learn much except the current quirks of each model/tool, I didn't enjoy the process, and the end result wasn't good enough for my standards. In the end I deleted all those projects and unsubscribed.


For me it’s mostly useful in day-to-day coding, not “build an entire app and walk away” coding.

TDD was never really my natural style, but LLMs are great at generating the obvious test cases quickly. That lets me spend more of my attention on the edge cases, the invariants, and the parts that actually need judgment.

Frontend is another area where they help a lot. It’s not my strongest side, so pairing an LLM with shadcn/ui gets me to a decent, responsive UI much faster than I would on my own. Same with deployment and infra glue work across Cloudflare, AWS, Hetzner, and similar platforms.

I’m basically a generalist with stronger instincts in backend work, data modeling, and system design. So the value for me is that I can lean into those strengths and use LLMs to cover more ground in the areas where I’m weaker.

That said, I do think this only works if you’re using them as leverage, not as a substitute for taste or judgment.


> It goes into long exploration loops for 5+ minutes even when I point it to the exact files to inspect.

Give it a custom sandbox and context for the work, so it has no opportunity to roam around when not required. AI agentic coding is hugely wasteful of context and tokens in general (compared to generic chat, which is how most people use AI), there's a whole lot of scope for improvement there.


But the problem is it used to not need that before. These days, you have to think twice before you summon a subagent.


> But the problem is it used to not need that before. These days, you have to think twice before you summon a subagent.

This is exactly what I (and many others) kept trying to tell the pro-AI folk 18 months ago: there is no value to jumping on the product early because any "experience" you have with it is easily gained by newcomers, and anything you learned can easily be swapped out from under you anyway.


The value is all the things I built with it? Surely this constant change degrades the experience, but to be clear, we're nitpicking the experience here, not questioning the value.

I also don't understand the "pro-AI" phrase. It's a tool, it brings results. I'm not pro-car when I drive to work.


> The value is all the things I built with it?

To be clear, the people I was talking about were not referring to the value, but to the experience of using these tools.

> I also don't understand the "pro-AI" phrase.

Would you prefer the phrase "AI-boosters"?


Ah, okay, I must have gotten lost in the conversation. Sorry!

> Would you prefer the phrase "AI-boosters"?

AI-booster folk? :)


> AI-booster folk? :)

Could be; I mean, we differentiate between people who use cars as a tool and call enthusiasts "petrol-heads".

I use AI daily, but I certainly wouldn't consider myself either pro-AI or an AI-booster.

(Naming is hard)


The sandbox is fine, but if the parent has given explicit instruction of files to inspect, why is it not centering there? Is the recent breakage that the base prompt makes it always try to explore for more context even if you try to focus it?


Because the "explicit instruction" you give AI is not deterministic as in a normal computer program. It's a complete black box and the context is also most likely polluted by all sorts of weird stuff. Putting it on as tight of a leash as possible should be seen as normal.


They changed plan mode so that it's instructed to follow a multi-step plan, the first step being to explore the code base. When you tell it to focus it's getting contradictory instructions from plan mode vs your prompt and it's essentially a coin flip which one it picks.

It does seem like a cynical attempt to make more money.


> the $100 Codex plan. So far it has been much more generous

Be aware Codex is currently doing a 2x usage promo. So 5x is actually 10x and 20x is actually 40x until the end of May.


When they bumped the context size up to 1m tokens they made it much easier to blow through session limits quickly unless you manually compact or keep sessions short.


I also gave up on my Claude Code subscription. It's running out in 2 weeks and I have canceled it. My current MAX session got rate-limited in 2 hours of work and that's just absurd.

Codex seems to give the $20 plan for free for 1 month and that's what I signed up for.

Let's see how it compares when I can't use my Claude max sub for 3 more hours.


Codex has been better for me, but it's WAY too nitpicky/defensive. It always wants to make changes that add complexity and code to solve a problem that's impossible to happen (e.g. a multiprocess race condition on a daemon I only ever run one instance of).


You just convinced me to try it. Claude just copy pastes, does search and replace, zero abstractions and I'm the one that needs to think about the edge cases.


You may think that's a good thing, but it's not. Codex is great at coming up with solutions to problems that don't exist and at failing to find solutions to problems that do. In the end you have 300 new lines of code and nothing to show for it.


That's why I have Claude write the code and Codex review.


that’s like having Oleg Kiselyov’s code reviewed by my middle school daughter :)


I didn't know your middle school daughter is a genius coder, congratulations!


Same here; that’s very annoying because it adds a lot of entropy to the code, and people don’t always take the time to clean things up.


> On UI/UX taste, I still think current Codex is behind the Jan/Feb era of Claude Code.

OpenCode is great though, and can (for now) use an OpenAI subscription.


Any good, reasonable alternatives? Gemini is like a prodigious 3-year-old: hopeless for my projects. Has anybody tested OpenCode with Kimi or something similar?


I'm adding two extra GPUs to my local rig. It turns out Qwen 3.5 122B is already enough to handle (i.e., finish with moderate guidance) the non-planning parts of my tasks.


What kind of GPUs are you using?


3090s


I am also on Codex while Claude seems to be blatantly ignoring instructions (as recently as Thursday: when I made the switch). The huge Claude context helps with planning, so that's all it does now.

Codex consumes way fewer resources and is much snappier.


I wonder if this is in the system prompt: "Go round in circles to make us more money."


The product was performing badly and you thought this would be solved by spending more money on it?

When will people realize this is the same as vendor lock-in?

"Maybe if I spend more money on the Max plan it will be better" > no, it will be the same.

"Maybe if I change my prompt it will work" > no, it will be the same.

"Maybe if I try it via this API instead of that API it will improve" > no, it will be the same.

Claude, ChatGPT, Gemini etc all of these SOTA models are carefully trained, with platforms carefully designed to get you to pay more for "better" output, or try different things instead of using a different product.

It's to keep you in the ecosystem and keep you exploring. There is a reason you can't see the layers upon layers of scaffolding they have. And there's a reason why, two weeks after a major update, the model is suddenly "bad" and "frustrating". It's the same reason it's done with A/B testing: when you complain, someone else has no issues; when they complain, you have no issues. It muddies the water intentionally.

None of it is because you're doing anything wrong. It's not a skill issue; it's a careful strategy to extract as much engagement and money from customers as possible. It's the same reason they give people who buy new gun skins in Call of Duty easier matchmaking for the first couple of games.

The only mistake you made was paying MORE, hoping it would get better. It won't, that's not what makes them money. Making people angry and making people waste their time, while others have no issues, and making them explore and try different things for longer so they can show to investors how long people use these AI tools is what makes them money.

When competitors have a better product, these issues go away. When a new model is released, these issues don't exist.

I was paying a ton of money for claude, once I stopped and cancelled my subscription entirely, suddenly sonnet 4.6 is performing like opus and I don't have prompts using 10% of my quota in one message despite being the same complexity.


Do you realize Claude and Codex are different products by different companies?


You ask that as if there is some insight to the question, but the insight is hard to find. What the person you replied to is saying, applies to both Claude and Codex.


Maybe I’m in the minority here, but while directories and similar channels are useful, I felt like I was just shooting darts in the dark without understanding sales and marketing from first principles and hoping something would stick.

I had three side projects and kept struggling to get any real traction or traffic without becoming spammy across the internet. So I decided to approach it the same way I approach learning anything new: through books, courses, and solid foundational material.

HN had a few excellent suggestions. One of them was Founding Sales. Another, which I came across through a friend’s recommendation, was Alex Hormozi’s series. He seems to have something of a cult following, which made me a bit skeptical at first, so I decided to just read the first 100 pages before forming an opinion.

I ended up finding it genuinely useful, especially for understanding the psychology and mindset needed to sell something. I now highly recommend his book $100M Leads to technical friends who are trying to figure out how to sell what they’ve built.

I’m still learning, so if you have any good recommendations, please drop them below.


Maybe you could share three bullet points on the most useful insights from that book?


Agreed, had the same experience. Codex feels lazy - I have to explicitly tell it to research existing code before it stops giving hand-wavy answers. Doc lookup is particularly bad; I even gave it access to a Context7 MCP server for documentation and it barely made a difference. The personality also feels off-putting, even after tweaking the experimental flag settings to make it friendlier.

For people suggesting it’s a skill issue: I’ve been using Claude Code for the past 6 months and I genuinely want to make Codex work - it was highly recommended by peers and friends. I’ve tried different model settings, explicitly instructed it to plan first and only execute after my approval, tested it on both Python and TypeScript backend codebases. Results are consistently underwhelming compared to Claude Code.

Claude Code just works for me out of the box. My default workflow is plan mode - a few iterations to nail the approach, then Claude one-shots the implementation after I approve. I haven’t been able to replicate anything close to that with Codex.


+1 to this. Been using Codex the last few months, and this morning I asked it to plan a change. It gave me generic instructions like 'Check if you're using X' or 'Determine if logic is doing Y' - I was like WTF.


Curious, are you doing the same planning with Codex out-of-band or otherwise? In order to have the same measurable outcome you'd need to perhaps use Codex in a plan state (there's experimental settings - not recommended) or other means (explicit detailed -reusable- prompt for planning a change). It's a missing feature if your preference is planning in CLI (I do not prefer this).

You are correct in that this mode isn't "out of the box" as it is with Claude (but I don't use it in Claude either).

My preference is to have smart models generate a plan from the provided source. I wrote (with AI) a simple Python tool that filters a codebase and lets me select all files or just a subset. I then attach that as context and have smart models with large context windows (usually Opus, GPT-5.2, and Gemini 3 Pro in parallel) give me their version of a plan. I then take the best parts of each plan, slap them into a single markdown file, and have Codex execute it in a phased manner. I usually specify that the plan should be phased.
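A tool like the one described can be quite small. The sketch below is not the commenter's tool; it's a hypothetical reconstruction assuming glob-based include/exclude filters and a Markdown output with one fenced block per file. The names `select_files`, `bundle`, and the `SKIP_DIRS` defaults are illustrative assumptions.

```python
import fnmatch
import os

# Hypothetical defaults: directories that rarely belong in a planning prompt.
SKIP_DIRS = {".git", "node_modules", ".venv", "__pycache__", "dist", "build"}
FENCE = "`" * 3  # built at runtime so this snippet can itself live in Markdown


def select_files(root, include=("*",), exclude=()):
    """Yield relative paths under root matching any include glob and no exclude glob."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk doesn't descend into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            if any(fnmatch.fnmatch(rel, pat) for pat in include) and not any(
                fnmatch.fnmatch(rel, pat) for pat in exclude
            ):
                yield rel


def bundle(root, paths):
    """Concatenate the chosen files into one Markdown document for a planning prompt."""
    parts = []
    for rel in paths:
        with open(os.path.join(root, rel), encoding="utf-8", errors="replace") as f:
            body = f.read()
        parts.append(f"## {rel}\n\n{FENCE}\n{body.rstrip()}\n{FENCE}\n")
    return "\n".join(parts)
```

For example, `bundle(".", select_files(".", include=("*.py", "*.toml"), exclude=("tests/*",)))` would produce a single Markdown string to paste into a planning model. Note that `fnmatch`'s `*` also matches `/`, so `*.py` matches files in subdirectories too.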

I prefer out-of-CLI planning because frankly it doesn't matter how good Codex or Claude Code dive in, they always miss something unless they read every single file and config. And if they do that, they tip over. Doing it out of band with specialized tools, I can ensure they give me a high quality plan that aligns with the code and expectations, in a single shot (much faster).

Then Claude/Codex/Gemini implement the phased plan - either all at once - or stepwise with me testing the app at each stage.

But yeah, it's not a skill issue on your part if you're used to Plan -> Implement within Claude Code. The Experimental /collab feature does this but it's not supported and more experimental than even the experimental settings.

