
evanmoran · 2026-04-07T22:46:25 1775601985

We shall call it Achilles, as Claude Mythos is its only weakness.

evanmoran · 2026-04-01T17:36:31 1775064991

It also depends on whether the CVEs can be fixed by LLMs too. If LLMs can both find and fix them, then it's very good.

cogman10 · 2026-04-01T17:46:46 1775065606

Fixing isn't often a problem for CVEs. The hard part is almost always finding the CVE in the first place.

There are some extreme cases that might require extensive code changes, and those would benefit from LLMs. But many of the issues are simple things like off-by-one errors with pointers.

wepple · 2026-04-02T00:48:51 1775090931

Fixing is now the bottleneck.

Most patches are non-trivial, each project/maintainer has a preferred coding style, and maintainers are already inundated with PRs and don't take kindly to slop.

LLMs can find CVEs with zero human interaction, so finding scales trivially.

evanmoran · 2026-03-29T20:15:47 1774815347

Exactly. C++ is still waiting for its "uv" moment, so until then modules aren't even close to solved.

lostdog · 2026-03-29T21:59:38 1774821578

And uv required some groundwork: the PEP process streamlined how you define a Python project, and then uv could be built on top of that.

evanmoran · 2026-03-27T19:26:14 1774639574

I suspect Capybara was the internal code name for what is now named Mythos.

evanmoran · 2026-03-22T23:50:36 1774223436

I’m writing a new type of CRDT that supports move/reorder/remove ops within a tree structure without tombstones. Claude Code is great at writing some of the code but it keeps adding tombstones back to my remove ops because “research requires tombstones for correctness”.

This is true for the usual approach, but the whole reason I'm writing this CRDT is to avoid these tombstones! Long story short, I did eventually convince Claude I was right, but to do it I basically had to write a structural proof showing clear ordering and forward progress in all cases. And even then, compaction tends to reset it. There are a lot of subtleties these systems don't quite grasp yet.

GermanJablo · 2026-03-23T09:30:31 1774258231

Interesting. I'm the author of DocNode, a library that does exactly what you're describing; it might be useful. https://docukit.dev

Cheers!

isaachinman · 2026-03-23T00:15:42 1774224942

I would strongly advise using Codex for a project like that.

TheTaytay · 2026-03-23T03:59:20 1774238360

Please do elaborate. I’ve only tried switching to codex once or twice, and it’s been probably 3 months since I last tried it, but I was underwhelmed each time. Is it better on novel things in your experience?

davidanekstein · 2026-03-23T04:25:55 1774239955

My experience is that it is much more terse and realistic with its feedback, and more thoughtful generally. I trust its positive acknowledgements of my work more than Claude's, whose praise I have learned to be extremely skeptical of.

Tallain · 2026-03-23T05:36:39 1774244199

In my experience, Codex / ChatGPT are better at telling you where you're wrong, where your assumptions are incomplete, etc., and better at following the system prompts.

But more importantly, as a coding agent, it follows instructions much better. I've frequently had Claude go off and do things I've explicitly told it not to do, or write too much code that did wrong things, and it's more work to corral it than I want to spend.

Codex will follow instructions better. Currently, it writes code that I find a few notches above Claude's, though I'm working with C# and SQL, so YMMV; Claude is terrible at coming up with a decent schema. When your instructions do leave some leeway, I find Codex's "judgment" to be better than Claude's. And one little thing I like a lot is that it looks at adjacent code in your project so it can try to write idiomatically for your project/team. I haven't seen Claude exhibit this behavior; its code is very middle-of-the-road in terms of style and behavior.

But when I use them, I use them in a very targeted fashion. If I ask them to find and fix a bug, the prompt has as much detail as a full bug report in my own ticketing system, or more. If it's new code, it comes with a very detailed, long spec covering what is needed, what is explicitly not needed, the scope, the constraints, the expected output, etc., as if it were a wiki page or epic for another real developer to work from. I don't do vague prompts or "agentic" workflow stuff.

logicchains · 2026-03-23T18:24:13 1774290253

GPT is much better at anything mathematical than Claude, as is Gemini. This is evidenced by their superior results at math Olympiads, the Putnam, etc.

linolevan · 2026-03-23T01:28:54 1774229334

How much is OpenAI paying you for this?

isaachinman · 2026-03-23T10:56:05 1774263365

Absolutely nothing. I have active subscriptions for both. Claude is better at FE stuff. Codex is better at actual programming.

steve_adams_86 · 2026-03-23T20:34:42 1774298082

How is FE not actual programming? I spend less time on FE than I once did, but it has presented some of the most interesting programming challenges I've encountered in my career. It's a large technical space, rich with 'actual' programming to be done.

evanmoran · 2026-02-03T16:02:35 1770134555

You should consider calling these "behaviors" to mimic behavior trees in game / robot AI. They follow the same notion of a single behavior being active at once: https://en.wikipedia.org/wiki/Behavior_tree_(artificial_inte...
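For reference, a minimal Go sketch of the "single active behavior" idea, assuming a priority selector at the root: the selector ticks its children in order and stops at the first one that doesn't fail, so only one leaf behavior is running on any tick. The `Behavior` and `Selector` types and the flee/attack/patrol example are hypothetical; full behavior trees also have sequences, decorators, and so on.

```go
package main

import "fmt"

// Status is the result of ticking a node in a behavior tree.
type Status int

const (
	Success Status = iota
	Failure
	Running
)

// Node is anything that can be ticked once per update.
type Node interface {
	Tick() Status
}

// Behavior is a hypothetical leaf node: it runs whenever Ready reports true.
type Behavior struct {
	Name  string
	Ready func() bool
}

func (b *Behavior) Tick() Status {
	if !b.Ready() {
		return Failure
	}
	return Running
}

// Selector ticks its children in priority order and returns the status of the
// first child that does not fail, so at most one leaf is active per tick.
type Selector struct {
	Children []Node
	Active   string // name of the direct child Behavior running after the last tick
}

func (s *Selector) Tick() Status {
	s.Active = ""
	for _, child := range s.Children {
		st := child.Tick()
		if st == Failure {
			continue // fall through to the next, lower-priority child
		}
		if b, ok := child.(*Behavior); ok && st == Running {
			s.Active = b.Name
		}
		return st
	}
	return Failure
}

func main() {
	enemyVisible, lowHealth := false, false
	root := &Selector{Children: []Node{
		&Behavior{Name: "flee", Ready: func() bool { return lowHealth }},
		&Behavior{Name: "attack", Ready: func() bool { return enemyVisible }},
		&Behavior{Name: "patrol", Ready: func() bool { return true }},
	}}

	root.Tick()
	fmt.Println(root.Active) // patrol
	enemyVisible = true
	root.Tick()
	fmt.Println(root.Active) // attack
	lowHealth = true
	root.Tick()
	fmt.Println(root.Active) // flee
}
```

Because higher-priority behaviors simply preempt lower ones on the next tick, there is never a need to explicitly "switch" states as in a hand-written state machine.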

evanmoran · 2026-01-16T00:01:17 1768521677

From what I've seen, a ton of people are using Claude Code or Cursor daily. I wouldn't be surprised if most startups are at 100% use right now. The big tech companies are a bit slower, but have started rolling out almost unlimited token use, so I wouldn't be surprised if they are above 50% adoption by the end of the year.

Start with Claude Code if you haven't tried it yet, as it can edit your files directly and has some pretty fantastic skills/plugins. (Copilot is quite far behind, unfortunately.)

evanmoran · 2026-01-09T18:57:13 1767985033

This is great! Please consider even easier settings for kids. Maybe 2 notes (not 4) as the min starting point and slower ramp up as you succeed.

Also, I think the UI for missing the first note could feel a little nicer. Something about the popup hiding the music takes you out of the flow. Possibly just a subtitle with an encouraging message would be enough. There is a big difference between failing the whole pattern and failing the first note, so the feedback here is definitely worth refining.

vunderba · 2026-01-09T20:04:16 1767989056

This is good feedback - I'll add it in the next version! I actually wrote this for my younger sister but she already has some musical training so it isn't as geared towards beginners.

marcusverus · 2026-01-09T19:17:08 1767986228

As a non-musical person, I went looking for how to set it to 1 note!

evanmoran · 2025-12-30T16:40:16 1767112816

In a similar way, I changed all of my build and deployment scripts to Go not long ago. The real benefit was that utility functions used by the services could be shared with deployment, so I could easily share code to check whether services/DBs were online or to access cloud secrets in a uniform way. It also made all the error checks much clearer (did the curl fail because the service is offline, or because the request was malformed?).

Additionally, this is even more powerful with Go modules. Make every script call a single function in the shared “scripts” module, and they will all be callable symmetrically from anywhere. This ensures all scripts still build even if they aren't run often. It also means any script can call scripts.DeployService(…) without caring what directory it is in or who calls it. The arguments make it clear what paths/configuration each script needs.

evanmoran · 2025-12-16T20:41:45 1765917705

GitHub Actions are expensive enough that self-hosted runners were the only real option. I can't imagine this will do anything other than push people away from the entire ecosystem.
