Hacker News | Paracompact's comments

As most others have pointed out, the goal from here wouldn't be to craft a custom harness so that Claude could technically fly a plane 100x worse than specialist autopilots. Instead, what would be more interesting is if Claude's executive control, response latency, and visual processing capabilities were improved in a task-agnostic way so that as an emergent property Claude became able to fly a plane.

It would still be better just to let autopilots do the work, because the point of the exercise isn't improved avionics. But it would be an honestly posed challenge for LLMs.


I'm curious about your learning experience, but what was the nature of your bottleneck, exactly? Was the backend perfectly fine as a backend, but Claude struggled to wire it to a frontend gracefully?

Claude does a great job generating the code. The hard part is the UX: as the app gets complex and I want a new feature that adds more complexity on top, the way the application/UX is designed makes it hard to integrate that feature in a way that's not confusing to the user.

Like, for example: I used checkboxes to mean "include the records in the result set," but in a different section later in the flow I have a similar-looking view/list of records where I just want checkboxes for batch delete, and I don't want the user to think they mean "include in the result set" there. So maybe instead I need a single checkbox at the top which says "Don't ask for confirmation," so the user can just click the normal delete icon on each row to delete entries quickly without being prompted.

But on the previous view/list I let the user use checkboxes both to include records and to batch delete them via a single small cross at the top. In the later section I don't want to do that: because of the way I designed the flow, it would confuse the user and make it hard for them to track what they're doing and where they made the change (I want the selection step to be in a single place; the current page serves a different purpose). So maybe I need to change the other page as well, for consistency, and use the "Don't ask for delete confirmation" approach everywhere? But there's not enough space to fit that text on those other pages...

When you solve all the hard problems, this is what coding gets reduced to. No single hard problem, but lots of small ones like that keep coming up, and your interface ends up with a complex URL scheme and lots of modals and nested tabs.


Formal methods practitioner here.

> That means proving the absence of bugs, and you cannot prove a negative. The best thing you can do is fail to find a bug, but that doesn't mean it isn't there.

You can conclusively (up to my next point) prove that a specific bug or class of bugs isn't there. But "entirely free of (all) bugs" is indeed a big misconception about what formal methods does.

> how do you know your formal verification is bug-free? Answer: you don't. Or if you try to formally verify your formal verification then you're just translating the problem to a new layer. It's just a chain of proofs that is always ultimately based on an unproven one, which invalidates the whole chain.

It's another misconception to think formal methods establishes any result conclusively, without any caveats whatsoever. But then again, neither does mathematics, or any other intellectual discipline. What formal methods does is reduce the surface area where mistakes could reasonably be expected to reside. Trusting the Rocq kernel, or a highly scrutinized model of computation and language semantics, is much easier than trusting the totality of random unannotated code residing in the foggiest depths of your average C compiler, for instance.
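To make the first point concrete, here's a toy Lean 4 sketch (hypothetical names, nothing from a real codebase): the theorem rules out one specific class of bug, an index going out of bounds, for every possible input, while saying nothing about any other bug.

```lean
-- Toy example: prove one specific bug class absent, not "all bugs".
-- lastIndex is the index of the final element of a buffer of size n.
def lastIndex (n : Nat) : Nat := n - 1

-- For every nonempty buffer, the computed index is in bounds.
theorem lastIndex_inBounds (n : Nat) (h : 0 < n) : lastIndex n < n :=
  Nat.sub_lt h Nat.zero_lt_one
```

The caveat from my second point still applies: you're trusting the kernel that checks this proof, just a much smaller thing than a whole compiler.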


> I work with the agent to create plans first and refine them, and the author of a PR who couldn’t do that adds nothing.

As someone who has been using AI extensively lately, this is my preferred way of doing serious projects with them:

Let them create the plan, help them refine it, let them rip; then scrutinize their diffs, fight back on the parts I don't like or don't trust; rinse and repeat until commit.

Yet I assume this would still be unacceptable to most anti-AI projects, because 90%+ of the committed code was "written by the AI."

> why would I want to go back and forth with an LLM through PR comments when I could just talk to the agent myself in real time?

Presumably for the same reason you go back and forth with humans through PR comments even when you could just code it yourself in real time. That reason being, the individual on the other end of the PR should be saving you time. It's still hard work contributing quality MRs, even with AI.


I don’t have a problem working with contributors who use AI like you described. But this thread is about working with people who could not do the work on their own. So they cannot do what you described, and they cannot save me any time, they can only waste it.

Fair enough, that makes sense. I wish more (on both sides of the aisle) were open-minded to the difference.

https://www.un.org/en/about-us/universal-declaration-of-huma...

"Everyone has the right to leave any country, including his own, and to return to his country."

Passports shouldn't be a privilege, and violation of the universal declaration of human rights is pretty serious.


> violation of the universal declaration of human rights is pretty serious

No, it isn’t. It isn’t even legally binding. (And obviously couldn’t be. It would invalidate incarceration and conscription, for example.)


People don't have that right when they are criminals; prisoners don't get to leave their country, etc., and this is essentially that.

Indeed, it seems to occupy a middle ground between fast-and-easy AI prompting, and slow-but-robust traditional programming. But it's still relying on AI outputs unconstrained (as far as I can tell) by more formal methods and semantic checks.

But it's also hard for me to grasp the exact value add from the README, or why I should buy their story, so I'm not sure.


Install my executable bro, trust me just one more tool and you will be the 10x engineer!!!


Name and shame?


Apologies to the user if they aren't an alt and this is just their genuine first post, but: We really ought to delegitimize the whole "creating alts to post unpopular opinions" trend. I know HN is very privacy-centric, but it just seems wrong to me when someone isn't willing to stake their reputation on (what I charitably assume are) genuinely held controversial opinions.


That's kind of the point - encouraging people to post genuine thoughts that they otherwise wouldn't because of the fear that they might suffer for it later. Sure, it doesn't always encourage the most well thought-out comments, but raising the bar for new commenters with no/low reputation is how you end up with echo chambers.


I think the policy is there for a reason. I think it encourages more free discussion. I don't know if you are upset that there is a way for people to express thoughtful ideas without being bullied for doing wrongthink or having the wrong opinions.


I mean I assume it's a bot comment. It has that LLM stink to it in writing style even without the "new account" signal.


A bit uncharitable. I'm a diehard fan of Rocq, but it's nothing unusual to see the young new hotness that is Lean continue to get the spotlight. It's not a sign of Microsoft putting its thumb on the scales, and the hype for Lean has long predated LLMs.

It's certainly less mature when it comes to verified programming, but its appeal to mathematicians (rather than formal methods experts) has earned it much respect.


I don't think he's referring to Lean specifically, but any sort of executable testing methodology. It removes the human in the loop in the confidence assurance story, or at least greatly reduces their labor. You cannot ever get such assurance just by saying, "Well this model seems really smart to me!" At best, you would wind up with AI-Jim.

(One way Lean or Rocq could help you directly, though, would be if you coded your program in it and then compiled it to C via their built-in support for it. Such is very difficult at the moment, however, and in the industry is mostly reserved for low-level, high-consequence systems.)


>Such is very difficult at the moment

What do you mean? It's a nice and simple language. Way easier to get started than OCaml or Haskell for example. And LLMs write programs in Lean4 with ease as well. Only issue is that there are not as many libraries (for software, for math proofs there is plenty).

But for example I worked with Claude Code and implemented a shell + most of unix coreutils in like a couple of hours. Claude did some simple proofs as well, but that part is obvs harder. But when the program is already in Lean4, you can start moving up the verification ladder piece by piece.
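For a flavor of that ladder (a hypothetical sketch, not the actual code from that session): ship the executable definition first, then add cheap proofs as the first rung.

```lean
-- Hypothetical first rung: an executable "wc -l"-style core over a
-- list of characters, plus one trivial sanity proof to start with.
def countLines (cs : List Char) : Nat :=
  cs.foldl (fun n c => if c == '\n' then n + 1 else n) 0

theorem countLines_nil : countLines [] = 0 := rfl

#eval countLines "one\ntwo\n".toList  -- 2
```

Later rungs would relate countLines to a declarative spec, and so on up.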


Well, if you do not need to care about performance, everything can be extremely simple indeed. Let me show you a data structure in Coq/Rocq with notations switched off, displaying the low-level content.

  Require Import String.

  Definition hello: string := "Hello world!".

  Print hello.

  hello = String (Ascii.Ascii false false false true false false true false) (String (Ascii.Ascii true false true false false true true false) (String (Ascii.Ascii false false true true false true true false) (String (Ascii.Ascii false false true true false true true false) (String (Ascii.Ascii true true true true false true true false) (String (Ascii.Ascii false false false false false true false false) (String (Ascii.Ascii true true true false true true true false) (String (Ascii.Ascii true true true true false true true false) (String (Ascii.Ascii false true false false true true true false) (String (Ascii.Ascii false false true true false true true false) (String (Ascii.Ascii false false true false false true true false) (String (Ascii.Ascii true false false false false true false false) EmptyString))))))))))) : string


You know, you could just define the verified specs in Lean and, if performance is a problem, use the Lean spec to extract an interface and tests for a more performant language like Rust. You could, at least in theory, use Lean as an orchestrator of verified interfaces.
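Something like this (a hypothetical sketch, names are mine): the Lean definition is the executable spec, and a faster Rust port gets property-tested against it on generated inputs rather than re-proved from scratch.

```lean
-- Hypothetical executable spec: run-length encoding. A Rust
-- implementation would be checked against this on random inputs.
def runLengthEncode : List Char → List (Char × Nat)
  | [] => []
  | c :: cs =>
    match runLengthEncode cs with
    | (c', n) :: rest =>
      if c == c' then (c', n + 1) :: rest
      else (c, 1) :: (c', n) :: rest
    | [] => [(c, 1)]

#eval runLengthEncode "aab".toList  -- [('a', 2), ('b', 1)]
```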


In Lean, strings are packed arrays of bytes, encoded as UTF-8. Lean is very careful about performance; after all, a self-hosted system that can't generate fast code would not scale.
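A quick illustration in Lean 4 of the byte/character distinction that falls out of that representation:

```lean
#eval "héllo".utf8ByteSize  -- 6: 'é' takes two bytes in UTF-8
#eval "héllo".length        -- 5 characters
```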


But isn't that tantamount to saying his comment is a complete non sequitur?


I don't think so? Lean is formal methods, so it makes sense to discuss the boons of formal and semiformal methods more generally.

I used to think that the only way we would be able to trust AI output would be by leaning heavily into proof-carrying code, but I've come to appreciate the other approaches as well.


But that's exactly my point. "It's natural to discuss the broader category" is doing a lot of heavy lifting here. The blog post is making a very specific claim: that formal proof, checked by Lean's kernel, is qualitatively different from testing, in that it lets you skip the human review loop entirely. cadamsdotcom's comment rounds that down to "executable specs good, markdown specs bad," which... sure, but that's been the TDD elevator pitch for 20 years.

If someone posted a breakthrough in cryptographic verification and the top comment was "yeah, unit tests are great," we'd all recognize that as missing the point. I don't think it's unrelated, I think it's almost related, which is worse, because it pattern-matches onto agreement while losing the actual insight.

