Hacker News | mohsen1's comments

I haven't had time to do it, but can someone try to unminify the newer version based on the minified new version plus the source of the previous version? There's gotta be a way to do this.

Their terms don't allow `claude -p` to be used like that. However, people can hide this with the leaked source code. What a funny cat-and-mouse game!

> Their terms don't allow `claude -p` to be used like that.

Like what? I legitimately don't understand what is prohibited. Using claude as part of a shell script? Am I only allowed to use claude if I physically type the commands into a terminal via my keyboard? Why even ship `claude -p` at all?


Can you please point me to those terms?

On LM Studio I'm only seeing models/google/gemma-4-26b-a4b

Where can I download the full model? I have 128GB Mac Studio


Downloading the official ones for my M3 Max 128GB via LM Studio, I can't seem to get them to load. They fail for some unknown reason; I'll have to dig into the logs. Any luck for you?

The Unsloth llama.cpp guide[1] recommends building the latest llama.cpp from source, so it's possible we need to wait for LM Studio to ship an update to its bundled llama.cpp. Fairly common with new models.

1. https://unsloth.ai/docs/models/gemma-4#llama.cpp-guide


LM Studio shipped this update. Under settings make sure you update your runtimes.

Thank you both!!

They are all on Hugging Face.

Maybe hard to believe, but not everyone speaks English to Claude.

src/cli/print.ts

This is the single worst function in the codebase by every metric:

  - 3,167 lines long (the file itself is 5,594 lines)
  - 12 levels of nesting at its deepest
  - ~486 branch points of cyclomatic complexity
  - 12 parameters + an options object with 16 sub-properties
  - Defines 21 inner functions and closures
  - Handles: agent run loop, SIGINT, rate limits, AWS auth, MCP lifecycle, plugin install/refresh, worktree bridging, team-lead polling (while(true) inside), control message dispatch (dozens of types), model switching, turn interruption recovery, and more

This should be at minimum 8–10 separate modules.

Here's another gem: src/ink/termio/osc.ts:192–210

  void execFileNoThrow('wl-copy', [], opts).then(r => {
    if (r.code === 0) { linuxCopy = 'wl-copy'; return }
    void execFileNoThrow('xclip', ...).then(r2 => {
      if (r2.code === 0) { linuxCopy = 'xclip'; return }
      void execFileNoThrow('xsel', ...).then(r3 => {
        linuxCopy = r3.code === 0 ? 'xsel' : null
      })
    })
  })

are we doing async or not?
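For what it's worth, the same fallback chain reads much more naturally when flattened with async/await. A sketch only, assuming (as the snippet suggests) that `execFileNoThrow` resolves to `{ code: number }` and never rejects:

```typescript
// Sketch only: the wl-copy -> xclip -> xsel fallback chain, flattened.
// `execFileNoThrow` is an assumption modeled on the snippet above:
// it resolves with { code: number } and never rejects.
type ExecResult = { code: number };
type Exec = (cmd: string, args: string[]) => Promise<ExecResult>;

async function detectLinuxCopy(execFileNoThrow: Exec): Promise<string | null> {
  // Probe each clipboard tool in order; the first one that exits 0 wins.
  for (const tool of ['wl-copy', 'xclip', 'xsel']) {
    const { code } = await execFileNoThrow(tool, []);
    if (code === 0) return tool;
  }
  return null;
}
```

Each probe now visibly completes before the next one starts, which is exactly the ordering guarantee the nested `.then` version obscures.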

Claude Code says thank you for reporting. I bet they will scan this chat to see what bugs they need to fix ASAP.

A defining work of the "just vibes" era.

You fail to mention the prior decades of really bad software engineers writing awful code -- off of which these models trained.

Yes, Anthropic is not the only company in the world with some shitty code, and yet I feel no pangs of guilt over laughing about it.

what does that even do?

Looks like it tries wl-copy, then xclip, then xsel. I have no idea what those are, but Google says they're for Wayland, so I think it's a Linux function trying to copy to the clipboard? I think their problem is with the use of '.then(... => ...)', since there doesn't seem to be a way for each function to tell that the nested ones actually finished.

wl-copy is a program to put text into the system clipboard if you're on a Wayland-based system (so you can Ctrl-V paste it somewhere else). Imagine something like `cat ~/.ssh/whatever | wl-copy` and then pasting into GitHub.

xclip is the same for X-based systems.


It looks like a search for the command line tool available to send content to the clipboard.

Can't tell if that obfuscated code works though.


I'm sure this is no surprise to anyone who has used CC for a while. This is the source of so many bugs. I would say "open bugs" but Anthropic auto-closes bugs that don't have movement on them in like 60 days.

> This issue has been automatically locked since it was closed and has not had any activity for 7 days. If you're experiencing a similar issue, please file a new issue and reference this one if it's relevant.

Close.


> This should be at minimum 8–10 separate modules.

Can't really say that for sure. The way humans structure code isn't some ideal best possible state of computer code, it's the ideal organization of computer code for human coders.

Nesting and cyclomatic complexity are indicators ("code smells"). They aren't guaranteed to lead to worse outcomes. If you have a function with 12 levels of nesting, but in each nest the first line is 'return true', you actually have 1 branch. If 2 of your 486 branch points are hit 99.999% of the time, the code is pretty dang efficient. You can't tell for sure if a design is actually good or bad until you run it a lot.
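A toy illustration of that point (hypothetical code, not from the codebase):

```typescript
// Toy example: the metrics see three nesting levels and three branch
// points, but at runtime this behaves like the single guard
// `return a && b && c`, so the "complexity" is largely cosmetic.
function deeplyNested(a: boolean, b: boolean, c: boolean): boolean {
  if (a) {
    if (b) {
      if (c) {
        return true; // the common path in this hypothetical
      }
    }
  }
  return false;
}
```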

One thing we know for sure is LLMs write code differently than we do. They'll catch incredibly hard bugs while making beginner mistakes. I think we need a whole new way of analyzing their code: our human programming rules are qualitative because it's too hard to prove whether an average program does what we want.

The worst outcome I can imagine would be forcing them to code exactly like we do. It just reinforces our own biases, and puts in the same bugs that we do. Vibe coding is a new paradigm, done by a new kind of intelligence. As we learn how to use it effectively, we should let the process of what works develop naturally. Evolution rather than intelligent design.


I don't buy this. Claude doesn't usually have any issues understanding my code. It has tons of issues understanding its code.

The difference between my code and Claude's code is that when my code is getting too complex to fit in my head, I stop and refactor it, since for me understanding the code is a prerequisite for writing code.

Claude, on the other hand, will simply keep generating code well past the point when it has lost comprehension. I have to stop, revert, and tell it to do it again with a new prompt.

If anything, Claude has a greater need for structure than me since the entire task has to fit in the relatively small context window.


> One thing we know for sure is LLMs write code differently than we do.

Kind of. One thing we do know for certain is that LLMs degrade in performance with context length. You will undoubtedly get worse results if the LLM has to reason through long functions and high LOC files. You might get to a working state eventually, but only after burning many more tokens than if given the right amount of context.

> The worst outcome I can imagine would be forcing them to code exactly like we do.

You're treating "code smells" like cyclomatic complexity as something that is stylistic preference, but these best practices are backed by research. They became popular because teams across the industry analyzed code responsible for bugs/SEVs, and all found high correlation between these metrics and shipping defects.

Yes, coding standards should evolve, but... that's not saying anything new. We've been iterating on them for decades now.

I think the worst outcome would be throwing out our collective wisdom because the AI labs tell us to. It might be good to question who stands to benefit when LLMs aren't leveraged efficiently.


> They became popular because teams across the industry analyzed code responsible for bugs/SEVs, and all found high correlation between these metrics and shipping defects.

Yes, based on research of human code. LLMs write code differently. We should question whether the human research applies to LLMs at all. (You wouldn't take your assumptions about chimp research and apply them to parrots without confirming first)

> I think the worst outcome would be throwing out our collective wisdom because the AI labs tell us to.

We don't have to throw it out. But our current use of LLMs is a dramatic change from what came before. We should be questioning the assumptions and traditions that come from a different way of working and a different kind of intelligence. Humans have a habit of trying to force things to be how they think they should be, rather than allowing them to grow organically, when the latter is often better for a system we don't yet understand.


They write code differently but that doesn't mean that's the kind of code they prefer to read. Don't ascribe too much intention to a stochastic process.

Their coding style is above all else a symptom of their very limited context window and complete amnesia for anything that's not in the window.


I don't think there's intention. And yes, its output is defined by its limits. But it's not just the context, is it? Their coding style is, above all else, a result of an algorithm and input. The training data, the reinforcement, the model design, the tuning, the prompt, the context. Change any one of those things and the code changes. They are a system, like an ecosystem. Let water flow and it finds its own path. But try to dam it and it creates unintended consequences. I think what we're going to find is some of our rules apply more to a human world than an LLM world.

I've heard this take before, but if you've spent any time with LLMs I don't understand how your take can be: "I should just let this thing that makes mistakes all the time, and seems oblivious to the complexity it's creating because it only observes small snippets out of context, make its own decisions about architecture; this is just how it does things and I shouldn't question it."

I think this view assumes no human will, or should, ever read the code. This style is considered bad practice because someone else won't understand the code as well, whether it was written by a human or an agent. Unless zero human oversight is the goal, agents should still code like us.

Weird and inscrutable can be good: think genetic algorithms [1] such as antenna optimization for EM radiation [2]. But I like my source code on the intelligible side.

[1] https://www.nature.com/articles/s41598-023-35470-4/figures/2

[2] https://jamessealesmith.github.io/img/antenna/ant_struct.png


This answer blew my mind. It's making me think in a very different way.

I'm with you there man..

Maybe going slow is a feature for them? A kind of rate limiting via bad code, a way of controlling overall throughput.

"That's Larry; he does most of the work around here."

lmao

i wonder why 'lmao' gets downvoted.

Because it adds nothing to the conversation and has a Reddit vibe and that goes down like a lead balloon in these here parts, cowboy.

Take a look at the site guidelines.

Hmmm it's likely they have found that it works better for LLMs that need to operate on it.

Well, literally no one has ever accused Anthropic of having even halfway competent engineers. They are akin to monkeys whacking stuff with a stick.

"You can get Claude to split that up"

it's the `runHeadlessStreaming` function btw

the claude code team ethos, as far as i’ve been led to understand— which i agree with, mind you— is that there is no point in code-reviewing ai-generated code… simply update your spec(s) and regenerate. it is just a completely different way of interacting with the world. but it clearly works for them, so people throwing up their hands should at least take notice of the fact that they are absolutely not competing with traditional code along traditional lines. it may be sucky aesthetically, but they have proven from their velocity that it can be extremely effective. welcome to the New World Order, my friend.

>there is no point in code-reviewing ai-generated code

the idea that you should just blindly trust code you are responsible for without bothering to review it is ludicrous.


(I mostly agree with you, but) devil's advocate: most people already do that with dependencies, so why not move the line even further up?

There's a reputational filtering that happens when using dependencies. Stars, downloads, last release, who the developer is, etc.

Yeah we get supply chain attacks (like the axios thing today) with dependencies, but on the whole I think this is much safer than YOLO git-push-force-origin-main-ing some vibe-coded trash that nobody has ever run before.

I also think this isn't really true for the FAANGs, who ostensibly vendor and heavily review many of their dependencies because of the potential impacts they face from them being wrong. For us small potatoes I think "reviewing the code in your repository" is a common sense quality check.


Because you trust that your dependencies are not vibe coded and have been reviewed by humans.

Stop trusting any dependency now.

except they are vibe-or-not coded by some dude in Reno NV who wouldn’t pass a phone screen where you work

I'd trust that dude over professional leetcoders any day.

But you're right that trust is a complicated thing and often misplaced. I think as an industry we're always reevaluating our relationship with OSS, and I'm sure LLMs will affect this relationship in some way. It's too early to tell.


I find this relationship fascinating. The vast majority of OSS developers will not hesitate to pull in library X or framework Y knowing really nothing about it: who the developers are, what the quality is, what their release process and QA look like, etc. The first thing I do now, as a "senior" of decades, when approached with "we should consider using ____" is to send them to the project's issues page (e.g. https://github.com/oven-sh/bun/issues) and say "spend 60-90 minutes minimum here reviewing the issues, then come back and tell me whether or not the inclusion of this is something we should consider." And yet, now with LLMs, there are so many comments on HN like "oh, they must be supervised, who knows what they will be doing, etc." Gotta supervise them, but some mate in Boise is all good; hopefully someone else will review his stuff that's going into your next release...

You are still responsible for the product; the code has stopped being what defines the product.

If you don't review what the product does, you are being irresponsible with the product.

Is the CEO responsible for a company's financial performance? Do they review every line of code the company writes?

It is more irresponsible to spend the time reviewing all of the code rather than spending that time on things with bigger levers for satisfying your customers.


Yes, but if a dev pushes a line of code that wipes the accounts of millions of users at a fintech, the dev will get fired but the CEO will get sued into oblivion. If the agent isn't responsible, you HAVE to be, because angry people won't listen to "it's no one's fault your money is gone."

Why?

Is this a serious question? If you are handling sensitive information how do you confirm your application is secure and won't leak or expose information to people who shouldn't know it?

How do you with classic code?

Exactly: unit tests, integration tests, UI tests. This is how code should be verified no matter the author. Just today I told my team we should not be reading every line of LLM code. Understand the pattern. Read the interesting/complex parts. Read the tests.

But unit and integration tests generally only catch the things you can think of. That leaves a lot of unexplored space in which things can go wrong.

Separately, but related - if you offload writing of the tests and writing of the code, how does anybody know what they have other than green tests and coverage numbers?


I have been seeing this problem building over the last year. LLM generated logic being tested by massive LLM generated tests.

Everyone just goes overboard with the tests since you can easily just tell the LLM to expand on the suite. So you end up with a massive test suite that looks very thorough and is less likely to be scrutinized.


if you are asking me how you *guarantee* there is not a single possible exploit in your code, you can't do that. But you can do your best and learn about common pitfalls and be reasonably competent. Just because you can't do the former doesn't mean the latter is useless.

> it may be sucky aesthetically

It's not a matter of being pretty, but of being robust and maintainable.


While the technology is young, bugs are to be expected. But I'm curious what happens when their competitors mature their products, clean up the bugs, and stabilize them, while Claude is still kept in this trap where a certain number of bugs and issues are just a constant fixture due to vibe coding. But hey, maybe they really do achieve AGI and get over the limitations of vibe coding without human involvement.

I see. They got unlimited tokens, right?

yes, because who ever heard of an AI leaking passwords or API keys into source code

How is it that a AI coding agent that is supposedly _so great at coding_ is running on this kind of slop behind the scenes. /s

Because in reality no one except for good engineers actually care about what the code looks like. The only thing most users care about with Claude Code is having it quickly vibe code the crappy idea they came up with that is going to 10x their lives, or whatever.

But it is running, that's the mystery.

Because it’s based on human slop. It’s simply the student.

Yes, if it was made for human comprehension or maintenance.

If it's entirely generated / consumed / edited by an LLM, arguably the most important metric is... test coverage, and that's it ?


LLMs are so so far away from being able to independently work on a large codebase, and why would they not benefit from modularity and clarity too?

I agree the functions in a file should probably be reasonably-sized.

It's also interesting to note that due to the way round-tripping tool-calls work, splitting code up into multiple files is counter-productive. You're better off with a single large file.


> due to the way round-tripping tool-calls work, splitting code up into multiple files is counter-productive.

Can you expand on that?


> independently work on a large codebase

I'm not sure that humans are great at this either. Think about how we use frameworks and have complex supply chains... we sort of get "good enough" at what we need to do and pray a lot that everything else keeps working and that our tooling (things like Artifactory) saves us from supply chain attacks. Or we just run piles of old, outdated code because "it works". I can't tell you how many microservices I have seen that are "just fine" but no one in the current org has ever read a line of what's in them, and the people who wrote them left ages ago.

> clarity too

Yes, but define clarity!

I recently had the pleasure of fixing a chunk of code that was part of a data pipeline. It was an If/elseif/elseif structure... where the final two states were fairly benign and would have been applicable in 99 percent of cases. Everything else was to deal with the edge cases!

I had an idea of where the issue was, but I didn't understand how the code ended up in the state it was in... Blame -> find the commit message (references a ticket) -> find the Jira ticket (references Salesforce) -> find the original customer issue in Salesforce, read through the whole exchange there.

A two line comment could have spared me all that work, to get to what amounted to a dead simple fix. The code was absolutely clear, but without the "why" portion of the context I likely would have created some sort of regression, that would have passed the good enough testing that was there.

I re-wrote a portion of the code (expanding variable names) - that code is now less "scannable" and more "readable" (different types of clarity). Dropped in comments: a few sentences of explaining, and references to the tickets. Went and updated tests, with similar notes.

Meanwhile, elsewhere (other code base, other company), that same chain is broken... the "bug tracking system" that is referenced in the commit messages there no longer exists.

I have a friend who, every time he updates his dev env, calls me to report that he "had to go update the wiki again!" because someone made a change and told everyone in a Slack message. Here is yet another vast repository of degrading, unsearchable and unusable tribal knowledge embedded in so many organizations out there.

Don't even get me started on the project descriptions/goals/tasks that amount to pantomime on post-it notes, absent any sort of genuine description.

Lack of clarity is very much also a lack of "context" in situ problem.


I think humans are pretty good at it with small teams and the right structure. There are definitely dysfunctional orgs as you describe where humans produce garbage code yes. I blame the org for that, not the humans.

As to what defines clarity, yes of course, like the word quality this is very hard to define, but we can certainly recognise when it was not considered.

I think it is a goal worth striving for though, and abandoning code standards because we now have AI helpers is stupid and self-defeating, even if we think they are very capable and will improve.

The end of history has not in fact arrived with generative AI, we still have to maintain software after.


Oh boy, you couldn't be more wrong. If anything, LLMs need MORE readable code, not less. Do you want to burn all your money on tokens?

I very much doubt Anthropic devs are metered, somehow.

Unit testing is much much harder when you have functions spanning thousands of lines and no abstractions. You have to white box test everything to ensure that you hit all code paths, and it is much more expensive to maintain such tests, both as a human and LLM. I don't think this can be ignored just because LLMs are writing the code.

Massive test files are almost as bad as massive functions.

Scrolling through a 3k line test suite with multiple levels of nesting trying to figure out which cases are covered is a fucking pain in the ass.


Can't wait to have LLM-generated physical objects that explode in your face and no engineer can fix.

Oh, do we agree on that. I never said it was "smart" - I just had a theory that would explain why such code could exist (see my longer answer below).

Can't we have generated / llm generated code to be more human maintainable?

Yeah, I honestly don't understand his comment. Is it bad code writing? Pre-2026, sure. In 2026? Nope. Is it going to be a headache for some poor person on call? Yes. But then again, are you "supposed" to go through every single line in 2026? Again, no. I hate it. But the world is changing, and until the bubble pops, this is the new norm.

Sorry, I was not clear enough.

My first word was literally "Yes", so I agree that a function like this is a maintenance nightmare for a human. And, sure, the code might not be "optimized" for the LLM, or for token efficiency.

However, to try and make my point clearer: it's been reported that Anthropic has "some developers who don't write code" [1].

I have no inside knowledge, but it's possible, by extension, to assume that some parts of their own codebase are "maintained" mostly by LLMs themselves.

If you push this extension, then, the code that is generated only has to be "readable" to:

* the next LLM that'll have to touch it

* the compiler / interpreter that is going to compile / run it.

In a sense (and I know this is a stretch, and I don't want to overdo the analogy), are we here judging a program's quality by reading something more akin to "the x86 asm output by the compiler" rather than the "source code", which in this case is "English prompts" hidden somewhere in a developer's Claude Code session?

Just speculating, obviously. My org is still much more cautious, mandating the same standards for code generated by an LLM as for code written by a human, and I agree with that.

I would _not_ want to debug the function described by the commenter.

So I'm still very much on the "claude as a very fast text editor" side, but is it unreasonable to assume that Anthropic might be further along the "claude as a compiler for English" side?

[1] https://www.reddit.com/r/ArtificialInteligence/comments/1s7j...


If that's the case then that's dumb

The jury on this one is still out.

Uses a public dataset for evaluation that is not meant for evaluation. Writes a super-specific prompt [1] and claims eye-catching results.

This is the state of "AI" these days I guess...

[1] https://github.com/symbolica-ai/ARC-AGI-3-Agents/blob/symbol...


The dataset miscomparison is a big problem. The prompt is super specific to ARC-AGI-3, which is perfectly fine to do, but skimming it I saw nothing that appears specific to the 25 games in the dataset. Especially considering they've only had one day for overfitting. Could be quite subtle leakage though.

Of course it is... we are in an era where a well-timed blog post showing "SOTA results" on a benchmark can net millions in funding

If it hadn't spun up so many Python processes and overwhelmed the system with them (friends found out it was consuming too much CPU just from the fan noise!), it would have been much more successful. Quite similar to the xz attack.

It does a lot of CPU-intensive work:

    spawn background python
    decode embedded stage
    run inner collector
    if data collected:
        write attacker public key
        generate random AES key
        encrypt stolen data with AES
        encrypt AES key with attacker RSA pubkey
        tar both encrypted files
        POST archive to remote host

I can't tell which part of that is expensive unless many instances of Python are spawned at the same time. Are any of the payloads particularly large?

I highly recommend adding `/simplify` to your workflow. It walks back over-engineerings quite often for me.

playwright can do all of that too. I'm confused why this is necessary.

If coding agents are given Playwright access they can actually do it better, because through the Chrome DevTools Protocol they can interact with the browser and experiment with things without having to wait for all of this to complete before making moves. For instance, I've seen Claude Code capture console messages from a running Chrome instance and use them to debug things...


I've also had Claude run javascript code on a page using playwright-cli to figure out why a button wasn't working as it should.

Because LLM users are NIH factories?

`^` is the symbol for the Control key, not `⌘`.
