Hacker News | anematode's comments

Looks like a very sophisticated operation, and I feel for the maintainer who had his machine compromised.

The next incarnation of this, I worry, is that the malware hibernates somehow (e.g., if (Date.now() < 1776188434046) { exit(); }) to maximize the damage.


Isn't that already how it is?

I mean the compromised machine registers itself with the command-and-control server and occasionally checks for workloads.

The attacker then decides their next actions: depending on the machine they compromised, they'll either try to spread (like this time) and mount a broad attack, or go more in-depth and try to exfiltrate data / spread internally if, e.g., a build node has been compromised.


> But then the clean room implementations started showing up. People had taken Anthropic’s source code and rewritten Claude Code from scratch in other languages like Python and Rust.

Seems like the phrase "clean room" is the new "nonplussed"... how does this make any sense?


Heya, post author here. I think I was just wrong about this assertion. I got into a discussion with a copyright lawyer over on Bluesky[^1] after I wrote this and came away reasonably convinced that this wouldn’t be a valid example of a clean room implementation.

[^1]: https://bsky.app/profile/mergesort.me/post/3mihhaliils2y


The most fitting method would be to train an LLM on the Claude Code source code (among other data).

Then use Anthropic's own argument that LLM output is original work and thus not subject to copyright.


I think it means you write a spec from the implementation. Then you write a new implementation from the spec. You might go so far as to do the second part in a "clean" room.


Heh, the original being entirely vibed had me thinking of an interesting problem: if you used the same model to generate a specification, then reset the state and passed that specification back to it for implementation, the resulting code would by design be very close to the original. With enough luck (or engineering), you could even get the same exact files in some cases.

Does this still count as clean-room? Or what if the model wasn't the same exact one, but one trained the same way on the same input material, which Anthropic never owned?

This is going to be a decade of very interesting, and probably often hypocritical lawsuits.


right. that's not what people are doing here though, at all


in a typical clean-room design, the person writing the new implementation is not supposed to have any knowledge of the original, they should only have knowledge of the specification.

if one person writes the spec from the implementation, and then also writes the new implementation, it is not clean-room design.


I believe the argument is that LLMs are stateless. So if the session writing the code isn't the same session that wrote the spec, it's effectively a clean room implementation.

There are other details of course (is the old code in the training data?) but I'm not trying to weigh in on the argument one way or the other.


Arguably, an even worse day to release it ;)


Why?


Ya, I tend to believe that (most) human VR will be obsoleted well before human software engineering. Software engineering is a lot more squishy and has many more opportunities to go off the rails. Once a goal is established, the output of VR agents is verifiable.


Definitely. As an extreme but fun example... in one project I had a massive hash map (~700 GB or so) that was concurrently read from and written to by 256 threads. The entries were only 16 bytes, so I could use atomic cmpxchg, but the problem I hit was that even with 1 GB huge pages, I was running out of dTLB entries. So I assigned each thread to a subregion of the hash table, then used channels between each pair of threads to handle the reads and writes (and restructured the program a bit to allow this). Since the dTLB budget is per core, this got me essentially zero dTLB misses, and ultimately sped up the program by ~2x.


The "delegation pattern" for datastructures:

https://timharris.uk/papers/2013-opodis.pdf


ah! I thought I was being original :)


> Strawberry uses separate renderer processes for settings pages, modals, dropdowns, and other UI components.

Erm. Why? Svelte or not, is this program expected to run on anything but the latest and greatest hardware?


Very nice :)

For a while I've been annoyed that esbuild, which is written in Go, eschews these APIs to detect changes in watch mode and instead continually polls the filesystem: https://github.com/evanw/esbuild/issues/1527#issuecomment-90.... It actually consumes quite a bit of battery, so I might fork it and apply this post's implementation!


Actually, the neural net itself is fairly imprecise. Search is required for it to achieve good play. Here's an example of me beating Stockfish 18 at depth 1: https://lichess.org/XmITiqmi


Dear lord. Are you at least transparent with your clients that this is the standard to which you hold your own code?


$100k was the quote of the project from sloccount... (No one paid me for this. I created it for myself.)


A LoC-based valuation probably assumes humans writing the code, and therefore work-hour costs; I'd bet it no longer applies to generated LoCs.


Credential stealers hate this one trick!

