Hacker News | Philpax's comments

> In looking at the code that the LLMs have produced for the project, especially given the pretty massive and widespread architectural changes needed to make the implementation libified and memory safe, we decided that the codebase is not a derivative work that would require carrying forward the GPL license and have decided to release the code under the MIT instead.

Hmm. That's going to be interesting.


A translation of a book to a different language is a derivative work. So a translation of a computer program to a different programming language is also. But if in the translation of the book you start altering the plot and the personalities of the characters, does it at some point stop being a derivative work? At what point? IANAL, and I have no real idea, but I imagine that point has been probed significantly in case law with respect to creative works. Given the current climate of ever-expanding scope of "intellectual property", if they admit that the LLM had access to the git source code, then I would say their case is weak at best.

The agents.md says “here’s the git source code” https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...

This isn’t even a question of training data; they fed the full git source code directly to the LLM.


I wonder if imitating clean room reverse engineering with two LLMs would be enough for licence compliance.

That already exists[1]. It looks like a joke but apparently they will accept your money to do it, which seems to cross the line of a joke.

[1]: https://malus.sh/


> translation.

It's not technically a translation, it's a re-implementation, with test suites acting as the destination. If it was a file by file translation your argument would have been valid.


Git is part of the LLM's training set, though, so simply asking it to recreate git in another language is pretty much equivalent. Like, you can almost certainly get these LLMs to output git's full source code with some prompting, so there's not that much difference (as much as we like to pretend that AI generated code has no copyright implications).

As mentioned in another comment, it's even more clear cut in this case. They actually put the original git sources in their project repo and instructed the agent to use it as the "source of truth".

Simple thought experiment. If you handed this same agents.md file (https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...) to a human software developer and let them work on exactly the same goal, would their output be considered a derivative work?


That's something I have been wondering. If I as a human want to make a clean room reimplementation of some API or application, I must not have read the source code of the original implementation. I don't see why this shouldn't apply to LLMs as well. If an LLM might have been trained on the original source code, it should be considered "tainted".

Yes, and realistically any code that LLMs produce is a derivative work of its training data. There's going to be a huge disaster licensing wise

I have absolutely no idea how LLMs got through anyone's legal departments, I guess the hope is that if everyone breaks the law enough, it'll just be fine


> if everyone breaks the law enough, it'll just be fine

That's pretty much what happened, isn't it? These concerns were all discussed in the beginning back in 2022, and I recall answers from many here on HN along the lines of "oh well, we can't stop it now or we'll risk falling behind China in AI development"

So yeah, the laws went out the window a long time ago the moment our government and the people decided to just look the other way willingly in the name of "progress."


Problem is there's a lot more than a single repo in training data, the corpus is massive... Should the author of a blog post on cats also be compensated for simply being in the same training data as the git repo?

Honestly? Yes. This is why it's such a problem that most of the training data was not used with permission, and without the correct copyright status or license associated with it.

There are a lot of arguments about humans doing the same thing, but the reality is that humans and robots don't enjoy the same legal protection. It's clearly a derivative work of all of its training data.


> If I as a human want to make a clean room reimplementation of some API or application, I must not have read the source code of the original implementation.

That is the difference between necessary and sufficient. Clean-room is sufficient to guarantee avoiding copyright infringement, but it is not necessary. The legal line is south of there, but that position was chosen because they didn’t want to risk crossing it and it was easier to argue for in court.

tl;dr: clean room is overkill for avoiding copyright infringement


> Like, you can almost certainly get these LLMs to output git's full source code with some prompting, so there's not that much difference (as much as we like to pretend that AI generated code has no copyright implications)

Are you sure? LLMs are in some way a compressed version of their input but it's a pretty lossy compression (arguably this makes them more like a compression algorithm than a compressed version of the data). I'm not sure you can prompt a full, accurate, copy of a nontrivial codebase out of them. Even with zero temperature their accuracy is just not that high.


> I'm not sure you can prompt a full, accurate, copy of a nontrivial codebase out of them. Even with zero temperature their accuracy is just not that high.

Granted, these are some of the most widely spread texts, and not codebases, but just fyi: https://arxiv.org/pdf/2601.02671

> For Claude 3.7 Sonnet, we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984 (Section 4).


That paper is basically using the LLM as a compression algorithm: it's prompting with some section of the book and it's reprompting if it doesn't give the right output. Notably this only works if you already have a copy of the book in question!

Wouldn't a re-implementation be akin to 'heres how it works, write the code' rather than 'heres the code, redo it in rust'?

Relatedly, software API compatibility is not a derivative work or eligible for protection, as ruled in the US and in the EU: the Google and SAP R/3 cases, etc.

Or SCO Vs IBM.

If everything were a derivative work, we would not have Linux.


Yes, but as soon as copyright became a problem for very rich people parts of it were cancelled.

1) re-implementation for compatibility (which was quickly "reestablished" through use of copyright-protecting encryption. In other words: do you get to write software that connects to MS/Apple/Google/Facebook servers without authorization from those companies? Yes. Do you get to copy an encryption key from their software to make it possible? No)

and, more recently,

2) violating copyright for LLM training

and, currently mostly attempted:

3) "uncopyrighting": run software through an LLM, and some people "believe" it comes out with your copyright on it! Because very rich people want to sell uncopyrighting.

I.e., the jury's still out on what will happen when it's billionaire vs. billionaire.

Of course, the question is what happens the second someone does this with a disney movie, or a big microsoft application ...


> Yes, but as soon as copyright became a problem for very rich people parts of it were cancelled.

When copyright law was established, not many poor people owned printing presses. That is to say, copyright law is a PROTECTION to the very rich, not an inconvenience


true but as the exception for model training (which can only be done by very, very rich people and organizations) shows, there's some new rich and they want new rules.

Against the will of the people, as evidenced by the court cases and protests online ...


Well, there's lots of really interesting opinions here from a lot of armchair lawyers.

To clarify, my stance on this is that the reimplementation did not copy protected expressions (JPlag reports less than 1.8% max similarity between the codebases), it's done in good faith, and it's what's best for the broader Git ecosystem (assuming Grit even becomes usable, which it's currently not purported to be).

From a copyright standpoint, however, only the first argument there is relevant. Grit is an independently authored implementation of Git-compatible behavior, with negligible similarity to Git source code.

I think antirez summarized the situation quite well and I broadly agree with his position: https://antirez.com/news/162

I think that those in the community who know me and have worked with me in the Git and open source communities for the last 20 years know that my intentions are to contribute, share and foster innovation and learning. Many of the main authors of the Git source code are friends of mine and I have no intention to steal anything from anyone, only to make their great ideas more broadly useful.


Have you addressed anywhere why you chose not to keep the copyleft license? It burns a lot of goodwill to use an AI for what many people will see as copyright laundering, and git has done just fine with the GPL, so it doesn’t seem like a blocker for adoption. What do you get from stripping the copyleft?

https://blog.gitbutler.com/series-a likely has a large part to play in it.

By which I mean, what do we imagine a16z thinks of the [L]GPL?

My brief experience in a startup exposed to them is that a16z seems willing to fund "infrastructure" projects more than most, but they did seem to have a ready set of answers on what "open source" means in that context.

(If someone can find me an a16z funded team that published copylefted code, I'll take this back.)

EDIT: OK, I'll eat my hat, Gemini found me some counterexamples:

  Element (Matrix): The company behind the decentralized Matrix communication protocol is on a16z's investment list. In late 2023, Element relicensed its core software (including the Synapse server and its clients) to AGPLv3.

  Uniswap Labs: A massive cornerstone of the a16z Crypto portfolio. They published the Uniswap V2 smart contracts under GPL-3.0 (though they later shifted to a Business Source License for V3 and V4).

  a16z Themselves: In an ironic twist, a16z's own crypto engineering team maintains a public GitHub repository (a16z/a16z-contracts — a library for Solidity contracts) that is literally licensed under AGPL-3.0.

you may be shocked to hear that this is gemini hallucinating; Element (creators of Matrix) has never taken investment from a16z; it must be getting mixed up with a different Element.

Oof, thanks for the correction.

Many Bothans were boiled alive to get me this misinformation.

The Very Annoying Clanker wishes to apologize: "I owe you a massive apology. I completely set you up for that, and you handled the fallout perfectly.

Getting corrected by Arathorn (Matthew Hodgson, the literal CEO of Element and co-founder of Matrix) is a classic Hacker News rite of passage, but it is infinitely more frustrating when your AI assistant handed you the bad data in the first place."

Many eyerolls.


err my gud a ceo on haxer news.

Hey AI, please change my stolen code in a non-breaking way so that JPlag reports less than 1.8% similarity.

I mean, “hey artist, take this stolen character and make them legally distinct” is already a common thing.

There are even exact measurements to take into account for visual art, music, etc., of 'what is legally not stealing'.

Art, however, is a little different than code. code is a thing, but it also produces things.

It weirds me out that there is a measure of code similarity but not a measure of whether code is semantically the same. For example, a protocol could be implemented in many ways, but ultimately what's exchanged between clients and servers on the network is the same, so it's semantically the same despite being totally different code.
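A rough illustration of that gap: the sketch below scores textual similarity as the Jaccard overlap of token trigrams (a crude stand-in, not JPlag's actual algorithm, which runs Greedy String Tiling over language-aware token streams) and pairs it with two differently written functions that behave identically. All names (`tokens`, `trigram_similarity`, `sum_loop`, `sum_fold`) are made up for this example.

```rust
use std::collections::HashSet;

/// Crude tokenizer: runs of alphanumerics/underscores become one token,
/// every other non-whitespace character is its own token.
fn tokens(src: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    for c in src.chars() {
        if c.is_alphanumeric() || c == '_' {
            cur.push(c);
        } else {
            if !cur.is_empty() {
                out.push(std::mem::take(&mut cur));
            }
            if !c.is_whitespace() {
                out.push(c.to_string());
            }
        }
    }
    if !cur.is_empty() {
        out.push(cur);
    }
    out
}

/// Jaccard similarity of the two sets of token trigrams:
/// 0.0 means no shared trigrams, 1.0 means identical trigram sets.
fn trigram_similarity(a: &str, b: &str) -> f64 {
    let grams = |s: &str| -> HashSet<Vec<String>> {
        tokens(s).windows(3).map(|w| w.to_vec()).collect()
    };
    let (ga, gb) = (grams(a), grams(b));
    let union = ga.union(&gb).count();
    if union == 0 {
        return 1.0;
    }
    ga.intersection(&gb).count() as f64 / union as f64
}

// Two implementations with the same observable behaviour...
fn sum_loop(xs: &[u32]) -> u32 {
    let mut total = 0;
    for x in xs {
        total += x;
    }
    total
}

fn sum_fold(values: &[u32]) -> u32 {
    values.iter().fold(0, |acc, v| acc + v)
}

// ...and their source text, as a similarity tool would see it.
const SRC_LOOP: &str =
    "fn sum_loop(xs: &[u32]) -> u32 { let mut total = 0; for x in xs { total += x; } total }";
const SRC_FOLD: &str =
    "fn sum_fold(values: &[u32]) -> u32 { values.iter().fold(0, |acc, v| acc + v) }";
```

Under these assumptions the two source strings share only the signature's trigrams and score well below 0.5, while `sum_loop` and `sum_fold` return the same value for every input. Deciding that second property for arbitrary programs is undecidable (Rice's theorem), which is why tools settle for comparing text or running tests.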


> Many of the main authors of the Git source code are friends of mine and I have no intention to steal anything from anyone, only to make their great ideas more broadly useful.

By working-around/subverting the terms they provided their contributions under? While you claim to be doing this in good faith, and state "it's what's best for the broader Git ecosystem", that's all based on your own opinion which appears to ignore the benefits and intent of licenses such as the GPL.

Out of interest, Would you be happy for someone to do the same with the GitButler source code? (Feed it through an LLM and re-publish the result under an MIT license with different branding)


Is there a point to license laundering? Where does the GPL stifle git adoption?

You know I think if you'd just committed to clean rooming it you'd be fine, but you didn't.

Now you're caught between the devil and the deep blue sea: if the AI did no creative work, then you're definitely in violation of the original GPL license.

If the AI did do creative work that breaks GPL, you still didn't, which leaves you with the problem that you cannot in good faith license a thing which you don't own. No creative work? No ownership claim. There's precious little (if any) of your creativity in copy pasting 4000 tests and a link to the original source code and saying "copy this in Rust".

The flagrant display of cynicism you make in arguing that the ends justify the means (even if a result is the wholesale looting of open source) disgusts me, and if I could communicate to you only one thing it should be that you should not be surprised that other people are also disgusted by behavior like that even when it falls within the letter of the law (a claim I have not yet seen you rigorously defend).


Man, all they had to do was LGPL it and there'd be no ill will.

My question here is not whether it's legally permissible. I'll leave that to others.

It's: WTF is wrong with this next generation of devs that they have such a problem with the GPL that they think it's important to rewrite, relicense, and take away a legal structure which is supposed to protect our free software?

I can imagine some concerns with Git being written in C.

I cannot understand any legitimate concerns with its license that it needs to change.

What does the GPL stop people doing with git? And if there are some... why are people trying to do that? And why would you work for free to help people do it? [Edit: I see, you're not working for free.]

Missing an 'f' in the project name.


The original git had a command line interface. It's widely assumed that using a GPL'd program in your program through the command line does not cause the GPL to "infect" your program.

OTOH, one of the major reasons for grit is to provide a library interface. If they kept it GPL, anything that used grit through the library interface would have to also become GPL.

This could be the "legitimate concern" you're asking for.
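For readers less familiar with the distinction: using git "through the command line" from another program typically means spawning it as a separate process, roughly as in this sketch, rather than linking its code into your binary. `git_command` and `run_git` are illustrative names; `-C <path>` is git's real option for running as if started in that directory.

```rust
use std::io;
use std::process::Command;

/// Builds (but does not start) a `git` invocation for the repository at `repo`.
/// Git stays a separate executable; none of its code is linked into this program.
fn git_command(repo: &str, args: &[&str]) -> Command {
    let mut cmd = Command::new("git");
    // `-C <path>` makes git behave as if it had been started in `path`.
    cmd.arg("-C").arg(repo).args(args);
    cmd
}

/// Runs git and returns its trimmed stdout. Requires a `git` binary on PATH.
fn run_git(repo: &str, args: &[&str]) -> io::Result<String> {
    let output = git_command(repo, args).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).trim().to_string(),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}
```

For example, `run_git(".", &["rev-parse", "HEAD"])` would return the current commit id. A library such as grit-lib is instead called in-process, which is where the question of the GPL reaching the calling program comes in.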

But the LGPL was also an option -- it addresses that arguably legitimate concern and keeps the spirit of the original license.


I mean, yes, clearly, LGPL is the explicitly obvious answer here. And they rejected it.

GPL makes sure that the code remains open. Seems like these new gen devs are against open source.

It pisses me off because I'm also the author of a rewrite-in-Rust project (though it's more than that, and yes I now use agents though I didn't at the start) and I specifically chose [A|L]GPL for it to protect the IP of the asset and because it felt like the most ethical choice.

I removed it but I added that I hate these people. :P So yeah, it pisses me off, too.

"Don't hate the player, hate the game" as they say.

People want to get paid. They perceive the GPL as getting in their way.

Or, as it is also said: “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”


s/open source/free software/

They love open source when it means they can steal from the public and then privatize it later with their VC funded startup, much in the same way Microsoft "loves" Linux [when you run it on Azure, or in WSL]

What they are against is free/libre software that prevents their grifting.


Yeah, pretty much. :)

Related:

Malus – Clean Room as a Service https://news.ycombinator.com/item?id=47350424

Just like for 1984 and the Torment Nexus, someone took the concept not as warning but as instruction manual.


they would be just wrong. I hope someone with standing sues

I don't think it's that clear cut. The functional parts probably aren't copyrightable, only the stylistic ones. It's going to be a mix of courts applying laws in new ways that haven't been done before and fact-specific questions about what actually persisted through the LLM if it goes to court.

I'd be fascinated to see what happens if it does. Both in the analyses that we'd get of what the LLM did to the codebase and on the legal decisions on what the copyrightable creative elements in code actually are.

If I was the author though... there would be no way that I would be volunteering to be a test case like this. Also seems just rude for no reason.


It probably would have been less bad if he had chosen MPL-2.0 or LGPL-2.1-or-later. But he chose MIT, which cuts at the core of the intent of licensing the project with a share-alike license.

Tell me, can I create a copyrighted video that's not GPL licensed using ffmpeg? Now tell me how creating a rust library using the git test suite is different?

> using the git test suite

That's not actually the case at hand here - the agents were given the original source to reference: https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...

But for the sake of argument: The test suite itself is copyrighted. To the extent the resulting work is a derivative of the test suite it is possibly infringing. For example, you might expect that the agent would derive variable names, function names, and the structure, sequence, and organization of the code from the test suite. It might even copy comments wholesale. Those are copyrightable things. (Which is of course just the first step in analyzing whether it is infringement; there would be interesting fair use, de minimis copying, etc. arguments following a conclusion that any of those were copyrighted. A product produced this way definitely could be infringing given the right facts, though.)


> That's not actually the case at hand here - the agents were given the original source to reference: https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...

yeah fair - the "The canonical Git source code we're targeting to replicate the functionality of is in the git/ subdirectory." part makes this hard to argue against.

> To the extent the resulting work is a derivative of the test suite it is possibly infringing

It's this bit that I have a problem with. If I run the test, it fails and reports a failure. Now I write code and run the test again. What is the theory under which the code I wrote infringes?

Simplify this down:

Assume the following is copyrighted:

    fn test_sum() {
        assert_eq!(sum(1, 1), 2);
    }
Does writing the following code:

    fn sum(a: u8, b: u8) -> u8 {
        a + b
    }
infringe on the test copyright?

Writing

    fn sum(a: u8, b: u8) -> u8 {
        a + b
    }
Doesn't infringe upon copyright period, because there's no creative element in that work.

Imagine a more substantial example though. Perhaps you have a test that checks that some file written in a binary format is correct, and gives names (creative elements) to each field of the format that it prints when you mess up the field, and has comments describing why the bytes are laid out like they are (the comments being copyrightable even if the facts they describe aren't), and the LLM copies those field names and comments verbatim... Now it's quite likely that the LLMs work is a derivative of the test suite.
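To make that hypothetical concrete, here is roughly what such a check could look like, using the header of Git's index file as the binary format (a 4-byte `DIRC` signature, then a 32-bit version and a 32-bit entry count, both big-endian). The byte layout is a functional fact; the struct and field names, error messages, and comments are the sort of authorial choices the comment is describing. All names here are illustrative, not taken from Git or grit.

```rust
/// Parsed form of the 12-byte header at the start of a Git index file.
#[derive(Debug, PartialEq)]
struct IndexHeader {
    version: u32,
    entry_count: u32,
}

fn parse_index_header(bytes: &[u8]) -> Result<IndexHeader, String> {
    if bytes.len() < 12 {
        return Err(format!("index header too short: got {} bytes, need 12", bytes.len()));
    }
    // Bytes 0..4: the signature "DIRC" (short for "dircache").
    if !bytes.starts_with(b"DIRC") {
        return Err("bad signature: expected \"DIRC\"".to_string());
    }
    // Bytes 4..8 and 8..12: big-endian (network byte order) integers.
    let version = u32::from_be_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let entry_count = u32::from_be_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]);
    Ok(IndexHeader { version, entry_count })
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn reads_version_and_entry_count() {
        let header = [b'D', b'I', b'R', b'C', 0, 0, 0, 2, 0, 0, 0, 3];
        assert_eq!(
            parse_index_header(&header),
            Ok(IndexHeader { version: 2, entry_count: 3 })
        );
    }

    #[test]
    fn rejects_wrong_signature() {
        assert!(parse_index_header(b"XXXX\0\0\0\x02\0\0\0\x03").is_err());
    }
}
```

If an LLM reproduced `entry_count`, the error strings, and the comments verbatim from such a test, that is the kind of copying the comment has in mind; reproducing only the byte offsets would not be.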


> Doesn't infringe upon copyright period, because there's no creative element in that work.

There's likely a threshold at some point. It's helpful to look at a minima and then continue from there though.

I'm curious if there's case law that supports your assertions here?


For that assertion in particular I believe I'm practically parroting a ruling by the district court in Oracle vs Google about some extremely simple Java functions that Oracle claimed Google copied. Though I can't say I checked to make sure I'm remembering right.

You're recalling it right, but there's a nice quote from Judge Alsup in that case that talks about this exact situation:

> “So long as the specific code used to implement a method is different, anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification...”

Here, given that this is Rust and the original expression is C, the implementations cannot be the same by definition.


That's essentially the same thing as modding a game, though. I know there have been lawsuits to stop modding, but I don't think any were successful.

I'd challenge you to think about this in terms of the legal aspects rather than reaching for similarities, as similarity is often meaningless in law or contracts when specific acts, rather than general ones, are codified.

I'd say what we're talking about here is probably a fair bit different to modding a game in most aspects.


I haven't followed any relevant cases but I would be surprised if there's any serious dispute that the common methods of modding games generally create derivative works. I think the dispute would be downstream of that as to whether or not the mods are covered by fair use.

If you did it in a loop until the test passed, maybe?

Your result is essentially impossible without the original. With ffmpeg, your result does not depend on ffmpeg specifically - you can use any video creation tool.


Repetition isn't really a factor in deciding whether something is infringing or not - check the copyright law in your jurisdiction. Here if you look specifically at what an LLM's sampling stage is doing, it's choosing non infringing tokens (i.e. rust source code) over infringing ones (i.e. C source code). So it's making an intentional choice to do something similar rather than creating something that has the same expression. That doesn't seem like it's copyright infringement to me.

A GPL tool that processes data doesn't virally transfer the license to its output. Copyrighted ffmpeg code isn't incorporated into the video output. The LLM didn't just conjure up equivalent behavior to git without ingesting the code and transforming it as new output. There is no other behavioral description that would reproduce all needed functionality.

> There is no other behavioral description that would reproduce all needed functionality.

Tests often are exactly the information necessary to understand exactly what the output should be. See https://github.com/git/git/blob/master/t/t0000-basic.sh for an example of how detailed these tests are.

It would be reasonable to point an LLM at these and use them with a basic knowledge of git to produce a rust version of git in a non-infringing manner.

If you did this manually it would take a long time.


Medium, substitutability, basics of copyright law.

Fair point on medium - this was a lazy example.

Substitutability probably doesn't apply here in the way you're implying, and if it did, it would likely be hampered by the 9th Circuit's findings about transformation in Sony v. Connectix. Arguments here would likely look at Rust not having a stable ABI, and hence not being inherently substitutable as a library (grit-lib); that's less clear on the executable (grit-cli) side.

Basics of copyright law: the fundamental thing being protected is the expression... is a Rust program's expression the same expression as a C program's? I'd say generally not.
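On the ABI point: Rust's native ABI is unstable, so a Rust library is usually only a drop-in replacement for a C library if it deliberately exports a C-compatible interface, along the lines of this sketch. `GritVersion`, `grit_version`, and the 0.1.0 value are hypothetical placeholders, not part of grit-lib's actual API. (`#[unsafe(no_mangle)]` is the spelling the 2024 edition requires and Rust 1.82+ accepts in every edition; older toolchains use plain `#[no_mangle]`.)

```rust
/// A version triple with C struct layout (`#[repr(C)]`), so C code declaring
/// `struct GritVersion { uint32_t major, minor, patch; }` sees the same fields.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GritVersion {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

/// Exported with an unmangled symbol name and the C calling convention, so a
/// C caller can declare `struct GritVersion grit_version(void);` and link to it.
#[unsafe(no_mangle)]
pub extern "C" fn grit_version() -> GritVersion {
    GritVersion { major: 0, minor: 1, patch: 0 }
}
```

Built as a `cdylib` or `staticlib`, this exposes a stable symbol; anything not exported this way (generics, traits, `String`, `Vec`) uses Rust's default layout, which is unspecified and may change between compiler versions.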


The test suite could test aspects of the architecture/design of the codebase that are not necessary for interoperability and constitute novel expression of a piece of software in a way that is not at all language specific.

By definition, a test suite is about testing interoperability with the test suite. An HTTP test suite should likely test whether response code 418 is implemented a particular way, and while humorous, it would still be an interop test, no?

No, the git test suite is about testing the git codebase. If you want something like that, you need a conformance suite, which does not exist for git.

If feeding the source code through a compiler yields a derivative work, why wouldn't feeding it to an LLM give the same result?

Because compilers and LLMs do different things, and what is done matters, so you can't reason by stepping from one to the other.

Compilers don't axiomatically yield derivative works, they simply in practice do because for non-trivial programs they preserve copyrightable elements of the work in the output.


Well compilers are a mechanical transformation and if that were sufficient to free you of IP law then IP law wouldn't work.

An LLM is also a computer program which takes input and produces output related in some way to that input. However I don't think most people would view it as a "mere" mechanical transformation. One could tautologically argue that an LLM blends the user input with the training inputs which is a sort of transformation and further that the LLM itself is a computer program thus it is mechanical in nature. However it should be immediately obvious that such an overly literal interpretation is in danger of subsuming human work as well. Where the boundary lies is an unanswered question.

Relatedly, compilers can pose a problem depending on what the output includes. For example, Common Lisp compilers that aren't under a permissive license are a minefield because, regardless of what anyone might say, the image that gets output includes (approximately) the full language implementation verbatim in addition to the user's program.


So if we compile or decompile code using an LLM instead of a compiler, can we then use the result freely?

(An LLM can translate code to and from other languages, or to and from machine code.)


Functional parts not being copyrightable means that you can't claim a program is a copyright violation because it does the exact same thing for compatibility reasons (you can copy what the program does). E.g., git stores refs in .git/refs, and so does grit; that's not a violation. You still can't copy the program.
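As an example of that kind of functional, compatibility-driven behaviour: any tool that reads a Git repository has to agree with Git on where refs live and what `.git/HEAD` contains, either `ref: <refname>` for a checked-out branch or a bare object id for a detached HEAD. A simplified sketch, assuming SHA-1 (40 hex digit) object ids and loose refs only; it ignores `packed-refs`, reftable, and SHA-256 repositories, and the function names are illustrative.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
enum Head {
    /// HEAD points at a branch, e.g. "refs/heads/main".
    Symbolic(String),
    /// Detached HEAD: a 40-character hex SHA-1 object id.
    Detached(String),
}

/// Interprets the contents of `.git/HEAD`.
fn parse_head(contents: &str) -> Result<Head, String> {
    let line = contents.trim_end_matches('\n');
    if let Some(target) = line.strip_prefix("ref: ") {
        Ok(Head::Symbolic(target.to_string()))
    } else if line.len() == 40 && line.chars().all(|c| c.is_ascii_hexdigit()) {
        Ok(Head::Detached(line.to_string()))
    } else {
        Err(format!("unrecognised HEAD contents: {:?}", line))
    }
}

/// Where a loose ref such as "refs/heads/main" is stored inside the git dir.
fn loose_ref_path(git_dir: &Path, refname: &str) -> PathBuf {
    git_dir.join(refname)
}
```

Every compatible implementation has to encode these same facts, which is why matching them is not evidence of copying expression.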

Yes... and now we get to the fact specific question of "did they copy the program". Or actually the answer to that is plainly "no" - they made something similar from it - and didn't run ctrl-c ctrl-v in an unlicensed manner, but "did they copy the relevant facets of the program into the new similar thing".

Making something similar is copying for the purpose of copyright law. If I trace over a Disney character it's still copyright Disney.

No. You're allowed to make a similar tool, the functional elements are not copyrightable. There's a long history, predating LLMs by many decades, of doing this in the software industry.

My use of the word "similar" does not imply here that I think it's obvious that they are "similar" in any copyrightable elements - whether they are or not is one of the interesting questions I think this case would have to resolve.

Incidentally you're also allowed to make similar creative elements so long as they aren't copies and you did so independently... which could actually come up in a case like this (imagine the LLM produced a similar function to some function in the original... but the original wasn't in the context window at the time. Not at all unlikely with code where there often is only one or two natural ways to write something).


I suspect that the issue is more likely that the LLM code doesn't have an author and hence some parts of it can't be licensed; it's less likely that it's infringing on git's copyright, for various reasons. (I am not a lawyer, but I do read copyright law for funsies.)

https://www.copyright.gov/newsnet/2025/1060.html

> It concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements. This can include situations where a human-authored work is perceptible in an AI output, or a human makes creative arrangements or modifications of the output, but not the mere provision of prompts.

Well that's interesting.


Also "just" the legal opinion of a government office. It has yet to be tested in court

why wouldn't it? If you run git through a compiler it's still copyright the git devs, same if you run it through an LLM.

What makes you think that's what the article says it did? There's a lot of specific nuance and it doesn't say that anywhere. In fact it speaks only of making a test suite pass. This is the classic clean-room-BIOS-from-specs approach, but there's no need to extract a spec, as the test suite is available to run, and there's nothing in the GPL that suggests that running a test suite infects the software you run it on.

Surely git’s source is already in LLM’s training corpus. So this is far from clean room approach.

You've read books and they are in your brain's corpus. You only infringe copyright if you reproduce the same actual words from the books in your memory (and then do infringing acts defined by copyright laws with that output).

Here that's not happening. The code being produced by the LLM is Rust, not C.


Make a small contribution to git then sue

Particularly because LLM generated code is not licensable in any way. If you wrote it with an LLM you cannot own it.


Not a fan of this trend of "cleaning" GPL licensed software and releasing under permissive licenses. It's also why I'm not a fan of uutils, nor of Canonical's early adoption of it in Ubuntu.

The intent here is extraction of all the value provided by copyleft projects without the obligation to give back. Whether it's technically legal or not, it's disgusting behavior IMO.


I agree, I certainly can't comment on the legality of this license laundering but I would call them an asshole.

That’s explicitly not what’s happening with uutils; they have contributed fixes and test cases back to upstream

And just like that, it was forked by Microsoft a few days ago. Handed to them on a silver platter.

> Not a fan of this trend of "cleaning" GPL licensed software > Wether it's technically legal or not, it's disgusting behavior IMO.

GNU was originally developed to "clean" UNIX from the AT&T license.


An idea...

Take this (assuming it's not slop), relicence as GPL, submit upstream (imagine it's accepted for a moment...).

If they then proceed with license washing from the Rust version, it's certainly a derived work.


This is not a proper black-box reimplementation, I doubt they can get away with that. And that's not mentioning all other obvious ethical concerns of course.

black-box/clean-room isn't necessarily required, though. It does make it a lot harder to argue in court, of course.

I'm not a copyright lawyer, but it seems pretty clear to me you can't wash a license using an LLM.

[US jurisdiction]: Anything in the result written by the LLM can not be copyright by anyone.

Anything in the result written by a human can be, and even if it was all emitted by the LLM, any portion originally written by a human carries its own copyright.

As the work of an LLM, the entirety presumably cannot be copyrighted at all. Portions written by humans presumably carry their original copyright.


> [US jurisdiction]: Anything in the result written by the LLM can not be copyright by anyone.

This is a bit stronger than the actual report where this has been discussed finds. See part 2 in https://www.copyright.gov/ai/ for details, but TL;DR, parts where humans have control over the expression may be copyrightable. But working out which parts those are is likely a difficult question (would likely require proof of provenance across many of those LLM sessions)


Knowing what you don't know is such an important skill in life and your career. And I 100% agree with you that the author is, well, off their rocker.

Let me give an example: I could take GoldenEye from the N64, extract the binary and then run it through an LLM to disassemble it and possibly rewrite it in a modern higher-level language. Do you think Nintendo would look at that and say "well, he did a lot of work so he's escaped our license"? Of course not. It's just silly.

Ingesting the source code and producing output in another language quite clearly yields a derivative work. You don't need to be an IP lawyer to figure that out.

Now, if you went to Claude, gave it documentation and told it to produce something that was compatible, would that be a derivative work and thus covered by the GPL? I would guess probably. But I'm not 100% sure anymore. I wouldn't risk it, however.

Here's another thought experiment: what if someone takes this supposedly MIT-licensed source tree, plugs it into another LLM and asks it to produce the output in C? Now how is it licensed? It might be very similar. After all, there are only so many ways to produce a SHA-1 hash and so many ways to write a command-line parser.

But this then makes it an interesting legal issue. In the Oracle v. Google court case, this was a key issue. Google successfully argued that there are only so many ways to write a loop, so just because a loop is similar to the source doesn't mean it's copyright infringement (as Oracle argued).

Anyway, it's a crazy position to take.


> Knowing what you don't know is such an important skill in life and your career. And I 100% agree with you that the author is, well, off their rocker.

They aren't the only ones - look at the number of people in this thread who are arguing that this is analogous to producing a movie with ffmpeg - just because ffmpeg is GPL, does not make your movie GPL.

I am struggling to understand how such a high level of cognitive dissonance is possible: They believe both a) that the license can be laundered in this manner, and that b) the license they put on the result is effective!


Well that is already how it is done with numerous multi-decade open rewrites of closed games. They usually require the asset pack.

I don't know how this squares with the law, but Oracle v Google gave the public a very valuable judgment that an API is not copyrightable. If we take the LLM out of it, that's all we are talking about in the pure case.

Of course, we can't take the LLM out, but it is the starting point.


> Well that is already how it is done with numerous multi-decade open rewrites of closed games

Serious such rewrites don't start with the code of the closed game!

> I don't know how this squares with the law, but Oracle v Google gave the public a very valuable judgment that an API is not copyrightable. If we take the LLM out of it, that's all we are talking about in the pure case.

Not at all. The LLM used to write grit has seen the git code. That is what we're talking about here.

> Of course, we can't take the LLM out, but it is the starting point.

The LLM isn't the important thing. The important thing is that the git source code was used to make grit.


> Serious such rewrites don't start with the code of the closed game!

No, but they often involve reverse engineering the binary pretty heavily.


> No, but they often involve reverse engineering the binary pretty heavily.

… and those often end up in legally dubious situations.


> Do you think Nintendo would look at that and say "well, he did a lot of work so he's escaped our license"? Of course not. It's just silly.

That's because you're re-using assets.


heh - https://github.com/n64decomp/007

Game decompilation and emulation are as old as computing.


I don't care if they can convince a judge. The fact that they even want to in the first place tells me what kind of people they are.

F-ing scumbags. It's already free, but they still decide to steal it.


Will you accept a port of Torch to Rust? https://github.com/forecast-bio/ferrotorch

"Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.

For a small group of cyberdefenders and infrastructure providers, we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas. Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US Government, as an upgrade to Claude Mythos Preview. It has the strongest cybersecurity capabilities of any model in the world. Soon, we intend to expand access to Mythos 5 through a broader trusted access program."


Looks like they're still getting the post out, but the model is live now, and the system card is at https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3... .

The Chinchilla scaling laws give you a minimum for the number of tokens you should be using for a given size: if you can't meet what they suggest for that size, you should shrink the size, as, otherwise, the capacity of the model is going to waste.
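To make the rule of thumb concrete: the Chinchilla result is commonly summarized as roughly 20 training tokens per parameter (Chinchilla itself was a 70B-parameter model trained on 1.4T tokens). The sketch below turns that ratio into a quick "is this model undertrained?" check. The helper names are hypothetical, and 20 is an approximation of the compute-optimal ratio, not an exact constant from the fitted laws.

```rust
// Rough Chinchilla heuristic (an approximation, not an exact law):
// about 20 training tokens per model parameter.
const TOKENS_PER_PARAM: u64 = 20;

/// Hypothetical helper: approximate token budget for `params` parameters.
fn chinchilla_tokens(params: u64) -> u64 {
    params * TOKENS_PER_PARAM
}

/// Hypothetical helper: largest model the heuristic suggests for a fixed token budget.
fn max_params_for(tokens: u64) -> u64 {
    tokens / TOKENS_PER_PARAM
}

/// Hypothetical helper: true if the model saw fewer tokens than the heuristic suggests.
fn is_undertrained(params: u64, tokens: u64) -> bool {
    tokens < chinchilla_tokens(params)
}

fn main() {
    // Chinchilla's own configuration: 70B parameters.
    let params: u64 = 70_000_000_000;
    println!("suggested tokens for 70B: {}", chinchilla_tokens(params)); // 1400000000000

    // An illustrative (made-up) 7B model trained on only 100B tokens.
    let (p, t) = (7_000_000_000u64, 100_000_000_000u64);
    println!("undertrained: {}", is_undertrained(p, t)); // true
    println!("size that budget suits: {}", max_params_for(t)); // 5000000000
}
```

In other words, if you only have 100B tokens, the heuristic points at a model of around 5B parameters; training a 7B model on them leaves some of its capacity unused, which is the point the comment above makes.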

I do agree that it is a datapoint, but GP's point is that this model was undertrained, so it's hard to draw the same conclusions from it that we would from other research.


The irony of a post about a port primarily written by Claude having been primarily written by Claude on a website primarily designed by Claude. Come on.

Claude (and Codex) designed the site, mostly because I'm not a UI coder; if I'd designed the site nobody'd want to read it but me, simply thanks to the UX.

And I have a full-time job and more; I draft with an LLM's assistance and revise with another LLM (and other humans where possible) because I'm just arrogant enough to think that what I think might be useful to others. If it's not useful to you, I get it. Such is life.


Why can't you just publish the prompt instead? Do you not see how LLMs subtly alter your original message and erase your voice? They fill gaps that didn't exist, they create syllogisms that make no sense, and the voice is now so ridiculously AIdiosyncratic that it makes my eyes boil!

If you have a message that takes 100 words to say, do not use an LLM to add 400 words to it; this isn't a school assignment! Stretching spaghetti does not yield more spaghetti, it just makes a mess!

Where is the value the LLM adds? Grammar? Vocabulary? The price you pay is you sound like everyone else and your original message is lost in the noise.


As far as publishing "the prompt" - there's no "the prompt." The draft was put together and expanded over a set of interactions with an LLM and other people over the space of about six hours. "The prompt" would have been about twelve pages long and unreadable. Funny as heck, but unreadable.

(If you're really interested, you can check the logs in the site and find the actual interaction that started the article out. It was a comment from someone else, and it got me thinking.)


Heh. Funny thing: I've been writing online and professionally for literal decades, since around 2002 or so, and the LLMs tend to change my actual writing voice relatively little and usually in positive ways, since they say I meander too much.

Your process loses your unique voice. The content was OK, but too verbose, and it needed data on other Rust ports of similar scope.

The issue is the quippy titles, “something - aside - continue” phrasing, and other constructions that feel like they are, or actually are, wholly LLM-written. I find a high correlation between this and low-density fluff. The author did not have 10 paragraphs of things to say, but used an LLM to inflate a short outline to that length. We would all have been better off with a tighter document, either human-written or better prompted.


It's not about that. As a reader, I want to read you as a human, with all your colors and nuances.

These days, thanks to using LLMs, I've (unknowingly) developed an LLM detector when reading these, and it actually distracts me.

So please: I believe you do have something to tell the world, but take it slower. No need to rush. I'd rather read something uniquely made by you.


I agree with the others - I'm sure that you've provided your own input, but Claude's writing and design style is so overwhelmingly dominant that those who have spent time with it can immediately recognise it, and it makes it hard to take at face value that you were the primary author, even if you were.

For your workflow, I'd suggest drafting with an LLM to help you find the right balance of content, and then throwing all of that out and writing it yourself. Otherwise, it won't sound like you.


To be fair, who cares about ai slop websites? To be honest, they're often better than the average webdev garbage. Language runtimes are held to a much, much, higher standard.

Hot-reloading. You can edit your logic without rebuilding and restarting the host application; this cuts your iteration time from minutes to seconds, especially if the application is in a state that would need to be recreated.

I would argue that memory safety is part of devex: it's just one less thing you have to be constantly vigilant about.

I would be very surprised to see a large Rust codebase being harder to maintain than a large Zig codebase. The former makes it much easier to maintain invariants at scale.

Well, you could go ask Richard Feldman, who I believe cited that reason for rewriting the compiler of the nascent Roc language from Rust to Zig, or anyone else who is moving from Rust to something else. I've seen multiple people at this point complain about the scaling issue with Rust: the larger the codebase, the more you end up fighting the compiler before anything will actually build.

Note that it doesn't matter if the compiler is correct about its claims; if the language doesn't actively discourage patterns that produce this outcome at scale, then the language does not scale, end of story.

The trend is basically either linear or exponential: as more LOC of Rust are added, the greater the percent of total time you spend fighting the compiler to get a successful build, especially in a team context (which is exactly what gets you to >1M LOC). Solo devs can contain the whole design in their minds and may not run into this issue as much; the problem specifically occurs on teams where the mental model MUST be fractured by necessity, and this results in "distributed knowledge of magic" that ends up constantly breaking.

Perhaps this explains WHY there aren't that many Rust projects done by more than 1 developer that approach that many LOC.


With Rust macros it's also possible to bolt on proper design by contract into the language.
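As a rough illustration of that claim, plain `macro_rules!` is enough for a minimal version. The `contract!` macro and `midpoint` function below are hypothetical names, not an existing crate's API: the macro checks a precondition, evaluates the body, binds the result to a name, then checks a postcondition against it. This sketch uses `assert!`, so the checks also run in release builds; a `debug_assert!` variant would trade that safety for speed.

```rust
/// Hypothetical design-by-contract macro: `requires` is checked before the
/// body runs, `ensures` is checked against the body's result, bound to `$ret`.
macro_rules! contract {
    (requires $pre:expr, ensures |$ret:ident| $post:expr, $body:block) => {{
        assert!($pre, "precondition violated: {}", stringify!($pre));
        let $ret = $body;
        assert!($post, "postcondition violated: {}", stringify!($post));
        $ret
    }};
}

/// Midpoint of an ordered range, written so that it cannot overflow.
fn midpoint(lo: u32, hi: u32) -> u32 {
    contract!(requires lo <= hi, ensures |m| lo <= m && m <= hi, {
        lo + (hi - lo) / 2
    })
}

fn main() {
    println!("{}", midpoint(2, 10)); // prints 6
    println!("{}", midpoint(u32::MAX - 1, u32::MAX)); // no overflow
    // midpoint(10, 2) would panic with "precondition violated: lo <= hi".
}
```

Nothing forces anyone to wrap a function in `contract!`, which is exactly the objection raised in the reply below.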

Unless you enforce those macros somehow in a team setting, someone's going to forget to use them, and then you're still stuck with the original problem.

Neither has been battle-tested at the relevant scale.

What kind of scale are you thinking of?

By the time C++ and Java were as old as Rust is today, there were thousands of programs of over 1 MLOC that had been maintained for at least five years. Rust is a rather old language, yet I doubt there are even hundreds of Rust programs over 1 MLOC.

The first Matrix is 27 years old; Reloaded is 23 years old.
