Hacker News | senaevren's comments

You are definitely right to flag it; apologies for that. I used an AI assistant for the replies, and I will make sure not to use one going forward.

Appreciated!

Fair point and worth being precise about. Cert denial is not meaningless: it leaves the lower court ruling intact, it signals the Court did not find the issue urgent enough to resolve now, and as you note, other circuits will look at the DC Circuit's reasoning. What it does not do is bind other circuits or establish Supreme Court precedent. The distinction matters here because if a Ninth Circuit case involving AI-generated code reaches a different conclusion, that circuit split would be live law regardless of the Thaler cert denial.

You are right that no court has yet ruled that a specific set of human contributions to AI-assisted work was sufficient to establish authorship. What exists is the inverse: the Copyright Office has granted partial registrations where human-authored elements were separated from AI-generated elements, as in Zarya of the Dawn, where the human-written text was protected but the Midjourney images were not. The Allen v. Perlmutter case pending in Colorado is the first direct judicial test of whether iterative prompting and editing can constitute authorship. Until that decision, the positive threshold is genuinely unknown. The piece reflects this in the calibration section at the end, though your point is worth adding to the authorship discussion more explicitly.

Fair and correct. Cert denial means the Court declined to hear the case, not that it endorsed the lower court's reasoning or settled the question nationally. The DC Circuit ruling stands and the Copyright Office's position is consistent, but that is stable doctrine rather than Supreme Court-settled law. Updated the piece to reflect this distinction accurately.

Since this is a tech audience... the Supreme Court uses a bounded priority queue. An unbounded queue would risk growing impractically large.

There are some kinds of cases where the Court has "original jurisdiction," meaning the case is filed with it directly rather than arriving on appeal, such as disputes between states, but those are very rare.
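To stretch the queue joke one step further, a bounded priority queue is a real data structure: keep a min-heap of the top-k items so the weakest entry is cheap to evict, and silently drop anything that does not outrank it. A toy sketch (the "petition" framing is just for flavor):

```python
import heapq

class BoundedPriorityQueue:
    """Keep only the top-k items by priority. A min-heap puts the
    weakest retained item at heap[0], making eviction O(log k)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # (priority, item) pairs; heap[0] is the weakest

    def push(self, priority, item):
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (priority, item))
        elif priority > self._heap[0][0]:
            # New item outranks the weakest retained one: swap them in one op.
            heapq.heapreplace(self._heap, (priority, item))
        # else: cert denied, the petition is dropped

    def items(self):
        """Retained items, strongest first."""
        return sorted(self._heap, reverse=True)

q = BoundedPriorityQueue(2)
q.push(1, "routine appeal")
q.push(9, "circuit split")
q.push(5, "novel question")
# The queue is full, so the lowest-priority entry was evicted.
```

With capacity 2, only the two strongest petitions survive; `q.items()` returns them strongest first.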


Thanks for this; it's definitely a fair point. I updated the piece to reflect it.

The original bargain you describe, limited term in exchange for public disclosure, is exactly what makes the current situation strange. If AI-generated output falls into the public domain immediately, that is actually closer to the original intent of copyright than 95-year terms. The legal question is whether that outcome happens by design or by accident, and what it means for the people building products on top of AI-generated codebases right now.

By design or by accident

It’ll happen by evolution. Just complex systems trending the way they trend.


The chardet dispute is the closest thing to an active test case on this specific question, and you are right that it has not resolved into settled law. "Emerging legal consensus" was imprecise. The more accurate framing is: the legal community's working assumption, based on how copyright doctrine treats derivative works, is that training-data provenance travels with the output. That assumption has not been tested definitively in court yet.

The model ownership question and the output ownership question run on separate legal tracks and the piece focuses on the second deliberately. On the first: the model weights are owned by Anthropic under work-for-hire from their engineers regardless of what the training data contained. Training data copyright infringement is a separate tort claim against Anthropic, not a basis for anyone else to claim ownership of the model. The Bartz settlement resolved the pirated books claim without disturbing Anthropic's ownership of the weights. Owning the training data does not give you ownership of the model trained on it, any more than owning the paint gives you ownership of the painting.

The sound recording analogy breaks down at the point where the recorder makes no creative decisions. Pressing record captures what is already there. Prompting Claude generates something that did not exist, through decisions the model makes about structure, naming, pattern, and implementation. The closer analogy is hiring a session musician and telling them the key and tempo. You own the recording under work-for-hire if they signed the right contract, but the creative expression in the performance is theirs unless explicitly assigned. The button you push to start the model is not the same button as the one on the recorder.

> Prompting Claude generates something that did not exist, through decisions the model makes about structure, naming, pattern, and implementation.

LLMs don't make decisions. Their output is completely determined by an algorithm using the human prompt, fixed weights, and a random seed. No different than the many effects humans use in image or audio editors. Nobody ever questioned whether art made using only those effects on a blank canvas was subject to copyright.
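The determinism claim is easy to demonstrate in miniature: fix the prompt, the weights, and the seed, and the output is byte-identical on every run. A toy sketch, where the "weights" are just a hypothetical next-token probability table rather than a real model:

```python
import random

# Hypothetical "weights": next-token probabilities for a toy vocabulary.
WEIGHTS = {
    "the": [("cat", 0.5), ("dog", 0.3), ("court", 0.2)],
    "cat": [("sat", 0.7), ("ran", 0.3)],
    "dog": [("barked", 1.0)],
    "court": [("ruled", 1.0)],
}

def generate(prompt, seed, steps=2):
    """Sample a short continuation. A fixed seed fixes every choice."""
    rng = random.Random(seed)  # private RNG: same seed -> same stream
    tokens = [prompt]
    for _ in range(steps):
        options = WEIGHTS.get(tokens[-1])
        if not options:
            break
        words, probs = zip(*options)
        tokens.append(rng.choices(words, weights=probs)[0])
    return " ".join(tokens)

# Same prompt + same weights + same seed => identical output, run after run.
assert generate("the", seed=42) == generate("the", seed=42)
```

Change the seed and the continuation may differ; hold all three inputs fixed and it never does, which is the commenter's point about the process being fully algorithmic.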


Fourier theory says that any sound, however complex, can be synthesized by summing sines and cosines. That's what an LLM does, if you twist the metaphor enough. It synthesizes complex outputs from simpler basis functions that are, or should be, uncopyrightable.

The fact that it inferred those basis functions from studying copyrighted works doesn't seem relevant. Nor does the fact that the "Fourier sums" sometimes coincide with larger fragments of works that are copyrighted. How weird would it be if that didn't happen?
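The Fourier claim itself is standard signal processing: a square wave, for example, is approximated by summing its odd sine harmonics, and the approximation sharpens as harmonics are added. A minimal sketch of that synthesis-from-simple-bases idea:

```python
import math

def square_wave_approx(t, n_harmonics):
    """Partial Fourier series of a unit square wave:
    (4/pi) * sum over odd k of sin(k*t)/k."""
    return (4 / math.pi) * sum(
        math.sin(k * t) / k for k in range(1, 2 * n_harmonics, 2)
    )

# At t = pi/2 the ideal square wave equals 1.
print(square_wave_approx(math.pi / 2, 1))    # single sine: 4/pi ~ 1.273
print(square_wave_approx(math.pi / 2, 200))  # many harmonics: approaches 1
```

Each sine term is individually trivial; only the weighted sum reproduces the complex waveform, which is the shape of the analogy being made about basis functions.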


Of course it's relevant. How copyright infringement happens doesn't actually matter; all that matters is that the infringement happened.

If I painstakingly recreate A New Hope frame by frame, pixel by pixel, that's infringement. Even if I technically used 0 content from the original.


Nobody is doing that, though. You might get a watermarked screenshot or stock photo now and then, or a couple of mostly-verbatim paragraphs from Harry Potter.

In any case, if the copyright mafia insists on butting heads with AI, they'll find that the fight doesn't quite play out the way it has in the past.


The meaningful human authorship question is the elephant, agreed, and the regulators have deliberately refused to quantify it for exactly the reason you describe: any bright-line number becomes a target to game rather than a standard to meet.

The logging point is sharper than it might appear. In a copyright dispute over AI-assisted code, interaction logs could cut both ways. A plaintiff trying to establish human authorship would want the logs to show substantial architectural redirection, multiple rejections of Claude output, and documented reasoning for structural decisions. A defendant challenging that authorship claim would subpoena the same logs to show verbatim acceptance of output without modification.

The practical implication, I guess, is that developers who want to preserve a copyright claim over AI-assisted code should treat their prompt history as a legal document from the start. Across jurisdictions, the logs are the evidence; whether they help or hurt depends entirely on what they show.


The bit about treating one’s prompt history as a legal document has really struck a nerve with me. I’ve been keeping a separate git history solely for my prompts. Initially, the goals were simple: reuse prompts, turn some into skills, etc. But in light of the insights from the article and the discussions here, I need to treat this practice as serious business.
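One way to make such a history harder to dispute later is to chain each entry to the hash of the previous one, so any retroactive edit breaks the chain and is detectable. A hypothetical sketch (the field names are made up, and this is an illustration, not legal advice):

```python
import hashlib
import json
import time

def append_entry(log, prompt, response_summary):
    """Append a prompt record whose hash covers the previous entry's hash,
    making the log tamper-evident: editing any earlier entry breaks the chain."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {
        "ts": time.time(),
        "prompt": prompt,
        "response_summary": response_summary,
        "prev_hash": prev_hash,
    }
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    log.append(body)
    return body

def verify(log):
    """Recompute every hash and link; False if anything was altered."""
    prev = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev:
            return False
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if digest != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
append_entry(log, "Refactor the auth module", "rejected first draft")
append_entry(log, "Now add rate limiting", "accepted with edits")
assert verify(log)
log[0]["prompt"] = "something else"  # retroactive edit...
assert not verify(log)               # ...is caught by verification
```

A plain git repo of prompts gives you much of this for free via commit hashes; the point of the sketch is only that the ordering and content of entries can be made cryptographically self-checking.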
