
Huh. There have definitely been well publicized examples of this happening, like the quake inverse square root


You can't copyright a mathematical operation, only a particular implementation of it, and even then it may not be copyrightable if it's a straightforward and obvious implementation.

That said, the implementation doesn't appear to be totally trivial, and Copilot apparently even copies the comments, which are almost certainly copyrightable in themselves.

https://x.com/StefanKarpinski/status/1410971061181681674 https://github.com/id-Software/Quake-III-Arena/blob/dbe4ddb1...

However, a Twitter post on its own isn't evidence a court will accept. You would need the original poster to testify that what is seen in the post is actually what he got from Copilot and not just a meme or joke that he made.

Also, the plaintiffs in this case don't include id Software, and there is some evidence that id Software actually stole the fast inverse sqrt code from 3dfx, so they might not want to bring a claim here anyway.
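
To illustrate the distinction drawn above between the mathematical operation (1/sqrt(x)) and one particular implementation of it, here is a minimal sketch of the widely documented bit-level technique, written from scratch in Python. It is not the Quake III source; the function name and structure are my own, and it assumes 32-bit IEEE 754 floats and a positive input.

```python
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x (illustrative sketch only)."""
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Shift and subtract from the well-known constant for a first guess.
    i = (0x5F3759DF - (i >> 1)) & 0xFFFFFFFF
    # Reinterpret the integer bits back as a float.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step to refine the estimate.
    return y * (1.5 - 0.5 * x * y * y)
```

The operation itself can also be written as `x ** -0.5`; the legal question in this thread is about a specific expression of the trick (including its comments), not the idea.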


Not sure where you thought I said you could copyright a mathematical operation, I was clearly referring to the implementation due to the mention of “quake”.

When it was reported, I was able to reproduce it myself.


Weren't people also getting it to spit out valid Windows keys?


GPT-4 regurgitated almost full NYT articles verbatim. It's strange that this lawsuit seems to be so amateurish that they failed to properly demonstrate the reproduction. Though of course it might require a lot of legal technicalities that we naively think are trivial but might not be.


I read that case.

Absolutely there were a few outliers where a judge might want to look more closely. I'd be surprised if -under scrutiny- there wouldn't be any issues whatsoever that OpenAI overlooked.

However, it seemed to me that over half of the NYT complaints were examples of using the -then rather new- ChatGPT web browsing feature to browse their own website. In the case, they then claimed surprise when it did just what you'd expect a web browsing feature to do.


> You can't copyright a mathematical operation.

I agree from a philosophical POV, but this is clearly not the case in law.

https://en.wikipedia.org/wiki/Illegal_number


The second step is to remove from consideration aspects of the program which are not legally protectable by copyright. The analysis is done at each level of abstraction identified in the previous step. The court identifies three factors to consider during this step: elements dictated by efficiency, elements dictated by external factors, and elements taken from the public domain.

https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...


It's even simpler: id is owned by ZeniMax, and ZeniMax is owned by Microsoft. Who would they even sue?


That's not how that works.

All the plaintiffs would need to do is provide evidence that copyrighted code was produced verbatim. This includes showing the copyrighted code on GitHub, showing Copilot reproducing the code (including how you manipulated Copilot to do it), showing that they match, and showing that the setting to turn off reproduction of public code is set.

It makes no difference who owns the copyrighted code; it need only be shown that Copilot is violating copyright. Microsoft can't say "uhh that doesn't count" or whatever simply because they own a company that owns a company that owns copyright on the code.
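
As a rough illustration of the "showing that they match" step, here is a minimal sketch using Python's standard `difflib` to find the longest run of characters that appears verbatim in both an original file and a generated completion. The function name is hypothetical, and this shows only textual overlap; it says nothing about what a court would accept as evidence.

```python
import difflib


def longest_verbatim_run(original: str, generated: str) -> str:
    """Return the longest substring shared verbatim by both texts."""
    # autojunk=False keeps frequent characters from being ignored.
    matcher = difflib.SequenceMatcher(None, original, generated, autojunk=False)
    match = matcher.find_longest_match(0, len(original), 0, len(generated))
    return original[match.a : match.a + match.size]
```

A long shared run that includes comments would be more telling than a short one, since short snippets often coincide by chance.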


"Trust no one... even yourself"


Algorithms can be, and definitely are, patented under utility patents in the US.



It reads like the judge required them to show it happened to their code, not to any code in general. That's a much higher bar. There are thousands of instances of fast inverse square root in the training data but only one copy of your random GitHub repositories. Getting the model to reproduce your code verbatim might be possible for all we know, but it isn't trivial.


>It reads like the judge required them to show it happened to their code, not to any code in general.

Rightly so, you have to show some sort of damage to sue someone, not just theoretical damages.


Of course, for standing. But it seems like with the right plaintiffs this could have gone forward.


But that’s like saying my lawsuit alleging Taylor Swift copied my song could have gone forward with a plaintiff who had, years ago, written a song similar to what Ms. Swift recorded recently. That’s true, but perhaps the lesson here is that damages that hinge on statistically rare victims should not be extrapolated out to provide windfalls for people who have not been harmed.


I think that is a weak analogy, and also unnecessary because it is already clear what I am saying.


If it only copies code that has been widely stolen already then that's a lot weaker of a case and is something they can do a lot to prevent on a technical level.


Code that has been copied widely != code that has been widely stolen.

Open source licenses allow sharing under certain conditions.


It could be forced, of course. I can republish my copyrighted code millions of times all over the internet. Next time they retrain there is a good chance my code will end up in their corpus, maybe many many times, reinforcing it statistically.


The article mentions that GitHub Copilot has been trained to avoid directly copying specific cases it knows, and that although you can get it to spit out copyrighted code by supplying the copyrighted code as a starting prefix, in normal use cases it's quite rare.


yes, but you need to show that it happened _in your case_, not that it can happen in general.


Fast inverse square root is now part of the public domain.

Also, even if this weren’t the case, you can’t sue for damages to other people (they’d need to bring their own suit).


Is the particular implementation that the model spits out 70+ years old?


[deleted]


But Copilot distributed it (allegedly) without complying with the GPL (which requires any distribution to be accompanied by the license), so it would still be an instance of copyright infringement. https://x.com/StefanKarpinski/status/1410971061181681674


Has it really already been 70 years since John Carmack died?


Ah, you're right. I was wrong to say "public domain".

It would be more correct to say Quake III Arena was released to the public as free software under the GPLv2 license.


There is a large gap between public domain and GPL. For starters, if Copilot is emitting GPL code for closed source projects... that's copyright infringement.


That would be license infringement, not copyright infringement.


Copyright infringement is emitting the code. The license gives you permission to emit the code, under certain conditions. If you don't meet the conditions, it's still copyright infringement like before.


No.

Copyright infringement could be emitting the code in a manner that exceeds fair use.

The license gives you permission to utilize the code in a certain way. If Copilot gives you GPLed code that you then put into your closed source project, you have infringed the license, not Copilot.

> If you don't meet the conditions, it's still copyright infringement like before.

Licensing and copyright are two separate things. Neither has anything to do with the other. You can be in compliance with copyright, but out of license compliance, you can be the reverse. But nothing about copyright infringement here is tied to licensing.

To be clear: I am a person who trashed his Reddit account when they said they were going to license that text for training (trashed in the sense of "ran a script that scrubbed each of my comments first with nonsense edits, then deleted them"). I am a photographer who has significant concerns with training other models on people's creative output. I have similar concerns about Copilot.

But confusing licensing and copyright here only muddies waters.


Without adhering to the conditions of the GPL you have no license to redistribute the code and are therefore infringing the copyright of the author.


Apparently, the court disagrees with you, and doesn't find "emitting" the code a copyright infringement.

It'd be a long bow to draw to say that what is akin to a search result of a snippet of code is "redistributing a software package".



