Huh. There have definitely been well publicized examples of this happening, like...

voxic11 · on July 9, 2024

You can't copyright a mathematical operation. Only a particular implementation of it, and even then it may not be copyrightable if it's a straightforward and obvious implementation.

That said, the implementation doesn't appear to be totally trivial, and Copilot apparently even copies the comments, which are almost certainly copyrightable in themselves.

https://x.com/StefanKarpinski/status/1410971061181681674 https://github.com/id-Software/Quake-III-Arena/blob/dbe4ddb1...

However, a Twitter post on its own isn't evidence a court will accept. You would need the original poster to testify that what is seen in the post is actually what he got from Copilot and not just a meme or joke that he made.

Also, the plaintiffs in this case don't include id-Software, and there is some evidence that id-Software actually stole the fast inverse sqrt code from 3dfx, so they might not want to bring a claim here anyways.

whimsicalism · on July 9, 2024

Not sure where you thought I said you could copyright a mathematical operation, I was clearly referring to the implementation due to the mention of “quake”.

When it was reported, I was able to reproduce it myself.

TechDebtDevin · on July 10, 2024

Weren't people getting it to spit out valid Windows keys also?

pas · on July 10, 2024

GPT4 regurgitated almost full NYT articles verbatim. It's strange that this lawsuit seems to be so amateurish that they failed to properly demonstrate the reproduction. Though of course it might require a lot of legal technicalities that we naively think are trivial but might not be.

Kim_Bruning · on July 10, 2024

I read that case.

Absolutely there were a few outliers where a judge might want to look more closely. I'd be surprised if -under scrutiny- there wouldn't be any issues whatsoever that OpenAI overlooked.

However, it seemed to me that over half of the NYT complaints were examples of using the -then rather new- ChatGPT web browsing feature to browse their own website. In the case, they then claimed surprise when it did just what you'd expect a web browsing feature to do.

sulandor · on July 10, 2024

> You can't copyright a mathematical operation.

i agree from a philosophical pov, but this is clearly not the case in law.

https://en.wikipedia.org/wiki/Illegal_number

williamcotton · on July 10, 2024

The second step is to remove from consideration aspects of the program which are not legally protectable by copyright. The analysis is done at each level of abstraction identified in the previous step. The court identifies three factors to consider during this step: elements dictated by efficiency, elements dictated by external factors, and elements taken from the public domain.

https://en.wikipedia.org/wiki/Abstraction-Filtration-Compari...

voidfunc · on July 10, 2024

It's even simpler: iD is owned by ZeniMax, and ZeniMax is owned by Microsoft... who would they even sue?

naikrovek · on July 10, 2024

That's not how that works.

All the plaintiffs would need to do is provide evidence that copyrighted code was produced verbatim. This includes showing the copyrighted code on GitHub, showing Copilot reproducing the code (including how you manipulated Copilot to do it), showing that they match, and showing that the setting to turn off reproduction of public code is set.

It makes no difference who owns the copyrighted code; it need only be shown that Copilot is violating copyright. Microsoft can't say "uhh that doesn't count" or whatever simply because they own a company that owns a company that owns copyright on the code.

nvr219 · on July 10, 2024

"Trust no one... even yourself"

banish-m4 · on July 10, 2024

Algorithms can be, and definitely are, patented in utility patents in the US.

beeboobaa3 · on July 9, 2024

https://en.wikipedia.org/wiki/Illegal_number

wongarsu · on July 9, 2024

It reads like the judge required them to show it happened to their code, not to any code in general. That's a much higher bar. There are thousands of instances of fast inverse square root in the training data but only one copy of your random GitHub repositories. Getting the model to reproduce your code verbatim might be possible for all we know, but it isn't trivial.

Suppafly · on July 10, 2024

>It reads like the judge required them to show it happened to their code, not to any code in general.

Rightly so, you have to show some sort of damage to sue someone, not just theoretical damages.

whimsicalism · on July 9, 2024

of course for standing. but it seems like with the right plaintiffs this could have gone forward

brookst · on July 10, 2024

But that's like saying my lawsuit alleging Taylor Swift copied my song could have gone forward with a plaintiff who had, years ago, written a song similar to what Ms. Swift recorded recently. That's true, but perhaps the lesson here is that damages that hinge on statistically rare victims should not be extrapolated out to provide windfalls for people who have not been harmed.

whimsicalism · on July 10, 2024

i think that is a weak analogy and also unnecessary bc it is already clear what i am saying

Dylan16807 · on July 10, 2024

If it only copies code that has been widely stolen already then that's a lot weaker of a case and is something they can do a lot to prevent on a technical level.

account42 · on July 11, 2024

Code that has been copied widely != code that has been widely stolen.

Open source licenses allow sharing under certain conditions.

sleepybrett · on July 10, 2024

It could be forced, of course. I can republish my copyrighted code millions of times all over the internet. Next time they retrain there is a good chance my code will end up in their corpus, maybe many many times, reinforcing it statistically.

daedrdev · on July 9, 2024

The article mentions that GitHub Copilot has been trained to avoid directly copying specific cases it knows, and that although you can get it to spit out copyrighted code by prefixing the copyrighted code as a starting point, in normal use cases it's quite rare.

dathinab · on July 10, 2024

yes, but you need to show that it happened _in your case_, not that it can happen in general.

polishTar · on July 9, 2024

Fast inverse square root is now part of the public domain.

Also, even if this weren't the case, you can't sue for damages to other people (they'd need to bring their own suit).

anonymoushn · on July 9, 2024

Is the particular implementation that the model spits out 70+ years old?

on July 9, 2024

[deleted]

voxic11 · on July 9, 2024

But Copilot distributed it (allegedly) without complying with the GPL license (which requires any distribution to be accompanied by the license), so it still would be an instance of copyright infringement. https://x.com/StefanKarpinski/status/1410971061181681674

immibis · on July 9, 2024

Has it really already been 70 years since John Carmack died?

polishTar · on July 9, 2024

Ah, you're right. I was wrong to say "public domain".

It would be more correct to say Quake III Arena was released to the public as free software under the GPLv2 license.

KnightHawk3 · on July 9, 2024

There is a large gap between public domain and GPL. For starters, if Copilot is emitting GPL code for closed source projects... that's copyright infringement.

FireBeyond · on July 9, 2024

That would be license infringement, not copyright infringement.

immibis · on July 10, 2024

Copyright infringement is emitting the code. The license gives you permission to emit the code, under certain conditions. If you don't meet the conditions, it's still copyright infringement like before.

FireBeyond · on July 10, 2024

No.

Copyright infringement could be emitting the code in a manner that exceeds fair use.

The license gives you permission to utilize the code in a certain way. If Copilot gives you GPLed code that you then put into your closed source project, you have infringed the license, not Copilot.

> If you don't meet the conditions, it's still copyright infringement like before.

Licensing and copyright are two separate things. Neither has anything to do with the other. You can be in compliance with copyright, but out of license compliance, you can be the reverse. But nothing about copyright infringement here is tied to licensing.

To be clear: I am a person who trashed his Reddit account when they said they were going to license that text for training (trashed in the sense of "ran a script that scrubbed each of my comments first with nonsense edits, then deleted them"). I am a photographer who has significant concerns with training other models on people's creative output. I have similar concerns about Copilot.

But confusing licensing and copyright here only muddies waters.

account42 · on July 11, 2024

Without adhering to the conditions of the GPL you have no license to redistribute the code and are therefore infringing the copyright of the author.

FireBeyond · on July 11, 2024

Apparently, the court disagrees with you, and doesn't find "emitting" the code a copyright infringement.

It'd be a long bow to draw to say that what is akin to a search result of a snippet of code is "redistributing a software package".