You can't copyright a mathematical operation, only a particular implementation of it, and even then it may not be copyrightable if it's a straightforward and obvious implementation.
That said, the implementation doesn't appear to be totally trivial, and Copilot apparently even copies the comments, which are almost certainly copyrightable in themselves.
However, a Twitter post on its own isn't evidence a court will accept. You would need the original poster to testify that what is seen in the post is actually what he got from Copilot and not just a meme or joke that he made.
Also, the plaintiffs in this case don't include id Software, and there is some evidence that id Software actually stole the fast inverse sqrt code from 3dfx, so they might not want to bring a claim here anyway.
Not sure where you thought I said you could copyright a mathematical operation. I was clearly referring to the implementation, given the mention of “Quake”.
When it was reported, I was able to reproduce it myself.
GPT-4 regurgitated almost full NYT articles verbatim. It's strange that this lawsuit seems to be so amateurish that they failed to properly demonstrate the reproduction. Though of course it might require a lot of legal technicalities that we naively think are trivial but might not be.
Absolutely, there were a few outliers where a judge might want to look more closely. I'd be surprised if, under scrutiny, there turned out to be no issues whatsoever that OpenAI overlooked.
However, it seemed to me that over half of the NYT complaints were examples of using the then rather new ChatGPT web browsing feature to browse their own website. In the case, they then claimed surprise when it did just what you'd expect a web browsing feature to do.
The second step is to remove from consideration aspects of the program which are not legally protectable by copyright. The analysis is done at each level of abstraction identified in the previous step. The court identifies three factors to consider during this step: elements dictated by efficiency, elements dictated by external factors, and elements taken from the public domain.
All the plaintiffs would need to do is provide evidence that copyrighted code was produced verbatim. This includes showing the copyrighted code on GitHub, showing Copilot reproducing the code (including how you manipulated Copilot to do it), showing that they match, and showing that the setting to turn off reproduction of public code was enabled.
It makes no difference who owns the copyrighted code; it need only be shown that Copilot is violating copyright. Microsoft can't say "uhh that doesn't count" or whatever simply because they own a company that owns a company that owns copyright on the code.
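For what it's worth, the "showing that they match" step is the easy, mechanical part. Here is a minimal sketch, assuming you have the original file and the suggestion saved as plain text. The function names and both sample snippets are made up for illustration (they are not real Copilot output or anyone's actual code), and this says nothing about what a court would accept as proof.

```python
import difflib


def normalize(source: str) -> list[str]:
    """Strip surrounding whitespace and drop blank lines so formatting noise doesn't hide a match."""
    return [line.strip() for line in source.splitlines() if line.strip()]


def longest_verbatim_run(original: str, suggestion: str) -> list[str]:
    """Return the longest run of consecutive lines that appears in both texts."""
    a = normalize(original)
    b = normalize(suggestion)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a : match.a + match.size]


# Hypothetical inputs: ORIGINAL stands in for the repository file,
# SUGGESTION for what the tool emitted.
ORIGINAL = """
// halve the value before refining
float half = x * 0.5f;
// one refinement step is enough here
y = y * (1.5f - half * y * y);
return y;
"""

SUGGESTION = """
float approx(float x) {
    // halve the value before refining
    float half = x * 0.5f;
    // one refinement step is enough here
    y = y * (1.5f - half * y * y);
}
"""

run = longest_verbatim_run(ORIGINAL, SUGGESTION)
print(f"{len(run)} consecutive lines match, comments included:")
for line in run:
    print("   ", line)
```

Matching comments are the interesting part of a run like this, since identical comments are much harder to explain as independent creation than identical arithmetic.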
It reads like the judge required them to show it happened to their code, not to any code in general. That's a much higher bar. There are thousands of instances of fast inverse square root in the training data but only one copy of your random GitHub repository. Getting the model to reproduce your code verbatim might be possible for all we know, but it isn't trivial.
But that’s like saying my lawsuit alleging Taylor Swift copied my song could have gone forward with a plaintiff who had, years ago, written a song similar to what Ms. Swift recorded recently. That’s true, but perhaps the lesson here is that damages that hinge on statistically rare victims should not be extrapolated out to provide windfalls for people who have not been harmed.
If it only copies code that has been widely stolen already then that's a lot weaker of a case and is something they can do a lot to prevent on a technical level.
It could be forced, of course. I can republish my copyrighted code millions of times all over the internet. Next time they retrain there is a good chance my code will end up in their corpus, maybe many many times, reinforcing it statistically.
The article mentions that GitHub Copilot has been trained to avoid directly copying specific cases it knows, and that although you can get it to spit out copyrighted code by feeding it the start of that code as a prompt, in normal use cases it's quite rare.
But Copilot distributed it (allegedly) without complying with the GPL (which requires any distribution to be accompanied by the license), so it would still be an instance of copyright infringement. https://x.com/StefanKarpinski/status/1410971061181681674
There is a large gap between public domain and GPL. For starters, if Copilot is emitting GPL code for closed source projects... that's copyright infringement.
Copyright infringement is emitting the code. The license gives you permission to emit the code, under certain conditions. If you don't meet the conditions, it's still copyright infringement like before.
Copyright infringement could be emitting the code in a manner that exceeds fair use.
The license gives you permission to utilize the code in a certain way. If Copilot gives you GPLed code that you then put into your closed source project, you have infringed the license, not Copilot.
> If you don't meet the conditions, it's still copyright infringement like before.
Licensing and copyright are two separate things. Neither has anything to do with the other. You can be in compliance with copyright, but out of license compliance, you can be the reverse. But nothing about copyright infringement here is tied to licensing.
To be clear: I am a person who trashed his Reddit account when they said they were going to license that text for training (trashed in the sense of "ran a script that scrubbed each of my comments first with nonsense edits, then deleted them"). I am a photographer who has significant concerns with training other models on people's creative output. I have similar concerns about Copilot.
But confusing licensing and copyright here only muddies the waters.