> An AI product - more specifically, the model weights - is a derivative work of the original works used for training;
I don't think that's so obvious. Why would it be so, any more than for humans who learn from material?
I mean, one might ask if your very comment here is a derivative work of the aggregate corpus of material you've previously read on the subjects of copyright, open source licensing, and AI. I suspect most of us would agree that it isn't so, but why treat the model weights of an AI so differently than the synaptic weights in your brain?
Because copyright, and law in general, is an expression of the political agreement reached amongst the members of our society. It does not exist in the absolute; there are no legal principles that transcend humanity. Law is a human creation to arbitrate our collaboration and conflicts.
Therefore, in the legal sense, an algorithm does not "learn", despite any functional analogy you can make with human learning, because an algorithm is not a party to the social contract that established said law; its only "rights" are an extension of the legal rights of its author/proprietor. Your "learning" right does not cover, for example, your tape player recording a performance and playing it back at a later date to some commercial audience. You have a right to hear and learn the song, you can play it back from memory, but your tape recorder does not, it's a tool, just like your fancy AI machine.
This will continue to hold true despite any advancements in AI, up to the moment when synthetic entities acquire distinct legal rights.
> You have a right to hear and learn the song, you can play it back from memory, but your tape recorder does not, it's a tool, just like your fancy AI machine.
There are two different copyrights: one for the melody/text and one for the recording. Sometimes they have different owners who fight. The most famous recent example is probably the Taylor Swift controversy. She ended up re-recording some of her old songs so that she owns the rights to the new recordings.
Yes, I was talking about the copyright for the performance, not the underlying melody. If, for example, you hear a public domain folk song, you can sing it later, but your tape player can't, even if it "remembers" it just like you do, because the rendition is owned by its performer. The example was meant to clarify the distinction between the rights of the human listener and the rights of their tools, but judging by the responses it confused some people.
To give another example, even if I can walk or run in a park, my bot army with a million mechanical feet that all behave by analogy to the human foot can't also run through the park. Why should it be any different in the case of my AI derivation machine with superhuman memory and derivation ability?
So even if the courts find that AI training is fair use, and not derivation, that conclusion will not rest on an analogy between the way humans and machines learn. Nor will it preclude the writing of laws, by humans, explicitly redefining copyright to protect human creators from unlicensed AI training. The social contract is anthropocentric all the way down.
> Therefore, in the legal sense, an algorithm does not "learn",
Are you saying there is actual case law / precedent establishing that, or is that just your personal theory? If the former, I'd love to see any such citations, as I was not aware of those developments.
Your "learning" right does not cover, for example, your tape player recording a performance and playing it back at later date to some commercial audience.
That's pretty much a straw man here. I'm not talking about cases where an AI reproduces an existing work exactly. That is problematic from a copyright standpoint for both a machine OR a human.
> You have a right to hear and learn the song, you can play it back from memory
You do not in fact have a right to play a copyrighted song from memory, any more than you have a right to play a recording, unless you're playing it for yourself. Just like you don't have a right to show others a movie on a DVD you bought.
> Why would it be so, any more than for humans who learn from material?
Because AI isn't human, and there is no credible argument that it's anything close to human. Unless you establish that connection, you can't just auto-apply the logic and intuition we've developed for humans to AI.
I think it does apply here, because the point is that learning isn't direct copying, and the knowledge you get from it isn't copyrightable. AI could be the same in instances where it's not directly copying.
When you want a clean room non-GPL implementation of something GPL that already exists, you ask the developers not to look at the original. I don't see how this is any different.
It's completely different. You're talking about re-implementing a specific piece of software. And that whole "clean room" thing isn't an absolute anyway... that's the level of paranoia you engage in if you want to be super duper sure that you can't be accused of copying the original.
What I'm talking about is closer to "you fire up your IDE (or Emacs) right now and churn out 250 lines of code for some arbitrary piece of software. Is it a derivative work of every piece of software whose source code you have previously looked at?"
Note that I'm not referring to the case where the AI spits out code that is identical to code taken from another project. I'm aware that that sometimes happens, and that is obviously a problem, just like it would be if a human did it. What I'm arguing is only that it probably should not be taken as a given that AI-generated code is automatically considered a derivative work.
Here's a thought experiment: say an AI emits a single line of code tomorrow. You examine it, and then spend weeks, months, or even years searching all the open source code that's "out there". You fail to identify a line in any existing code-base that was clearly the upstream source for the line from the AI. So is that line a derivative work? If so, of what?
If 100 experienced C devs are asked to write strcpy, some of those implementations will be identical and that fact will not indicate any copyright infringement has occurred.
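For illustration, here is a minimal sketch (my own, not taken from any particular codebase, and the function name is just a placeholder) of the K&R-style loop that many experienced C developers would independently converge on:

    /* Classic pointer-walk strcpy: copy bytes from src to dst,
       including the terminating '\0', and return the destination. */
    char *my_strcpy(char *dst, const char *src) {
        char *ret = dst;
        while ((*dst++ = *src++) != '\0')
            ;
        return ret;
    }

Two people writing that from memory can easily produce byte-identical functions; the convergence reflects a common idiom, not copying.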
What often happens is that you ask one set of devs to look at the GPL code and draw up specifications of the functionality, and then have a second (non-intersecting) set of devs do the implementation without directly referring to the GPL code, only indirectly via the specification.