As a Substack author who puts "permission is not granted to use any portion of this to train an AI" at the bottom of most of my posts: it's bullshit that you have to do this sort of thing at all, and that it will almost certainly not work anyway.
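For anyone unfamiliar, "this sort of thing" mostly boils down to a robots.txt block, roughly like the sketch below (assuming GPTBot and Google-Extended are the crawler tokens you care about; it only does anything if the crawlers choose to honor it):

    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /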
This must be illegal, but how are all the little bloggers going to oppose it?
Why do you think it would be illegal? You can state "permission is not granted to X" on anything you want, but that doesn't mean the law is on your side. Regular rules of copyright still apply.
P.S. Permission is not granted to downvote my comment!
Vitriol aside, you need to chill for a bit and touch grass.
"Training" doesn't really have a well-defined meaning, I could use your website to train something as simple as a histogram of word counts for an AI for example. Nothing about that constitutes copyright infringement under even the loosest definition of their legal concept.
Additionally, the weights produced by training and the AI's output are two completely different matters from a legal perspective.
I agree it is wrong and should be illegal. That being said, I do find the argument that it's no different from a human learning from, and occasionally reconstructing, copyrighted things compelling.
Most normal humans do not spend their time profitably selling their "occasionally reconstructing copyrighted things" at a rate of millions of users per second, which is a pretty important difference in practice.
That said, the law was not written with super-humans in mind: entities that can reproduce, slightly transformed, just about everything they read (all the world's knowledge). A clarifying law should be created.
> Is it legal to transcribe a book from memory for money?
If it's an accurate transcription and you don't have permission, then it's not legal. It doesn't matter if it's for money or not (or if it's from memory or not).
> Does it matter how faithful your transcription is?
Yes, it matters. Copyright covers the specific expression of an idea, not the idea itself.
What's right or wrong, and what's legal or illegal, are two different things. There are plenty of right things that are illegal and wrong things that are legal.
"Illegal" is too strong. But if you specifically disallow the use of your website contents from being used to train AI, then anyone doing so is violating the terms of service.
Which doesn't really mean anything.
At this point, the only defense I can think of is to not make the content publicly available. Which is what I've done.
It's a bit different because the AI is reading it with the intent of reproducing (certain aspects of) it for other people to later consume without visiting the original site. Fair use doctrine has long allowed small pieces of copyrighted material to be reproduced, but the line is very blurry and generally has to be litigated if there's any ambiguity whatsoever. I'd bet many of the models we're using today will be pulled from serving the public over copyright lawsuits in the coming years.
I don't think training on copyrighted stuff will ever be banned, but we need to figure out how much models should be allowed to generate from it. Eventually new models will just pop up with more carefully curated data anyway.
> I don't think training on copyrighted stuff will ever be banned, but we need to figure out how much models should be allowed to generate from it.
From a US copyright law point of view, this is most likely correct. Copyright law doesn't prevent you from ingesting copyrighted works; it prevents you from distributing them.
There is also a great deal of existing case law about how different a work has to be before it no longer infringes on another work. There are rules of thumb judges go by when trying to determine whether infringement occurred, including the amount of difference in expression, the quantity copied, whether or not the copying is incidental, etc.
And that's not even getting into the question of fair use -- which is a whole other kettle of fish.
I suspect that the courts will deal with these issues the way that they've always dealt with these issues: on a case-by-case basis.
Sure, but that would be illegal too. I'm saying it doesn't matter who reads your website, but everyone knows exactly what GPT and Bard are going to do with the information they're "learning" from it, which is why people are trying to block them from reading it in the first place.
Many LLMs will happily recite large segments of copyrighted material word-for-word, despite the fact that it can be difficult to tell what's happening "under the hood".
> It's a bit different because the AI is reading it with the intent of reproducing (certain aspects of) it for other people to later consume without visiting the original site
It could be illegal if the AI reproduces vast portions of it. If you could get the LLM, over the course of a few prompts, to generate a significant portion of the content (as copyright law defines it), then yes.
As long as the AI isn't reproducing it, I'm not sure it would count.
Scale and position matter. Google is the conduit that connects most people to most websites, so in the EU they are considered a "gatekeeper" and need to be careful about conflicts of interest with the people and websites using their "gate". I hope American competition law catches up to the point we can recognize that market makers simply should not be participating in the markets they make (and Google search is a market maker; it's connecting "buyers" [viewers or advertisers, depending on your perspective] to "sellers" [websites or viewers, respectively]), but I digress.
The point is that Google has a certain market position that makes it very different when they "recite the vague plot of a novel or a fact they learned". The point of competition law is to "distort" free market capitalism for the betterment of society. This is one of those cases where practical considerations trump information idealism. The quality of information on the internet will go down if we stop rewarding original publishers.
Yep, there’s a big difference in practice. If an AI could attribute sources and provide royalties, it might not be so different, but that’s never going to happen. A big reason Bard exists is that Google is trying to ensure it stays profitable and relevant. They don’t care where the knowledge really comes from.
Does that include book readers for the blind? They typically have some sort of optical character recognition and benefit a user, just like an ML training dataset benefits users.
My point being: it's exceptionally hard to write laws that forbid precisely what you don't want and allow precisely what you do want, without quickly getting into details that call the entire law's assumptions into question. Here that assumption would be "because an ML training pipeline is not a person, it has no right to scan the web".
The main difference here is that these AI bots are operating with an entirely different agenda. The ethics are still unsettled, and the jury is out as to whether they will benefit users the way their makers promise they will.
They also operate on a whole different scale, and instead of supplementing the web’s content they devalue it to a degree.
The "ai bots" aren't operating with an agenda- at least as far as we can tell now, training algorithms and their scrapers do not have agency.
Basically you're assuming the agenda of the operator, saying "that's bad an shouldn't be allowed". But I see the web- except for things specifically labelled with standard copyright disclaimers- as effectively a large corpus of publicly available data, "in the market square for all to see".