
Copyleft of course is of limited use since the FSF slept through the desktop to Internet service transition and now sleeps through the "AI" transition.

GPLv4 should be AGPL+no-AI if it is supposed to have any effect.



It is copyright as a whole that is sleeping through the "AI" transition/robbery.


Usually when I hear "robbery" it brings to mind someone stealing your phone or wallet at knife-point. Certainly not training some model on some code that involves neither violence nor depriving anyone of anything.

The concept of copyright is the fiction that information - something that can be freely modified, copied, or transmitted - is of the commodity-form; that information should be treated as if it were a single real object that is inherently scarce.

It is a fiction that exists to serve some of the most hateful, mafia-like firms - your Disneys, UMGs, Getty Images, and the like.

So if the AI interests are powerful enough to deal that whole rotten system a serious blow, then I'm all for them.


Server bandwidth is a commodity. If a site gets DDOSed by LLM training, then actual humans won’t be able to access the information.
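
As an aside, most of the large AI crawlers at least claim to honor robots.txt. A purely advisory opt-out might look like the sketch below (these user-agent strings are the ones published by the respective vendors, but nothing technically forces compliance):

    # robots.txt - advisory opt-out for some known AI training crawlers
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /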


GPLv4 should be AGPL plus "release the source code for your deployment/provisioning infrastructure". There is clear market demand for a freedom-preserving license that defends against AWS; how hard could it be to craft something that would be indigestible to them?


> ...release the source code for your deployment/provisioning infrastructure...

SSPL works a bit similarly: https://www.mongodb.com/legal/licensing/server-side-public-l...

> If you make the functionality of the Program or a modified version available to third parties as a service, you must make the Service Source Code available via network download to everyone at no charge, under the terms of this License. ...

> “Service Source Code” means the Corresponding Source for the Program or the modified version, and the Corresponding Source for all programs that you use to make the Program or modified version available as a service, including, without limitation, management software, user interfaces, application program interfaces, automation software, monitoring software, backup software, storage software and hosting software, all such that a user could run an instance of the service using the Service Source Code you make available.

Not really a free software license though - at least, it's typically not considered one.


Very cool, I did not know that the SSPL was like that!

And I'm surprised both by the fake grassroots campaign to attack the SSPL: https://ssplisbad.com

And by the OSI's interpretation of it as not open source: https://opensource.org/blog/the-sspl-is-not-an-open-source-l....


The SSPL is free in spirit, but it doesn't fit the OSI's definition of "Open Source" or the FSF's definition of "free software", because it places restrictions on purpose of use.

IMO we should have more licenses like this. Who cares whether you've got the approval and branding of some other foundation?


The SSPL does not place restrictions on purpose of use unless you also believe the AGPL does. You can diff the two licenses and see that the part which some people consider to be a purpose of use restriction is identical.

The OSI declared it nonfree, because the OSI is a consortium of large companies - cloud service providers and the like.

The FSF decided not to issue any opinion.

Debian also decided not to issue an opinion, but since only a few packages were affected and they have good GPL-licensed counterparts, it switched to those counterparts anyway.

There is no evidence the SSPL is nonfree. That is big tech disinformation. Please don't spread it. What the SSPL actually is is an extreme point in the WTFPL-to-AGPL spectrum - it's more AGPL than AGPL itself. It's still on that spectrum, not a completely different spectrum like a proprietary license.


It is actually terrifying how successful the cloud provider cartel has been at spreading FUD about SSPL. Denied OSI certification and packages getting dropped from repos?


Debian has an unusually strict policy and will absolutely err on the side of not including packages (famously not packaging things you need, like video drivers), and valkey is simply better than redis anyway (more features, better performance) while being drop-in compatible, so it was an easy choice for them.



Since AI use is based on "fair use", how can you add a no-AI clause? The only protection I see is to not publish the code.


“Fair use” is a notion related to copyright. The authors in Europe who won in cases of free software licensing violations did it on the grounds of breach of contract law.


Right, it seems I can prevent AI training by marking a work here in Germany (via UrhG § 44b Abs. 3). That would perhaps be something to add to (a variant of) the GPL.

Doesn't prevent training in the US and use of the trained AI and material it produces here, I guess.

Then I have to start a civil lawsuit and prove that the produced work is similar enough to the original work. That was the step that failed in the Hellwig (Linux) vs. VMware lawsuit.
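
As an aside, § 44b Abs. 3 requires the reservation to be machine-readable. One candidate format is the W3C community draft "TDM Reservation Protocol" (TDMRep), which expresses the opt-out as an HTTP response header or a well-known JSON file - a sketch, with the caveat that the legal effect of this exact format under § 44b is untested:

    # HTTP response header form (set via your web server's config):
    tdm-reservation: 1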


Also, the whole "AI is fair use" battle is not finished. AI companies like to pretend it is, but a big part of "fair use" is that the use doesn't compete with the initial work… it's hard to make a strong case there.


In either case, "fair use" is a doctrine specific to the USA and does not apply outside it. There may be similar concepts in other jurisdictions, but it cannot be assumed that the same reasoning and rules apply.


So far, lawsuits seem to be heading in an "AI training is not restricted by copyright" direction, so a no-AI license doesn't seem practical.

As for the internet service industry: AGPL already covers that, I think. However, I'm not sure if AGPL has ever been tested in court (especially outside of the USA). For the EU there's the EUPL v1.2 which should be AGPL compatible.

I don't think there's much you can do to stop AI companies from stealing your code unless you go out of your way to make every method name, variable, and string template contain at least one slur or support for terrorism. That way, someone will need to explicitly rewrite the code for it to be spat back out, which is more effort than most AI bros are willing to put in.


A clause in a license which states something like "You may not use this work for training an artificial neural network" should suffice.

If the AI developer ignores the term and trains their AI on it anyway, they would clearly be in violation of your licensing terms, and then it would be for a court to decide.

Having the term in your license would be preferable to omitting it, since it's hard for a court to rule that they're not in violation of your license when it's written there plain as day. If you don't have the clause then you basically have no defence, as it can easily be argued that they've followed your licensing terms, which don't prevent its use for such a purpose.


> A clause in a license which states something like "You may not use this work for training an artificial neural network" should suffice.

If you needed a license to train AI then that clause would be redundant. It doesn’t matter what clauses you add to a license if the courts let people train AI without a license.


You don't need a license to train AI, but that's not the point.

The license applies to the use of your work - a EULA, if you like. If you provide a license which states that the work cannot be used for training, then users of your work are required to follow its licensing terms, and not following them is a violation of those terms.

If you've made your work available to use, without an explicit clause specifying that it can't be used for training AI, then it is a valid use of your work that doesn't violate its licensing terms.

EULAs themselves are a grey area whose clauses are not necessarily enforceable, but that's ultimately for a court to decide, and it would definitely be better to have it as a clause in your license.

If the user ticks a box to declare they've read the terms and agree, it's hard for them to argue otherwise in court.

A sufficiently advanced LLM should be able to read the term and comprehend that it can't learn from the work and should ignore anything from it. As AI improves it will be more difficult for the providers to argue that term violations were accidental because their AIs would specifically need to be instructed to ignore such clauses - and the developers who provide such instructions would have a much weaker defence in a court.


AI companies don’t need to be users of the software to train on its source code.

And, again, if they don’t need a license then the license is irrelevant.

Lastly, if you place restrictions on the use of the software then it’s no longer open source. You certainly can’t do it with the GPL.


> Lastly, if you place restrictions on the use of the software then it’s no longer open source.

This particular point is the topic of the post, to be fair. A project with a copyleft license can still be "open source", even if it's not FOSS.

> AI companies don’t need to be users of the software to train on its source code.

I don't think this is true (or at least it's moot given the word "users"); they still need to legally obtain a copy of the source code, which is offered by the creator alongside a license. You can't just download a project from github and disregard the license that comes with it.


I don't really care about "open source" (per the OSI definition). I'm specifically proposing a new kind of license which does place a restriction on use - that of using the work to train AI - and I'm suggesting that such a license apply not only to programs but to the source code itself, with a prominent notice in each source file - and to non-program works too: basically, any creative work carrying a prominent notice that it is not to be used for AI training.
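
As a concrete sketch, such a notice could sit at the top of every source file next to the usual copyright line (the license name and wording here are entirely hypothetical, not vetted legal text):

    # Copyright (C) <year> <author>
    # Licensed under the Hypothetical No-AI Public License:
    # you may use, modify, and redistribute this file, but you may NOT
    # use it, in whole or in part, to train an artificial neural network
    # or any other machine learning system.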

Disclaimer: IANAL

In any claim, you would argue that you made it clear your code should not be used for such a purpose; that the AI developer was made aware of your intention; that the AI developer was negligent; and so forth.

There are frameworks for dealing with cases where no contract exists, such as unjust enrichment[1], which (according to the citation) is examined as follows:

    1. Was the defendant enriched?
    2. Was the enrichment at the expense of the claimant?
    3. Was the enrichment unjust?
    4. Does the defendant have a defense?
    5. What remedies are available to the claimant?
The first point would basically be extremely difficult to argue against w.r.t the big AI providers. They're making billions on the backs of other people's creative works.

On the third point, it can be argued that what AI developers are doing is almost criminally unjust: they are performing mass copyright violations by training their AI on works which they have no rights to copy, while performing no creative acts themselves beyond designing specific neural networks, which are essentially useless without training data. Moreover, if you've explicitly given notice that your copyrighted works are not to be used for AI training, then it becomes easier to argue that the AI company's enrichment is unjust, because they've intentionally ignored it.

The fourth point is precisely why you want a disclaimer stating that the work must not be used for training. An AI developer could argue, in their defense, that the work is publicly available and carries no restrictions on use for training. That defense goes out of the window, however, if you have made it clear that the work must not be used for such a purpose, and the developer (or the AI itself) can reasonably be expected to be aware of this notice. They would need another defense besides "we didn't know".

So your claims in court would largely come down to the second point: proving that their enrichment has come at your expense. This might be difficult for an individual, though it could be proven if your code is quite unique, an AI is basically regurgitating it, and this results in a loss for you. More likely, a claim against these AI companies would be a class action suit on behalf of many claimants, who could reasonably demonstrate that the work produced by the AI is not original, that it infringes on their own creative works, and that the defendant has no defense because it intentionally ignored the disclaimer that the creative works were not to be used for such a purpose.

W.r.t the final point, there are two potential avenues for remedy if a claim were successful: One is that the claimants are financially compensated in proportion to damages - the other is that the AI developers are forced to stop using works which prominently display a "NO AI TRAINING" disclaimer, and any successful claim would set a precedent so that AI companies would necessarily need to be more considerate about the works they use for training.

If you went to court with such a claim, which is more likely to result in success: The case where you didn't put a notice to forbid using in AI training, or the case where you clearly put notices everywhere that it should not be used for AI training?

Ideally it shouldn't be necessary to have such a disclaimer, as using whole copyrighted works for training is not "fair use." However, the pressure is currently on courts to permit the use of copyrighted works for AI for several reasons.

But rather than waiting for courts to decide whether AI training is "fair use", why not be proactive and just start asserting that your creative works are not intended for AI slop? Even if courts rule that copyrighted works not carrying any disclaimer against AI training are free to use for training, this doesn't necessarily imply that all copyrighted works are free to use for training if they explicitly state otherwise.

[1]:https://en.wikipedia.org/wiki/Restitution_and_unjust_enrichm...


But that's the problem: with the way AI lawsuits seem to be going, throwing stuff into AI training is allowed regardless of license restrictions. You might as well put "you must pay me $100 every time you use the word 'the' while using this source code" in the license file; the contents don't matter if they can download the source code through legal means.

An anti-AI clause might make some big companies wary of including your project for a while, but your code will be scraped up and included in training data the day it hits the internet, and I haven't yet seen any evidence that what the AI companies are doing is deemed illegal.


It is pretty much impossible to square the FSF's mission or methods with limiting how people use GPLed software in relation to AI. The FSF should be one of the organisations carrying the banner that people are free to use GPL-ed software as they like.



