Imagine you wrote a book, released it through a publisher who put it on dead trees, and sold it in e-book format. And imagine that a whole industry does this, and doesn't release the books to be copied or used for free in any format. Which is not hard to do, because that's basically the current situation for the publishing industry.
Now imagine that all of that was used to train an LLM without compensation to the authors or to the publishers who paid the authors. This is apparently the current situation with some of the training datasets.
Meanwhile, libraries have to pay per e-loan, and Archive.org can't format-shift a dead tree copy and loan it out 1:1 as an ebook.
I get that the tech industry wants everyone else's information to be free to use and their own products to generate enough money for big exits and big salaries, but at some point the optics look pretty bad.