For me that discussion is always hard to grasp. When a human would learn coding ...

sensanaty · on Aug 29, 2023

> So why do we care from where LLMs learn?

Because humans aren't computers and the similarities between the two, other than the overuse of the word "learning" in the computer's case, are nonexistant?

gaganyaan · on Aug 29, 2023

Are you really asserting that these models aren't learning? What definition of learning are you using?

sensanaty · on Aug 29, 2023

Don't know if they are, and don't really care either and I don't care to anthropomorphize circuitry to the extent that AI proponents tend to, especially.

Humans and Computers are 2 wholly separate entities, and there's 0 reason for us to conflate the two. I don't care if another human looks at my code and straight up copies/pastes it, I care very much if an entity backed by a megacorp like Micro$oft does the same, en-masse, and sells it for profit, however.

fxnn · on Aug 29, 2023

Okay, so the scale at which they sale their service is a good argument that this is different from a human learning.

However, on the other hand we also have the scale at which they learn, which kind of makes every individual source line of code they learn from pretty unimportant. Learning at this scale is statistical process, and in most cases individual source snippets diminish in the aggregation of millions of others.

Or to put it the other way round, the actual value lies in the effort of collecting the samples, training the models, creating the software required for the whole process, putting everything into a good product and selling it. Again, in my mind, the importance of every individual source repo is too small at this scale to care about their license.

Covenant0028 · on Aug 30, 2023

The idea that individual source snippets at this scale diminish in aggregation, is undercut by the fact that OpenAI and MSFT are both selling enterprise-flavoured versions of GPT, and the one thing they promise is that enterprise data will not be used to further train GPT.

That is a fear for companies because the individual source snippets and the knowledge "learned" from them is seen as a competitive advantage of which the sources are an integral part - and I think this is a fair point from their side. However then the exact same argument should apply in favour of paying the artists, writers, coders etc whose work has been used to train these models.

So it sounds like they are trying to have their cake and eat it too.

fxnn · on Sept 6, 2023

Hmm. You sure this is the same thing? I would say it’s more about confidentiality than about value.

Because what companies want to hide are usually secrets, that are available to (nearly) no one outside of the company. It’s about preventing accidental disclosure.

What AIs are trained on, on the other hand, is publicly available data.

To be clear: what could leak accidentally would have value of course. But here it’s about the single important fact that gets public although it shouldn’t, vs. the billions of pieces from which the trained AI emerges.

gaganyaan · on Aug 29, 2023

It's really not different in scale. Imagine for a moment how much storage space it would take to store the sensory data that any two year old has experienced. That would absolutely dwarf the text-based world the largest of LLMs have experienced.

gaganyaan · on Aug 29, 2023

If you don't care, why are you confidently asserting things you're not even interested in examining? It just drowns out useful comments.

bombolo · on Aug 29, 2023

Do humans really read terabytes of C code to learn C?

Humans look at a few examples and extrapolate…

fxnn · on Sept 6, 2023

But that also exists in the AI world. It’s called „fine tuning“: a LLM trained on a big general dataset can learn special knowledge with little effort.

I’d guess it’s exactly the same with humans: a human that received good general education can quickly learn specific things like C.

gaganyaan · on Aug 29, 2023

Humans have experienced an amount of data that absolutely dwarfs the amount of data even the largest of LLMs have seen. And they've got billions of years of evolution to build on to boot

bombolo · on Aug 29, 2023

You're straying away. Let's talk about learning C.

Also humans didn't evolve in billion of years.

gaganyaan · on Aug 29, 2023

The process of evolution "from scratch", i.e. from single-celled organisms took billions of years.

This is all relevant because humans aren't born as random chemical soup. We come with pre-trained weights from billions of years of evolution, and fine-tune that with enormous amounts of sensory data for years. Only after that incredibly complex and time-consuming process does a person have the ability to learn from a few examples.

An LLM can generalize from a few examples on a new language that you invent yourself and isn't in the training set. Go ahead and try it.

bombolo · on Sept 4, 2023

I can't even convince it to put the parameters in a function call in the correct order, despite repeatedly asking.

DrScientist · on Aug 29, 2023

There is the element of the unknown with LLMs etc.

There is a legal difference between learning from something and truly making your own version and simply copying.

It's vague of course - take plagiarism in a university science essay - the student has no original data and very likely no original thought - but still there is a difference between simply copying a textbook and writing it in your own words.

Bottom line - how do we know the output of the LLM isn't a verbatim copy of something with the license stripped off?

peoplefromibiza · on Aug 29, 2023

> So why do we care from where LLMs learn?

same difference there is between painting your own fake Caravaggio and buying a fake Caravaggio (or selling the one you made).

the second one is forgery, the first one is not.

blackhaz · on Aug 29, 2023

The way I see it is that with AI you have really painted your own Caravaggio, but instead of an electrochemical circuit of a human brain you've employed a virtual network.

peoplefromibiza · on Aug 29, 2023

> but instead of an electrochemical circuit of a human brain you've employed a virtual network.

technically it is still a tool you are using, differently from doing it on your own, with your hands, using your own brain cells, that you trained over the decades, instead of using a virtual electronic brain pre-trained in hours/days by someone else on who knows what.

fxnn · on Aug 29, 2023

Okay if it’s about looking at one painting and fake that. However, if you train your model on billions of paintings and create arbitrary new ones from that, it’s just a statistical analysis on what paintings in general are made of.

The importance of the individual painting diminishes at this scale.

joncrocks · on Aug 29, 2023

And if you look at lots of paintings, and create a new painting which is in a very similar style to an existing painting?

Is that a forgery? Have you infringed on the copyright on all the paintings you looked at?

jprete · on Aug 29, 2023

Why do people bring this up? People are not LLMs and the issues are not the same.

tracker1 · on Aug 29, 2023

I'd add to this, the damage an LLM could do is much less than a human could do in terms of individual production. A person can paint so many forgeries... A machine can create many, many more. The dilusion of value from a person learning is far different than machine learning. The value extracted and diluted is night and day in terms of scale.

Not to say what will/won't happen. In practice, what I've seen doesn't scare me much in terms of what LLMs produce vs. what a person has to clean up after it's produced.

gaganyaan · on Aug 29, 2023

Why are the issues not the same? Are you privileging meat over silicon?

ddingus · on Aug 29, 2023

Yes they are. Most people will.

They are not the same because an LLM is a construct. It is not a living entity with agency, motive, and all the things the law was intended for.

We will see new law as this tech develops.

For an analogy, many people call infringement theft and they are wrong to do so.

They will focus on the someone getting something without having followed the right process part while ignoring the equally important someone else being denied the use of, or loss of property part.

The former is an element in common between theft and infringement. And it is compelling!

But, the real meat in theft is all about people losing property! And that is not common at all.

This AI thing is similar. The common elements are super compelling.

But it just won't be about that in the end. It will be all about the details unique to AI code.

gaganyaan · on Aug 29, 2023

Using the word "construct" isn't adding anything to the conversation. If we bioengineer a sentient human, would you feel OK torturing it because it's "just a construct"? If that's unethical to you, how about half meat and half silicon? How much silicon is too much silicon and makes torture OK?

> Most people will [privilege meat]

"A person is smart. People are dumb, panicky dangerous animals, and you know it". I agree that humans are likely to pass bad laws, because we are mostly just dumb, panicky dangerous animals in the end. That's different than asking an internet commentor why they're being so confident in their opinions though.

ddingus · on Aug 29, 2023

If we bioengineer:

Full stop. We've not done that yet. When we do, we can revisit the law / discussion.

We can remedy "construct" this way:

Your engineered human would be a being. Being a being is one primary difference between us and these LLM things we are toying with right now.

And yes, beings are absolutely going to value themselves over non beings. It makes perfect sense to do so.

These LLM entities are not beings. That's fundamental. And it's why an extremely large number of other beings are going to find your comment laughable. I did!

You are attempting to simplify things too much to be meaningful.

gaganyaan · on Aug 29, 2023

Define "being". If it's so fundamental, it should be pretty easy, no?

And I'd like if this were simple. Unfortunately there's too many people throwing around over-simplifications like "They are not the same because an LLM is a construct" or "These LLM entities are not beings". If you'll excuse the comparison, it's like arguing with theists that can't reason about their ideological foundations, but can provide specious soundbites in spades.

ddingus · on Aug 29, 2023

It is easy!!

First and foremost:

A being is a living thing with a will to survive, need for food, and a corporeal existence, in other words, is born, lives for a time, then dies.

Secondly, beings are unique. Each one has a state that ends when they do and begins when they do. So far, we are unable to copy this state. Maybe we will one day, but that day, should there ever be one, is far away. We will live our lives never seeing this come to pass.

Finally, beings have agency. They do not require prompting.

gaganyaan · on Aug 29, 2023

So these jellyfish aren't "beings" because they can live forever? Or do they magically become "beings" when they die?

https://en.m.wikipedia.org/wiki/Turritopsis_dohrnii

Also twice now you've said the equivalent of "it hasn't happened yet so no need to think about the implications". Respectfully, I think you need to ponder your arguments a bit more carefully. Cheers.

ddingus · on Aug 30, 2023

Of course they are beings!

They've got a few fantastic attributes, lots of different beings do. You know the little water bear things are tough as nails! You can freeze them for for a century wake them up and they'll crawl around like nothing happened.

Naked mole rats don't get any form of cancer. All kinds of things the beans present in the world that doesn't affect the definition at all.

You didn't gain any ground with that.

And I will point out, it is you who has the burden in this whole conversation. I am clearly in the majority if you want things with what I've said. And I will absolutely privilege meets face over silicon any day, for the reasons I've given.

You, on the other hand, have a hell of a sales job ahead of you. Good luck maybe this little exchange helped a bit take care

peoplefromibiza · on Aug 31, 2023

> Or do they magically become "beings" when they die?

quoting from your link

although in practice individuals can still die. In nature, most Turritopsis dohrnii are likely to succumb to predation or disease in the medusa stage without reverting to the polyp form

This sentence does not apply to an LLM.

Also, you can copy an LLM state and training data and you will have an equivalent LLM, you can't copy the state of a living being.

Mostly because a big chunk of the state is experience, like for example you take that jellyfish, cut one of its tentacles and it will be scarred for life (immortal or not). That can't be copied and most likely never will.

ddingus · on Aug 30, 2023

Regarding the copying of a being state, I'm not really sure that's ever even going to be possible.

So for the sake of argument I'll just amend that and say we can't copy their state. Each being is unique and that's it. They aren't something we copy.

And yes that means all of us that thinks somehow they're going to get downloaded into a computer? I'll say it right here and now that's not going to fucking happen.

red_trumpet · on Aug 29, 2023

Companies don't go around donating their source code to universities either, even if it was only for the purpose of learning.

blitzar · on Aug 29, 2023

> So why do we care from where LLMs learn?

Because humans dont put the "Shutterstock" watermark logo on the images they produce.

defrost · on Aug 29, 2023

As with all absolutes* exceptions exist:

Viagra Boys - In Spite Of Ourselves (with Amy Taylor)

    I absolutely love that the entirety of the video is unpurchased stock footage with the watermark still on it. This is cinematic gold.

https://www.youtube.com/watch?v=WLl1qpDL7YA

* well, most ...

hnben · on Aug 29, 2023

cargo cult programming is real though