Article 10 requires that > all training data be "relevant, representative, free ...

RobotToaster · on May 16, 2023

Will be interesting for Copilot, given all the buggy half-finished projects on GitHub that would have been included in its training data.

jstx1 · on May 16, 2023

The quoted sentence is about training data, not about the output of the model; they're different things.

m463 · on May 16, 2023

I wonder what "representative" means in relation to human behavior?

Does it mean "must collect ALL data"?

Satam · on May 16, 2023

I have a feeling they might be looking for "equality" with this formulation. However, if it is representative of the real world, it will often not be in line with the norms prescribed by the notion of equality.

jstx1 · on May 16, 2023

Of course not, there's no meaning of representative that requires this.

macksd · on May 16, 2023

I am wondering what qualifies as "complete", though. Any reasonable definition I can come up with is redundant with "representative" and "free of errors".

simion314 · on May 16, 2023

From what I read earlier (I did not waste time on this article again), the EU rules in the proposal, which is not final yet, are about critical stuff.

I agree that it would be idiotic to let some greedy bastards sell some MedicalGPT to us, or PoliceGPT, SurveillanceGPT.

Imagine the MedicalGPT will give you different treatment each time you ask since it is not deterministic, or if you change the patient name from Bob to John then it gives you some wild results because the test data had tons of John Smiths in it, and nobody can explain this AI's reasoning.

So IMO for critical systems we need good rules for safety reasons, for non-critical systems we need transparency, and if you sell an AI product you should also take responsibility if it performs worse than you advertise. Like you can't SELL me a GPT for schools with a shit disclaimer "it might be wrong sometimes and teach the students wrong stuff, or it might sometimes be NSFW". IMO fuck these ToS where these giants sell us stuff and take no responsibility for the quality of the product.

NavinF · on May 16, 2023

> different treatment each time you ask since it is not deterministic

https://ai.stackexchange.com/questions/32477/what-is-the-tem...

It's unfortunate that EU regulators seem to be making the same mistakes as you because they have a similar understanding of language models.

simion314 · on May 17, 2023

Can you explain where I am wrong? ChatGPT is nondeterministic, did OpenAI sabotage it intentionally?

I do not want this tech banned, but regulated for safety reasons in critical systems. I already get daily spam emails from greedy fucks that want to sell me AI for X, where I am 100% sure these greedy fucks do not understand the science behind this stuff but just want to make money.

NavinF · on May 17, 2023

ChatGPT is intentionally nondeterministic for the same reason that GPT3 is nondeterministic by default. temperature>0 results in a better user experience. I'm having a hard time understanding why you'd think a neural network could unintentionally be nondeterministic. If you want inference to be deterministic, just use the same seed every time.
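To make the temperature and seed point concrete, here is a toy sketch in plain Python. It is not how OpenAI or any real model does inference; the logits are made-up numbers and `sample_token` and `generate` are hypothetical helper names. It only shows the idea that temperature 0 collapses to a greedy pick, and that with temperature>0 a fixed seed reproduces the same samples.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick an index from `logits`; temperature 0 means greedy argmax."""
    if temperature == 0:
        # No randomness at all: always take the highest-scoring index.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    # Draw one index with probability proportional to its softmax weight.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(logits, temperature, seed, n=10):
    """Draw n samples using a private RNG seeded with `seed`."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(n)]


logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for four candidate tokens

run_a = generate(logits, temperature=1.0, seed=42)
run_b = generate(logits, temperature=1.0, seed=42)
greedy = generate(logits, temperature=0, seed=7)

print(run_a == run_b)  # same seed, same samples
print(greedy)          # always the index of the largest logit
```

In this toy the only source of randomness is the `random.Random` instance, so reusing the seed reproduces the run; a real deployment would need the same property to hold across the whole inference stack, which this sketch does not attempt to model.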

I also have no idea what your spam emails have to do with training models. The linked law prevents companies in the EU from releasing or deploying large models. It does not prevent grifters from spamming you. (Not that there are any companies in the EU training state-of-the-art models, but that's a separate issue.)

jdiez17 · on May 16, 2023

Article 10 does not apply to low-risk AI systems like ChatGPT.

astrange · on May 16, 2023

This seems very limiting; I can read it as banning adversarial training since that introduces "errors".

nashashmi · on May 16, 2023

Error is different from misinformation.