FWIW I’ve seen comments from a privacy engineer at Google who posted here saying they actually work hard to delete your data.
I’d expect Facebook and Google to successfully delete the data (after allowing 3-6 months for backups to age out), but I wouldn’t trust most smaller operations to do so. And yeah, that doesn’t cover ML models or whatever, just the retrievable copies of your photos and text posts.
After learning that Google has been tracking the precise locations of a few hundred million people for the last 10+ years, I went through the exercise of deleting all my activity history from Maps, YouTube and Search and disabling all activity tracking.
For the first few hours, I saw new default YouTube suggestions. After a few days, a lot of videos from my old searches and views popped back up on the YouTube home page. YT still seems to "suggest" videos for me to watch based on my past viewing info.
The model is probably based on your watch history from before you deleted everything. Theoretically there would be no way for anyone to see your watch history anymore, but YouTube still thinks those old videos are ones you'd be interested in, and since your history is gone, as far as the algorithm is concerned you haven't watched them yet, so it's recommending them to you again.
It's likely they have an ML "recommender" model that was built from your history but doesn't actually store that history, nor can it be reverse-engineered to recover it. Once you hit a certain threshold of new activity or elapsed time, the model will probably be rebuilt.
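Here's a toy sketch of the effect described above, assuming a drastically simplified "model" that keeps only per-topic preference counts. The catalogue, `ToyRecommender`, and its methods are hypothetical and have nothing to do with how YouTube actually works. The model learns from the history, the history gets deleted, the model isn't retrained, and previously watched videos come back because there's nothing left to filter them out.

```python
from collections import Counter

# Hypothetical catalogue: video id -> topic. Not a real YouTube data model.
CATALOGUE = {
    "v1": "cooking", "v2": "cooking", "v3": "cooking",
    "v4": "chess", "v5": "chess", "v6": "news",
}

class ToyRecommender:
    """Toy stand-in for a learned model: it keeps only aggregate topic weights."""

    def __init__(self):
        self.topic_weights = Counter()

    def train(self, history):
        # Learn aggregate preferences; the list of video ids itself is not stored.
        self.topic_weights = Counter(CATALOGUE[v] for v in history)

    def recommend(self, already_watched, k=3):
        # Rank every video not in already_watched by the user's weight for its topic
        # (ties broken by id so the output is deterministic).
        candidates = [v for v in CATALOGUE if v not in already_watched]
        candidates.sort(key=lambda v: (-self.topic_weights[CATALOGUE[v]], v))
        return candidates[:k]

history = ["v1", "v2", "v4"]        # the user's watch history
model = ToyRecommender()
model.train(history)
print(model.recommend(already_watched=set(history)))  # ['v3', 'v5', 'v6']

history.clear()                     # user deletes their watch history
# The model was not retrained, so its preferences survive; with no history to
# filter against, the old videos v1 and v2 look "unwatched" and come back.
print(model.recommend(already_watched=set(history)))  # ['v1', 'v2', 'v3']
```

In this toy, retraining on the now-empty history would reset `topic_weights`, which is roughly the "model will probably be rebuilt" point.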
This is a big reason I view the new “we’ll automatically delete your data!” crap as disingenuous at best. They’ve already learned everything from the data within a few hours; it’s useless to them after that anyhow.
A big problem with data is not the company itself using it, but the other ways it can come back to bite you.
E.g. a sexual photo taken from a private social media album is no threat to your reputation once it’s been absorbed into some machine learning model for detecting sexual photos, but it’s certainly a threat if a future data leak lets your enemies get the actual .PNG or .JPG and send it to the news media or your loved ones. Knowing that the photo can actually be deleted is valuable in this case, and I’m sure many other similar cases could be listed.
Users want to delete their watch history, which they can. Classifier models that predict what you want to watch are not your watch history, nor are they capable of reproducing it.
> Classifier models predicting what you want to watch
But that's probably not the only model generated from your data, is it? They likely have many other models built from it, everything from ad-serving models to profiling models for Hydra.
That's what I'm talking about. Think of the recommender model as your own personal neural net that is trained, over time, to show you videos it thinks you might like. It can't tell you which videos you watched, because it is not a database or list of watched videos; it is a classifier that predicts what you probably want to watch.
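A minimal sketch of why such a model "can't tell you which videos you watched", assuming a toy model whose parameters are just normalised topic weights (`fit_preferences`, `predict_interest` and the catalogue are hypothetical). Two completely different histories produce identical parameters, so nothing in the model says which list was actually watched. This only illustrates the idea; it says nothing about whether any particular production model is non-invertible.

```python
from collections import Counter

# Hypothetical catalogue (video id -> topic), for illustration only.
CATALOGUE = {"v1": "cooking", "v2": "cooking", "v4": "chess", "v5": "chess"}

def fit_preferences(history):
    """Toy 'model': normalised topic weights learned from a watch history."""
    counts = Counter(CATALOGUE[v] for v in history)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing watched, nothing learned
    return {topic: n / total for topic, n in counts.items()}

def predict_interest(model, video_id):
    """Score in [0, 1] for how likely the user is to want this video."""
    return model.get(CATALOGUE[video_id], 0.0)

history_a = ["v1", "v4"]
history_b = ["v2", "v5"]   # a completely different set of videos

model_a = fit_preferences(history_a)
model_b = fit_preferences(history_b)

# Both histories collapse to the same parameters, so the model alone cannot
# say which of the two lists the user actually watched.
print(model_a == model_b)                 # True
print(predict_interest(model_a, "v2"))    # 0.5
```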
That's a far, far cry from the reality of "possibly prosecutable in the EU in some scenarios." We aren't even on a website that's under GDPR jurisdiction right now. I've also noticed a weird fetishization of GDPR where people invoke it like "heh, my dad works for Xbox. Just you wait, buddy, he'll have you banned."
I think it’s changed what counts as best practice. Unless you plan never to launch your product in the EU, designing it so that it can never delete data is a bad idea.
In my experience, I’ve definitely seen GDPR lead a large company to have developers look at how data can actually be deleted rather than just flagged with deleted=true. I don’t think my company’s lawyers were alone in thinking this had suddenly become more important.
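To make the deleted=true vs. actually-deleted distinction concrete, here's a small sketch using Python's built-in `sqlite3` with an in-memory database. The `posts` table and the `soft_delete_user`/`hard_delete_user` helpers are hypothetical. A soft delete hides rows from the app but leaves them fully retrievable; a hard delete removes them from the live table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT,"
    " deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO posts (user_id, body) VALUES (?, ?)",
    [(1, "holiday photo caption"), (1, "private note"), (2, "someone else's post")],
)

def soft_delete_user(conn, user_id):
    # deleted=true: hidden from the app, but the text is still in the table.
    conn.execute("UPDATE posts SET deleted = 1 WHERE user_id = ?", (user_id,))

def hard_delete_user(conn, user_id):
    # Actually removes the user's rows from the live table.
    conn.execute("DELETE FROM posts WHERE user_id = ?", (user_id,))

soft_delete_user(conn, 1)
visible = conn.execute("SELECT body FROM posts WHERE deleted = 0").fetchall()
still_stored = conn.execute(
    "SELECT body FROM posts WHERE user_id = 1 ORDER BY id"
).fetchall()
print(visible)        # [("someone else's post",)]
print(still_stored)   # [('holiday photo caption',), ('private note',)]

hard_delete_user(conn, 1)
print(conn.execute("SELECT COUNT(*) FROM posts WHERE user_id = 1").fetchone())  # (0,)
```

Even a hard delete isn't the whole story. In SQLite, deleted content can linger in free pages unless `PRAGMA secure_delete` is enabled or the database is `VACUUM`ed. Backups are a separate copy entirely, which is why the "3-6 months for backups to age out" caveat matters.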
When a feature (e.g. a thoroughly scrubbed data deletion procedure) is introduced into a product, it is often far easier to apply it to all customers than to just a subset. For this reason, GDPR has knock-on effects that benefit all users of certain services.