Author here. Yep, that's close to my thinking. I don't actually believe that Cursor (or similar tools) are completely shit.
But I worry that the Cursor team perhaps doesn't care whether their product actually delivers value. That they just want to sell the appearance of productivity.
This, to me, is a much bigger concern than everyday performance of their tool. Tools can be improved, organizational culture usually not.
But this is wild speculation. I didn't want to write this as the conclusion of the actual article, which tried to be more factual and to take their marketing at face value.
Probably not a big issue. GDPR compliance can be challenging without a suitable mindset, but it's not impossible.
* Consider that the GDPR has an extremely broad concept of “personal data” – it's not just identifying info but anything that can be reasonably linked to a person!
* Data minimization – only collecting what is needed, and only using it as actually needed – is already a great step.
* Writing a GDPR-compliant privacy notice can be a good exercise to understand what data you're processing for which purposes. Art 12–15 GDPR are the closest it gets to a checklist.
* And you'll have to implement “appropriate” security measures, but what is appropriate is largely up to you.
The more challenging part is ensuring that you're only using data processors/vendors that are contractually bound to use the data as you instruct, and that you protect “international transfers” where the recipient (e.g. vendor) is outside Europe. If you're looking for server locations in North America, I recommend looking at Canada, since it has an “adequacy decision” from the European Commission.
You will have to be GDPR-compliant if you “offer” your service to people who are in Europe, i.e. actively market to such people, or have testimonials from EU customers, offer French localization, accept payment in EUR, and so on. Mere availability of your service is not an offer.
Offering a B2B SaaS service to companies that need to be GDPR-compliant?
You're fucked. There is no legally safe way for a company to use a US-based data processor, i.e. to engage you as a vendor. However, and this is your “get out of jail” card, many customers don't care, and will be happy as long as they can sign “SCCs”.
Identifiability for IP addresses uses an even lower standard. The GDPR says that for something to be truly anonymous, there must not be any “reasonably likely” means for identification, even with the help of third parties, even when relying on additional information. There has of course been litigation about this, in the form of the Breyer v Bundesrepublik Deutschland case. It was based on the GDPR's predecessor law, but it used virtually identical phrasing so the conclusion still holds.
The European Court of Justice constructed a hypothetical scenario to show that identification can reasonably be likely. Let's say the website was attacked by a hacker. In a logfile, you find the attacker's IP address and want to prosecute them. So you report the incident to whatever authority is responsible for such incidents, which then gets a court order so that the attacker's ISP discloses information about the IP address. As long as the ISP knows to whom that IP was allocated at the time, there is now a reasonably likely chain of events that leads to identification of the person behind the IP address.
In this case about Google Fonts, the court says that it's sufficient if the website operator or Google have the “abstract means” for identification, not whether they actually did this for this plaintiff's specific IP address.
A solution would be if the EU forbade ISPs from keeping such logs, but given repeated attempts at mass data retention laws for national security purposes and pressure from the IP industry^W^W film and music industry for copyright infringement prosecution purposes, that doesn't seem likely.
> Do you, as the website operator, have the right to copy and serve these fonts to your visitors?
All the fonts on Google Fonts are open source. When GDPR came into force in 2018 I downloaded all the fonts I needed, checked their licenses, and uploaded them on my servers along with necessary notices as required by the licenses.
The matter could also be sidestepped if the CDN were to offer a GDPR data processing agreement (DPA) and make guarantees about the locations of its servers. The free public CDNs understandably don't do this, and it seems Google Fonts is not covered by the Google Cloud DPA.
The court judgement addresses this exact point. There are previous judgements (Breyer v Bundesrepublik Deutschland) that establish that dynamic IP addresses are personal data. There are reasonable means to identify the data subject with the help of third parties, such as the ISP. “For this it is sufficient that the defendant has the abstract means for identification of the person behind the IP address. Whether the defendant or Google have the concrete means for linking the IP address with the plaintiff is irrelevant.”
That there is correlating information like timestamps, user-agent strings, or Referer headers increases the likelihood of actual identification, but the mere reasonable possibility of identification is sufficient for IP addresses to be personal data.
You're missing the point (and mischaracterizing the court decisions): An IP address by itself (without any additional information, e.g. the URL of the website requested by that IP address) cannot possibly be personal data, as was illustrated by my "for loop" example.
The present court case and also the one you're referring to (Breyer v. Bundesrepublik Deutschland[0]) do not say anything to the contrary. They were concerned with situations where there is additional data that could be used e.g. to build a user profile. For instance, the Bundesgerichtshof judgment addressed the question of "whether dynamic IP addresses of website visitors constitute personal data for website operators" [0] (who clearly know which website the visitor visited and therefore possess additional data about the visitor).
For something to be personal data, it must be information that relates to an identifiable natural person. There are two criteria here: (1) it must relate to a natural person, and (2) that person must be identifiable.
Your “loop over all IP addresses” example does not involve personal data because the information doesn't relate to anyone – it is just a list of numbers. Even if it were to relate to individuals, no court would order an ISP to disclose information about the corresponding subscribers for such generated IP addresses, so the identifiability argument in Breyer cannot work.
In contrast, an IP address that is part of an IP packet received by a server clearly relates to the person sending the packet, if there is such a person. And, with the help of third parties, the person on the other end of the connection is reasonably likely to be identifiable. This does not depend on the website operator having any additional information such as cookie identifiers, other than the date. To avoid confusion, let me quote the relevant part from Breyer:
> 49. Having regard to all the foregoing considerations, the answer to the first question is that [Art 4(1) of the GDPR] must be interpreted as meaning that a dynamic IP address registered […] when a person accesses a website […] constitutes personal data within the meaning of that provision, in relation to that [website] provider, where the latter has the legal means which enable it to identify the data subject with additional data which the internet service provider has about that person.
The only additional data involved here is that held by the ISP, not by the website. That the judgement scopes its conclusion to website providers must be understood not as a limiting factor (as in: IPs can be personal data only for website providers), but as a contrast to the uncontested observation that IPs clearly are personal data for ISPs.
An IP address that relates to an identifiable person is personal data by itself. Thus, its mere disclosure to a third party without a legal basis is a breach of the GDPR. The article you linked highlights the “absolute vs relative” identifiability discussion, but this reasoning holds even under the “relative” standpoint because Google too is a website operator who has the same reasonably likely means for identification as the original website operator, if not substantially better means due to its trove of other data it can correlate with the IP address.
In this LG München case, the court determined that sharing this data with Google was illegal, regardless of whether there is any additional data. It is, in a sense, a very formal argument, that doesn't consider it necessary to dive into specific fact patterns (that's the abstract vs concrete means part quoted in my previous comment). The court did consider the impact of Google's tracking abilities in calculating damages, though.
To summarize my disagreement with your comment: (1) I assert that an IP address by itself can be personal data for a website operator (such as the defendant or Google), per the Breyer argument. (2) The LG München judgement in this Google Fonts case is not concerned about additional data when considering the legality of processing. (3) Additional knowledge held by the website operator is irrelevant for both this case and the Breyer judgement. Since a negative is difficult to prove but a positive can be shown by a single example, could you please point out the paragraphs in the Google Fonts case[1] or the ECJ's Breyer judgement[2] where I'm mistaken for disagreements 2 or 3?
Summary: A company did try the “it was the browser, not us” argument in the “Fashion ID” case. The court did not fall for it. Data controller and thus responsible for compliance is whoever determines the purposes and means of processing. Being able to control what the website does seems to be good evidence for being a data controller.
In this Google Fonts case, the website operator didn't even try this discredited argument.
> The fact is that CDNs and similar third party services play an important role.
They no longer do, since browsers implemented cache isolation.
> if I "host" my fonts in S3 do I have to get consent for sharing IP with Amazon?
No, you're supposed to contractually bind your vendors/service providers as data processors with a contract (“data processing agreement”) per Art 28 GDPR. There's some debate around whether US-based companies are legally able to enter into such an agreement (say hello to the Cloud Act from me), but the general consensus still is that non-US cloud regions might be OK, and that CDNs that let you sign a DPA (like Akamai, Cloudflare, Fastly, …) are also OK. In contrast, Google Fonts does not seem to be covered by the Google Cloud DPA.
> with every router that goes through tracert?
No, such mere transmission doesn't count as processing, and/or the intermediaries are responsible for their own compliance. In any case the connection should be protected by TLS so that only the client IP address + your domain name is visible to intermediate routers.
> websites will add more crap "opt in" CYA forms
Unfortunately, I agree, though the point of this judgement is that self-hosting some assets is a perfectly cromulent alternative. I think relying on “consent” would be difficult in a case like this, since it is not generally possible to make access to a service conditional on consent to unnecessary processing activities. Using a CDN for assets like fonts is unnecessary.
> I just wish that websites wouldn't force us outside of the EU to the asinine UX required by the EU
For EU-based websites there is no choice, as the law doesn't care about where the users are.
There's also a bit of irony here: there has been a lot of work on replacing the cursed cookie consent requirements that gave us most of these annoying consent banners, but the past few months revealed that the US tech giants have been successfully lobbying against the proposed ePrivacy Regulation. So please redirect your ire toward Google. Without them this might have been fixed in 2018.
This argument was tried in the Fashion ID case. A company had inserted Facebook Like buttons on the web page, and argued that it was not responsible for the ensuing disclosure of personal data (such as IP addresses or possible tracking cookies) to Facebook. See, it was the browser and not the website operator that disclosed the data, and the website operator never had access to the data in the browser in the first place!
The European Court of Justice did not buy this argument. By coding the website in a particular way, the website operator was responsible for causing the user's browser to act in a particular way, so it was the “data controller” for the collection and transmission of personal data by the Facebook Like button, though Facebook is of course jointly responsible for what its code does.
The underlying argument is that someone is a data controller and thus responsible for GDPR compliance when they determine the “purposes and means” of processing, alone or jointly with others. Embedding the code for the button was an exercise of this power to determine purposes and means. In contrast, the website operator is not a data controller for whatever Facebook does with the collected data on its servers, because it cannot control what FB does.
The given case from Munich is a very straightforward extension of the Fashion ID judgement, though the website operator didn't even claim that they weren't responsible. Instead, they argued that they had a “legitimate interest” in loading fonts from Google servers, which the court rejected. While I consider it probable that Google does not use data from Fonts servers for tracking, the judgement correctly points out that Google is well-known for tracking – but this doesn't matter anyway, since the disclosure of personal data without a legal basis is already a problem.
That seems on the surface to be a ridiculous argument.
I can go "bash < somefile" and I can go "csh < somefile" and I can go "cat < somefile". It's my choice to use bash, csh, or cat. somefile will have data in it, and that data will be interpreted by MY choice of program. If I don't want the contents of somefile interpreted as commands, I shouldn't be passing it to something that runs commands based on its content. Replace somefile with `curl someURL` and nothing changes. If I don't want my computer to connect to other computers based on what content comes back from `curl someURL`, that's my responsibility.
Maybe a better example: I type `npm install somepackage`. npm then looks in somepackage, sees its dependencies, and downloads them. By the same logic as the judgement, npm or `somepackage` would be responsible for leaking PII based on the dependencies listed, not the user for running npm in the first place.
The same goes for `apt update`, `apt upgrade`, etc.
The ruling would apply in tons of places that seem like they'd make it hard for things to keep working.
Careful. That is a 100% unofficial site. It is not chartered or funded by the EU. The linked article is from Richie Koch, an editor working on human rights stories who wrote the article on behalf of Proton VPN, which runs the GDPR.eu site as a content marketing scheme. The linked article is not the law and not official guidance, though it provides a reasonably good summary.
Everything sqrt2 says in the comments is entirely correct, as far as I can tell.
Fair point. And thanks. I now think that my position (which is how it should be, is consistent with the GDPR, and is repeated in multiple places) is possibly not in line with a court decision from 2019 or so that interpreted the e-Privacy Directive in a wrong way imho, and at the very least it might depend on local practice of how EU "law" is applied. So you two are probably right.
It's ridiculous to govern non-privacy-relevant tech usage like this. I still think that's illegal where I live. Regardless, let's hope the e-Privacy Regulation or future court decisions solve this.
JSON lets you write numbers. They can have a sign, decimal part, and an exponent. The standard euphemistically describes this as:
> JSON is agnostic about the semantics of numbers. […] JSON instead offers only the representation of numbers that humans use: a sequence of digits. […] That is enough to allow interchange.
But can you encode/decode an arbitrary integer or a float? Probably not!
* Float values like Infinity or NaN cannot be represented.
* JSON doesn't have separate representations for ints and floats. If an implementation decodes an integer value as a float, this might lose precision.
* JSON doesn't impose any size limits. A JSON number could validly describe a 1000-bit integer, but no reasonable implementation would be able to decode this.
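A minimal sketch of the three problems above, using Python's standard `json` module as a stand-in implementation (an assumption; other libraries behave differently). The `parse_int=float` hook is used here only to simulate a decoder that maps every number to a float64, roughly what JavaScript's `JSON.parse` does.

```python
import json

# Bullet 1: Infinity and NaN have no JSON representation. Python's json module
# emits the non-standard token NaN by default; allow_nan=False makes it raise
# ValueError instead of producing invalid JSON text.
lenient = json.dumps(float("nan"))  # "NaN": not valid JSON
try:
    json.dumps(float("inf"), allow_nan=False)
    inf_rejected = False
except ValueError:
    inf_rejected = True

# Bullet 2: a decoder that maps every number to a float64 silently loses
# precision. parse_int=float simulates such a decoder.
doc = "9007199254740993"                     # 2**53 + 1, a valid JSON number
as_int = json.loads(doc)                     # Python keeps the exact integer
as_float = json.loads(doc, parse_int=float)  # float64-only decoding

# Bullet 3: the grammar imposes no size limit, so a 1000-bit integer is a
# valid JSON number, but a float64-based decoder can only approximate it.
big = 2**1000 - 1
big_doc = json.dumps(big)
big_as_float = json.loads(big_doc, parse_int=float)

print(lenient, inf_rejected)    # NaN True
print(as_int, int(as_float))    # 9007199254740993 9007199254740992
print(big_as_float == big)      # False
```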
The result is that sane programs – that don't want to be at the mercy of whatever JSON implementation processes the document – might encode large integers as strings. In particular, integers beyond JavaScript's Number.MAX_SAFE_INTEGER (2^53 - 1) should be considered unsafe in a JSON document.
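One way to follow that advice, sketched in Python: the helper names `encode_int` and `decode_int` are hypothetical, and the cut-off is simply JavaScript's safe-integer range, ±(2^53 - 1). Integers inside that range stay JSON numbers; anything larger becomes a decimal string that survives even a float64-only decoder (simulated again with `parse_int=float`).

```python
import json

# JavaScript's Number.MAX_SAFE_INTEGER: every integer with at most this
# magnitude is exactly representable as a float64.
MAX_SAFE_INTEGER = 2**53 - 1

def encode_int(n: int):
    """Hypothetical helper: keep n as a JSON number if any float64-based
    decoder can represent it exactly, otherwise emit a decimal string."""
    if -MAX_SAFE_INTEGER <= n <= MAX_SAFE_INTEGER:
        return n
    return str(n)

def decode_int(value) -> int:
    """Inverse of encode_int: accepts a number (int or float) or a string."""
    return int(value)

payload = {"id": encode_int(2**63 - 1), "count": encode_int(42)}
text = json.dumps(payload)

# Even a decoder that turns every number into a float keeps the id exact,
# because the id travels as a string.
decoded = json.loads(text, parse_int=float)
print(text)
print(decode_int(decoded["id"]), decode_int(decoded["count"]))
```

The receiving side has to know which fields may arrive as strings, so this is a schema convention between the two parties rather than something JSON itself enforces.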
Another result is that no real-world JSON implementation can round-trip “correctly”: instead of treating numbers as “a sequence of digits”, it might convert them to a float64, in which case a JSON → data model → JSON round-trip might result in a different document. I would consider that to be a problem due to underspecification.
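A small illustration of the round-trip problem, again assuming Python's `json` module as the data model: numbers with a fraction or exponent are decoded to a float64 and re-encoded in Python's own canonical form, so perfectly valid input documents do not come back byte-for-byte.

```python
import json

def roundtrip(doc: str) -> str:
    """JSON text -> Python data model -> JSON text."""
    return json.dumps(json.loads(doc))

# Each input is valid JSON, but every number passes through a float64.
examples = ["1e2", "1.10", "12345678901234567890.0"]
results = {doc: roundtrip(doc) for doc in examples}

for doc, out in results.items():
    print(f"{doc} -> {out}")  # e.g. 1e2 -> 100.0
```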
The numbers were what I was mainly thinking of, so thanks for your exhaustive enumeration of those problems.
JSON.org requires whitespace for empty arrays and objects while RFC 8259 does not (and I often see [] and {} in the wild).
A lot of packages de facto break the spec in other ways, such as people dumping Python dicts as-is rather than converting them to JSON, so that the keys are quoted as ‘foo’ rather than “foo”. I’ve complained about this when trying to parse the stuff, only to receive the response “it works for me so you must have a bug” from the Pythonistas. This has happened in multiple projects.