> The IP address "185.199.108.153" is seen while sniffing the network.
It's a GitHub page. I mean you're right, in the case where a domain is using a wildcard cert, and the subdomain is the sensitive part, then SNI is the critical leak (assuming the adversary is not sniffing your DNS or poisoning your DNS, or you are using DoH).
It's a minority case; most sites do not fit into this bucket, but it is a case where you are right.
One cannot get that domain name by looking at the certificate. The Censys scan for the IP address fails to list it. Passive DNS sources fail to list it. And the name in the reverse DNS (PTR RR) is cdn-185-199-108-153.github.com.
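# printf is more portable than echo -e; -quiet implies -ign_eof, so s_client
# waits for the response instead of exiting as soon as stdin closes
printf 'GET / HTTP/1.1\r\nHost: about.censys.io\r\nConnection: close\r\n\r\n' \
| openssl s_client -quiet -connect 185.199.108.153:443 -tls1_3 > 1.htm
firefox ./1.htm # or whatever browser you prefer; I use a text-only one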
I do not use recursive DNS when making HTTP requests for web pages. I fetch DNS data in bulk at selected intervals and store it. The IP address for about.censys.io is loaded into the memory of a localhost forward proxy. The only DNS request before the HTTP request for the about.censys.io page is over the loopback to an authoritative DNS server. It returns the localhost address of the proxy.
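A rough sketch of the bulk-lookup idea, not the actual setup described above: names are resolved once with dig, stored in a flat file, and later looked up locally without any DNS traffic. The helper names refresh_table and lookup_ip, the "name address" table format, and the use of dig are all assumptions made for illustration. The loopback authoritative server and the forward proxy are not shown.

```shell
# Resolve each name read on stdin once and print "name address" lines.
# Hypothetical helper; needs network access and the dig utility.
refresh_table() {
    while read -r name; do
        # Keep only the first IPv4 address; dig +short may also print CNAME targets.
        addr=$(dig +short A "$name" | grep -E '^[0-9]+(\.[0-9]+){3}$' | head -n 1)
        [ -n "$addr" ] && printf '%s %s\n' "$name" "$addr"
    done
    return 0
}

# Look up a name ($1) in the stored table ($2) without touching the network.
lookup_ip() {
    awk -v n="$1" '$1 == n { print $2; exit }' "$2"
}

# Possible usage (assumed file names):
#   refresh_table < names.txt > table.txt    # run at selected intervals
#   lookup_ip about.censys.io table.txt      # later, offline
```

The point of splitting it this way is that the only moment names cross the wire as DNS queries is the bulk refresh, not each page fetch.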
Web browsing history or at least a list of domains visited is something that not only a "nefarious" person or "adversary" would find useful. The data has commercial value to companies that are actively trying to destroy user privacy. For example, SNI hands ISPs and intermediary "tech" companies this data on a silver platter. People endlessly debated encrypting DNS. Meanwhile the same people are still sending domain names in the clear, thanks to SNI. The companies behind QUIC do not see that as a problem.
I do not agree that "It's a minority case". For example, take all the sites submitted to HN. The sites using CDNs will of course present an ambiguity problem on the scale of the GitHub case. The ones not using CDNs, IME, do not normally work without hostnames, nor do they consistently reveal the correct hostname in the certificate. The certificate will often list several names. As such, I cannot take a list of IP addresses for all those sites and quickly, reliably transform them into the domains submitted to HN.
The convenience of SNI for conducting surveillance of domain names visited is not paralleled by trying to convert IP addresses to domain names.
Which appears to be implemented as a GitHub page unless I am mistaken.
> One cannot get that domain name by looking at the certificate.
Really? Because this is the certificate I get when I view that domain (with correct SNI):
https://crt.sh/?id=6682474444
It's right there in the common name field.
Unless you mean that it's not in the default cert for that IP, which sure, but in the scenario where we are eavesdropping on SNI, if we couldn't do that, why not just eavesdrop on the certificate itself? It should be just as easy.
I'm not sure what your point is here. If this was a foo.github.io page and GitHub had a wildcard cert on *.github.io it would support your position. But about.censys.io is not doing that. This is an example where you need SNI to choose the correct certificate, as it's being hosted on a CDN hosting many different sites.
> I fetch DNS data in bulk at selected intervals and store it
That's cool and all, but that's not how normal people use DNS, nor is it likely to become common, so it's useless when talking about internet privacy. If doing something totally abnormal is a valid solution, we might as well just say everyone should use Tor.
> Web browsing history or at least a list of domains visited is something that not only a "nefarious" person or "adversary" would find useful. The data has commercial value to companies that are actively trying to destroy user privacy
When people say nefarious, that is one of the groups they mean. Heck, this is almost a dictionary definition of nefarious.
> The certificate will often list several names.
It is very rare for the subject-alt-name to be for unrelated sites, all hosted on the same IP, and not require SNI to select the correct certificate. So rare that I challenge you to actually find a real example.
Does it indicate the domain name the user intended to visit, about.censys.io, as SNI would?
SNI reveals more than "Its a github page" or a list of sites that are "related", i.e., hosted on the same IP address.^1
SNI provides the network observer with the exact domain name that a user intended to visit. Aside from faking SNI, domain fronting, or similar tactics, the results are reliable.
IP addresses do not indicate the exact domain name that a user intended to visit. Methods used to try to guess the domain name are unreliable.
A list of remote IP addresses observed on the wire is not the same as a list of domain names observed on the wire as servernames (SNI).
The former requires more work than the latter and produces ambiguous, unreliable results.
1. A network observer could look for Subject Alternative Names in certificates passing over the wire in cleartext and try to guess which domain name the user intended to visit. However this will not work for TLS1.3 because the certificates will be encrypted.
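To make the footnote concrete, here is a minimal sketch that lists the SAN DNS names in the default certificate an IP address presents when no SNI is sent, which is roughly what an observer would see in a cleartext (pre-TLS1.3) handshake. The function names are made up for illustration, and it assumes OpenSSL 1.1.1 or later for -noservername and x509 -ext.

```shell
# Read `openssl x509 -noout -ext subjectAltName` output on stdin and
# print the DNS names it contains, one per line.
san_names() {
    tr ',' '\n' | sed -n 's/^ *DNS://p'
}

# Fetch the default (no-SNI) certificate from an IP address ($1) and list its SANs.
# Hypothetical helper; needs network access.
default_cert_sans() {
    openssl s_client -connect "$1:443" -noservername </dev/null 2>/dev/null |
        openssl x509 -noout -ext subjectAltName | san_names
}

# Possible usage:
#   default_cert_sans 185.199.108.153
```

If the list has more than one name, or does not contain the site actually requested, the observer is left guessing, which is the ambiguity described above.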
> If bawolff knew the domain name, why did he not give that as the answer to the hypothetical. Strange.
You misunderstand me.
My point is that there are two cases: the one where the SNI reveals something useful but you also need some sort of SNI because the server serves lots of things and needs to know which you are requesting, and the one where you don't need SNI but it's trivial to figure out the domain regardless. Your example is an instance of the former. You can't get rid of the SNI in this case because the web server at the other end depends on it to function.
So it's a bad example for your purposes, because the example necessarily depends on SNI existing in order to work. If your proposed solution of removing SNI was adopted the site would not work, so removing SNI does not increase privacy in this case, as the site would necessarily have to adjust how it works in a way that removes any of the privacy gains.
(You could probably say ESNI, but if so, it's already in progress, albeit slowly, so I still don't see your point.)
The site does not support TLS1.3, so the certificate retrieved above can be sniffed for SANs on the wire. However as above it does not list all the sites/endpoints.
If we try reverse DNS, we get 3e8.org. Passive DNS data shows 3e8.org and the subdomains www and mail.
One of the other endpoints at this IP is api.call-cc.org.
It is possible to find that name by searching the IP at censys.io.
Despite hosting multiple sites, SNI is not required. Hence I do not send api.call-cc.org in plaintext over the wire.
If the user sent SNI, the network observer has no work to do. She can see the exact domain name the user is accessing.
But without SNI she has to do work. She has to figure out if the user is accessing 3e8.org, www.3e8.org, api.call-cc.org or some other site.
Of course, with some detective work, this is possible. But it is not as easy as sniffing SNI.
By not sending SNI the user is not handing over a comprehensive list of every domain name accessed to ISPs, "tech" company intermediaries or others sniffing network traffic, as she would if using a "modern" browser to send HTTP requests directly to IP addresses.
One of the things I like about Cloudflare's ESNI is that it is really fast. I am looking forward to the next iteration of a solution to the SNI privacy leak.
Anyone who thinks SNI privacy "does not matter", who has no issue with handing over a comprehensive list of every domain name accessed to any party who is sniffing the wire, is encouraged to contact the folks working on encrypting SNI and tell them to stop. :)
> Despite hosting multiple sites, SNI is not required. Hence I do not send api.call-cc.org in plaintext over the wire.
And how do you verify that a man in the middle attack is not in progress if the server is not serving the correct certificate?
I appreciate that active attacks are a bit harder than passive listening, but they are still rather trivial. You are proposing making figuring out the domain name slightly harder in exchange for allowing the entire connection to be eavesdropped.
This seems like an incredibly bad privacy trade off.
The person operating api.call-cc.org is the same person who operates 3e8.org. Using the default certificate, with the SAN 3e8.org, is fine.
For recreational web use, where I am not using a graphical browser and I am in fact sniffing the traffic myself, the tradeoff is acceptable. The probability of someone on path sniffing every domain name sent in the clear over the wire is high, IMO. It is too easy. I would bet on it. Of the sites that do not require SNI, I am using the default certificate. I am not particularly concerned about the person who controls the default certificate at that IP address being able to see the traffic for all the sites hosted at that address. The large CDNs do require SNI. Generally the ones that do not require SNI are at IP addresses that host only a small number of other sites. The only domain names this person can observe are the ones sent to _that IP address_. The person on path sniffing SNI can see the domain names sent to _every IP address_.
When using the web recreationally, for noncommercial purposes, I cannot see the point of going through the trouble to encrypt the non-confidential contents of web pages and at the same time expending no effort to not send domain names in the clear, and to encrypt SNI where possible. To me, comprehensive recreational browsing history _is_ worth encrypting, perhaps even more than the public web page contents. This is no different than HN users who wish to avoid "smart TVs" that log every program that their owners watch.
And for anyone who believes that this evil person controlling the default certificate may be modifying the contents of web pages, then I can easily compare the contents to the same pages retrieved from Internet Archive or Common Crawl.
Perhaps it is helpful to clarify what I mean when I write "SNI is not required" or "SNI is required". The language I use may not be the same as that used by web developers or people at large commercial CDNs that are trying to influence "upgrades" of traditional internet protocols that were originally developed by people at universities.
What I mean by "SNI is not required" is that I can send an HTTP request over TLS without SNI and successfully retrieve the resource I specified in the HTTP method line and Host header, e.g., I send
GET /5/doc/index.html HTTP/1.1
Host: example.com
Connection: close
and I receive index.html.
Whereas if I do not receive index.html unless I also send SNI, then, in the language I use, "SNI is required".
In the 173.230.137.156 example I chose, I can still retrieve /5/doc/index.html from api.call-cc.org regardless of whether I send SNI. Yes, the api.call-cc.org domain name does have its own certificate. It does not matter. I can send no SNI, I can send SNI "example.com" or I can send SNI "3e8.org" and still retrieve /5/doc/index.html in every case (using the certificate for 3e8.org for the public key, etc., the default certificate sent by the httpd at IP address 173.230.137.156). In this example, the operator of the httpd is not checking the SNI against the Host header.
The reader may wonder about certificate verification on the client side. For example, if a "tech" company-sponsored browser detects that the domain name specified in the Host header does not match a domain name specified in a certificate it may refuse to send the HTTP request. For many years, applications using SSL/TLS often failed to do certificate verification correctly or did not do it at all. Today, browsers sponsored by "tech" companies can do certificate verification. However they are not the only programs that can do it.
I use a localhost forward proxy to do certificate verification, not a graphical web browser. This is because most HTTP requests I make are (a) noncommercial and (b) done with commandline utilities that do not support graphics and do not do certificate verification (nor SNI). "Noncommercial" here excludes things like banking, shopping and so forth. For commercial uses such as those, I use a "modern" graphical browser.
In some ways, but not others, the phrase "SNI is required" is like the phrase "Javascript is required". I routinely make successful HTTP requests to sites whose web developers claim "Javascript is required". To initiate HTTP requests I use commandline utilities that do not support Javascript. Those requests are not over the wire; they are sent to a loopback address. Only the forward proxy sends HTTP requests and receives responses over the wire. To read HTML or consume other media types, I use a text-only browser or some other program. I have no trouble retrieving media from these sites. Nevertheless, there are HN commenters who would still argue "Javascript is required".
A phrase can have different meanings to different people.
Perhaps it is helpful to clarify what I mean when I use the phrase "SNI is not required" or "SNI is required". The meaning of those phrases to me may not be the same as the meaning of those phrases to web developers or people at large commercial CDNs that are trying to influence "upgrades" of traditional internet protocols that were originally developed by people at universities.
What I mean by "SNI is not required" is that I can send an HTTP request over TLS without SNI and successfully retrieve the resource I specified in the HTTP method line and Host header, e.g., if I send
GET /index.html HTTP/1.1
Host: example.com
Connection: close
without SNI and I receive index.html then, to me, "SNI is not required".
Whereas if I do not receive index.html unless I also send SNI, then, to me, "SNI is required". No one should interpret this phrase to indicate I do not understand the purpose behind SNI and why it exists. Nonetheless, I think the phrase is consistently misinterpreted.
In the 173.230.137.156 example I chose, I can still retrieve /5/doc/index.html from api.call-cc.org regardless of whether I send SNI. Yes, the api.call-cc.org FQDN does have its own certificate. But I can send no SNI, I can send SNI "example.com" or I can send SNI "3e8.org" and in every case I can still retrieve /5/doc/index.html (using the certificate for 3e8.org for the public key, etc., the default certificate sent by the httpd at IP address 173.230.137.156). In this example, the operator of the httpd is not checking the SNI against the Host header.
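The three-way test described above can be sketched with openssl s_client. This is only an illustration under assumptions: the function names are hypothetical, it needs OpenSSL 1.1.1 or later for -noservername, and it does no certificate verification at all. It prints the HTTP status line returned for each SNI choice so the results can be compared.

```shell
# Build a minimal HTTP/1.1 request: $1 = Host header, $2 = path (default /).
http_request() {
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "${2:-/}" "$1"
}

# Translate an SNI choice into s_client flags; an empty value means "send no SNI".
sni_flags() {
    if [ -n "${1:-}" ]; then
        printf '%s %s\n' -servername "$1"
    else
        printf '%s\n' -noservername
    fi
}

# Print the HTTP status line received for one SNI choice.
# $1 = IP address, $2 = Host header, $3 = path, $4 = SNI (may be empty).
status_with_sni() {
    # $(sni_flags ...) is left unquoted on purpose so it splits into separate words.
    http_request "$2" "$3" |
        openssl s_client -quiet -connect "$1:443" $(sni_flags "${4:-}") 2>/dev/null |
        head -n 1 | tr -d '\r'
}

# Possible usage (needs network access):
#   for sni in "" example.com 3e8.org; do
#       status_with_sni 173.230.137.156 api.call-cc.org /5/doc/index.html "$sni"
#   done
```

If all three calls return the same successful status line, then in the sense used here "SNI is not required" for that site.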
Thus, to me, "SNI is not required" for this website. It does not matter whether there is one site hosted at the IP address, a handful of sites hosted or thousands of sites hosted. If I can get the media without sending SNI, then I do not send SNI. For anyone on-path who wants to know what sites internet users are visiting, it is none of their business what I have put in the Host header. That is why the Host header is encrypted. It is why TLS1.3 encrypts the certificate. And it is why ESNI/ECH encrypts the SNI as well. Not to mention it is why people _try_ to encrypt DNS. No one needs to know the sites a www user is visiting except the websites themselves.
The reader may wonder about certificate verification on the client side. For example, if a "tech" company-sponsored browser detects that the domain name specified in the Host header does not match a domain name specified in a certificate it may refuse to send the HTTP request. For many years, applications using SSL/TLS often failed to do certificate verification correctly or did not do it at all. Today, browsers sponsored by "tech" companies can do certificate verification. However those are not the only programs that can do it.
I use a localhost forward proxy to do certificate verification, not a web browser. This is because most HTTP requests I make are (a) noncommercial and (b) done with commandline utilities that do not support graphics and do not do certificate verification (nor SNI). "Noncommercial" here means free, recreational activities, and excludes things like banking, shopping and so forth. For commercial uses, I use a "modern" graphical browser.
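As a sketch of certificate verification done outside a browser, the following uses openssl s_client rather than the forward proxy mentioned above, whose configuration is not given here. It sends no SNI, asks OpenSSL to abort unless the chain verifies and the certificate carries an expected name (for example the name on the default certificate), and only then sends the request with a different Host header. The function names are hypothetical, and it assumes OpenSSL 1.1.1 or later and a usable default CA store.

```shell
# Build the s_client arguments for a verified connection without SNI.
# $1 = IP address, $2 = name the default certificate is expected to carry.
verified_args() {
    printf '%s\n' -quiet -connect "$1:443" -noservername \
        -verify_return_error -verify_hostname "$2"
}

# Fetch a resource only if the default certificate verifies for the expected name.
# $1 = IP address, $2 = expected certificate name, $3 = Host header, $4 = path.
fetch_verified() {
    # $(verified_args ...) is left unquoted on purpose so it splits into separate words.
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "${4:-/}" "$3" |
        openssl s_client $(verified_args "$1" "$2") 2>/dev/null
}

# Possible usage (needs network access):
#   fetch_verified 173.230.137.156 3e8.org api.call-cc.org /5/doc/index.html
```

With -verify_return_error the handshake should fail before any request is sent if the certificate does not check out, so the Host header never leaves the machine in that case.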
In some ways, but not others, the phrase "SNI is required" is like the phrase "Javascript is required". I routinely make successful HTTP requests to sites whose web developers claim "Javascript is required". To initiate HTTP requests I use commandline utilities that do not support Javascript. Those requests are not over the wire; they are sent to a loopback address. Only the forward proxy sends HTTP requests and receives responses over the wire. To read HTML or consume other media types, I use a text-only browser or some other program. I have no trouble retrieving media from these sites. Nevertheless, there are HN commenters who would still argue "Javascript is required".
A phrase can have different meanings to different people.