http://proxy-ip-list.com/download/proxy-list-port-3128.txt
http://proxy-ip-list.com/download/free-usa-proxy-ip.txt
http://www.proxylists.net/http_highanon.txt
http://www.proxylists.net/socks4.txt
http://www.proxylists.net/socks5.txt
http://www.stopforumspam.com/downloads/listed_ip_90.zip
I've got my own db of hosting facilities, which I built by taking 100M URLs, doing a DNS lookup on each hostname, and saving the resulting IP. This gives you some level of confidence that a certain class C (/24) block is used for hosting.
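
For anyone who wants to try the same thing, here is a rough sketch of how such a db could be built. It is not my actual code: the function names, the in-memory Counter standing in for the db, and the min_hosts threshold are all made up for illustration. The resolver is passed in as a parameter so it can be swapped for a cached or batched lookup, which you would want at 100M URLs.

```python
import ipaddress
import socket
from collections import Counter
from urllib.parse import urlsplit


def hosting_blocks(urls, resolve=socket.gethostbyname):
    """Count distinct hostnames that resolve into each /24 block.

    `resolve` is any callable mapping a hostname to an IPv4 string;
    it defaults to a plain blocking DNS lookup.
    """
    seen = set()        # hostnames already looked up
    counts = Counter()  # "a.b.c.0/24" -> number of hostnames found there
    for url in urls:
        host = urlsplit(url).hostname
        if not host or host in seen:
            continue
        seen.add(host)
        try:
            ip = resolve(host)
        except OSError:  # covers socket.gaierror (lookup failed)
            continue
        block = ipaddress.ip_network(f"{ip}/24", strict=False)
        counts[str(block)] += 1
    return counts


def looks_like_hosting(ip, counts, min_hosts=3):
    """Guess whether `ip` sits in a block used for hosting.

    min_hosts is an arbitrary illustrative threshold, not a tuned value.
    """
    block = ipaddress.ip_network(f"{ip}/24", strict=False)
    return counts[str(block)] >= min_hosts
```

The more distinct hostnames land in a /24, the more likely that block belongs to a hosting facility, so a visitor IP from the same block is probably a server rather than a person.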
Google is easy to identify this way, even with a spoofed user agent (which they do a lot now).
But this technique does not work for EC2, because Amazon refuses to publish a database of which customer is using which IP.
That's part of their page-cloaking detection code.