Hacker News | jwr's comments

In Central Europe (Poland), too.

Nobody, and I do mean nobody, realizes that using WhatsApp by default (which everyone accepts) synchronizes their entire contacts list to Meta. It's a golden trove of valuable data for an ad targeting company.

People don't realize that there is so much that can be inferred about you from your contacts. Whether you have kids, which schools they go to, and similar personal information.

All of this is not even on the radar and people reduce privacy to "whether someone listens in on my calls or conversations" and tend to brush it off, because they honestly don't care about that part.

Signal doesn't make it easier by refusing to allow encrypted iCloud backups for so many years (which means people lose data when they lose their phones!) and recently introducing a subscription backup service instead of allowing me to do an encrypted iCloud backup. It's hard to explain to people that they should use an inferior product just because of "privacy".


I've seen more interest in Telegram among my contacts in recent years and less of it in Signal. I managed to get my partner, my sister and a friend to use Signal, but all three still use Facebook along with WhatsApp, and it doesn't seem like they want to leave them.

Yet another recent revival of Gadu Gadu passed by without much fanfare. For those unfamiliar: GG was created by a single guy, based on ICQ's idea of UINs, and quickly became the default messenger in Poland some 20 years ago; even the gov't used it at some point. But it lost its position to Skype, WhatsApp and Viber. I somewhat admire the dedication this new company has to restoring our "homemade" network. But it's way too late: a new generation that is mainly familiar with big corporations' services has grown up, and GG has nothing that would make it replace them.


The 30% figure seems to be completely made up and this whole article is adfarm-bait.

The CABA trial is an 8-week single-arm pilot (no placebo). The study measured cognitive improvement over 8 weeks in a single group — not "slowing of decline versus placebo." There is no 30% figure anywhere in the paper.

I'm glad we have AI to quickly read this kind of stuff and check these kinds of claims for us.


> I don't want to defend them, because they gate away a good chunk of the internet with their "bot protection"

They also gate away a good many people with their "bot protection". I am extremely worried about how so many seem to have outsourced the control over who can access their websites to a company, with no second thoughts whatsoever.


The problem is what is the alternative? I'm (not) defending them or this practice by any measure, but we all know what happens if you just open your site up without these, especially with AI bots which hammer servers and are in effect a legalized DDoS system. I've hated CAPTCHAs ever since I first encountered them and I can't wait for them to just finally die a permanent death, but I also don't know how we solve the "how do you identify a human and a bot" in a way which doesn't require server admins to have extremely beefy servers or similar setups to handle the extra load. I'm not going to do the "there HAS to be a way thing" either because, for all I know, this could just be one of those impossible-to-solve problems.

> we all know what happens if you just open your site up without these, especially with AI bots which hammer servers and are in effect a legalized DDoS system

No, we don't know. I honestly do not understand the problem. I run websites, both static and non-static. Granted, my sites aren't exactly the most popular internet go-to destinations, but I should be seeing this DDoS too, right?

I do see lots of requests. Nothing that any modern system can't handle. Computers are stupid fast these days. Unless you are doing something unreasonable, it's really hard to even notice this "extra load".

I understand there are sites for whom this causes problems, but I think these are rare and could be optimized not to do unreasonable things.

I think too many people are annoyed by AI companies (arguably understandable position), look at their logs and speak of "hammering", "DDoS" and "extra load", while in reality it doesn't matter much.


We do know, just ask anyone who runs a more popular site or does anything where abuse can be monetized (shopping, reviews, etc.). Avoiding that due to obscurity isn’t an answer because it’s saying you’re safe until something, possibly outside of your control, causes the bots to descend and give you an extra 500M requests with no chance of revenue.

I’m with OP: I don’t like this but the alternatives all look like the death of the open web.


> just ask anyone who runs a more popular site

The person you're responding to already said they ran a modestly sized site. What actual scale opens one up to abuse? If only the top 1% of sites need it, then it seems silly to say "everyone" needs it.


It’s not just scale. Do you accept user generated content? If so, more of a target.

Stack Overflow was outside of the Cloudflare network for years, and anti-abuse was maybe 3 or 4 full-time jobs – much of which still needs to be done, because Cloudflare's anti-bot protection hasn't actually stopped it. Most UGC sites are not as big as Stack Overflow was at its peak.

Most UGC sites also don't have a horde of volunteer mods voting to close/delete things.

I'm referring specifically to the activities of Charcoal (https://charcoal-se.org/) and their Stack Exchange staff counterparts, taken together. This is about large-scale platform abuse, of the sort that Cloudflare is alleged to prevent (but doesn't, really), not the more mundane (and laborious) task of manual quality control.

errr... so anything related to UGC now has a lower bound of 3-4 FTE? Sure, I'll hire a team of content moderators next time I think about putting a comment form under my blog...

Yes? Cloudflare doesn't replace moderators. At all. It only allegedly filters bot generated content, it doesn't filter user generated content and doesn't even intend to.

Please read their last sentence again and think about how much it understates the difference between stack overflow in its prime and a normal website. Also the "much of which still needs to be done".

So everyone is paying cloudflare… why?

Because charging for bandwidth/traffic is still a thing, unfortunately

Because paying with MITM is far less visible than paying with money

Most likely not. Their free tier is fairly generous.

It might depend on the tech stack. I run a small niche website but it has PHP and a database (MediaWiki/phpBB) and without Cloudflare I'd estimate I'd need to spend several hundred dollars a month to handle the traffic. Traffic used to be tens of thousands of requests a day. AI has increased that to between 400k and 3M requests per day, but it's not a smooth distribution. This is with Bot Fight Mode on, which greatly reduces traffic.

I adopted Cloudflare because it was getting DDoSed by the AI crawlers. I'm pretty sure all of them are vibe coding their crawlers and don't bother adding rate limiting as a requirement.


That was my point. I was trying to be gentle by mentioning "unreasonable" things, but seriously — how did we get to the point where less than 6 requests per second (that's 500k requests per day) is considered a DDoS?
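For reference, the conversion from daily request counts to an average per-second rate can be checked with a short sketch. The three daily figures are the ones quoted in this thread (400k, 500k and 3M requests per day); the function name is just an illustrative choice. Note that this gives an average only, and as mentioned upthread, real bot traffic is not a smooth distribution.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def requests_per_second(requests: int, window_seconds: int) -> float:
    """Average request rate over a time window, in requests per second."""
    return requests / window_seconds


# Daily figures quoted in the thread; the output is an average, not a peak.
for per_day in (400_000, 500_000, 3_000_000):
    rate = requests_per_second(per_day, SECONDS_PER_DAY)
    print(f"{per_day:>9,} requests/day = {rate:5.1f} req/s on average")
```

Bursts can be far above the average, so a figure like this says more about sustained load than about worst-case spikes.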

I've spent some effort on optimizing my sites, but most of the effort was focused on avoiding unreasonable (stupid) work. Do I need a session for every request? No, I don't! Do I need a database fetch for every access to my homepage? No, I don't! Is it a problem to actually load all of my static content in all supported languages (24) into memory and serve it from memory? No, it isn't!

I use Clojure behind nginx on the server for my sites. Oh, and I also pre-compress all static assets to Brotli, so anything that handles brotli gets a static file served directly from nginx. I also use immutable assets with unlimited caching semantics.
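One common way to get immutable assets with effectively unlimited caching is to put a content hash in the file name, so that any change to the content produces a new URL. The sketch below assumes that approach; it is not necessarily how the setup described above works, and it is written in Python rather than Clojure purely for illustration. The function name and the 12-character digest length are hypothetical choices, and the Brotli pre-compression step is not shown.

```python
import hashlib
from pathlib import PurePosixPath

# Cache-Control value commonly used for assets whose URL changes with content:
# cache for one year and tell the browser it never needs to revalidate.
IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable"


def fingerprinted_name(filename: str, content: bytes, digest_chars: int = 12) -> str:
    """Return a name of the form '<stem>.<content-hash><suffix>'."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    return f"{path.stem}.{digest}{path.suffix}"


if __name__ == "__main__":
    name = fingerprinted_name("app.css", b"body { margin: 0 }")
    print(name, "->", IMMUTABLE_CACHE_CONTROL)
```

Because the name changes whenever the bytes change, the server can hand out the long-lived header without worrying about stale copies.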

Really — the problem is that we've grown lax and our software has become bloated, slow, and with unreasonable code paths. If every page fetch does 12 database accesses and runs through a slow interpreter, that is surely going to be a problem.


I second this. My website exposes a cgit and 99% of the traffic now is AI scraping the sources, but the load is nowhere near DoS territory. And this is running on the cheapest VPS I could find.

Not saying I'm not annoyed by the scraping; I am looking to block them, but I'm also not going to put the site behind the gatekeeper. If anything, Cloudflare must love AI scraping now for the same reason AV companies love malware.

Now, if you are running a PHP stack...yeah, maybe that's the problem right there.


> 99% of the traffic now is AI scraping the sources

I wonder if we should stop fighting this and instead create an API specifically for this purpose? Or, a central repository that you could send your data to and say to anyone wanting to scrape, "save yourself some time and just get my data from this other place".


Is there actually any plausible theory why "AI" would repeatedly scrape the same sites? Are there that many competing, completely independent AI labs? Is it cheaper to repeatedly scrape than to buffer the scraped data locally? (I find it very hard to imagine that it's easier to deal with changing/disappearing content than it is to stand up such a cache.)

If you ask an agent to check sources / function definitions of open source packages it will wget / curl it

It's an AI generated scraper that scrapes nonstop.

The PHP stack isn't even the problem, it's having unauthenticated requests getting past the cache in the first place, something that most sites should be able to prevent.

Consider yourself lucky. But don't let yourself fall into the trap of thinking it's a nonissue for everyone else until it happens to you.

People shouldn't have to be experts or provision a larger server to run a UGC service that can withstand the sort of 30x more traffic I'm seeing from AI bots. Or rather, you didn't render the argument for why they should have to do that if they can just use CloudFlare's free tier.

Either way, it's easy to have all the answers when you've never had the problem.


Has anyone pointed an AI scraper at your server at all? Unless your website appears in search engine listings I don't think the AI scrapers will slam it. My server has never been hit by them but my server is also practically unknown. All of this said, I'm not going to claim that server loads can handle it because many sysadmins have claimed otherwise, and I would like to think that their claims are reliable.

As soon as you get your TLS certificate you get bombarded with scraping. You don't need someone to "point a scraper at you".

What matters most is usually how much there is to scrape. If you have like 5 pages that's nothing. For forum like websites where each thread, each user profile, etc. gets scraped that's when traffic increases. I just let them have at it with no issues though, computers are fast.


That's really weird. My experience is quite different: I have several subdomains and all of them have TLS certs and I haven't (yet) seen this (thankfully). Either that, or my server is masking it. The weird thing is that my server is an OVH dedicated box that doesn't exactly have top-tier specs, so I have no idea what's going on there. Very weird indeed.

Probably you don't have much to scrape?

I mean... It may be that most of the things I run aren't really scrape-able. I run Matrix (which requires authentication), an XWiki instance, Zulip, Terraria, Forgejo, Nextcloud, a Mastodon server... Most of those require auth behind my Kanidm instance to actually do anything. Well and most of them have APIs that are much better than "scrape the universe".

If you run the site on a custom port, scrapers won't find it?

Also, how do we even know they're really "AI scrapers", or just a deliberate DDoS to push sites into using CF or other "anti-bot" providers?

They showed up when the AI money did. The evidence is circumstantial, but… some of them are remarkably well engineered (from a “how difficult is it to identify this traffic” perspective), in a way that never existed before. (I have been running a quite sizeable site for 8 years, over 200k registered users, and you don’t need to register to use 99% of it.)

I run a quite large website and there are a few patterns.

The usage is extremely quick, and follows easy-to-spot patterns. We noticed a spike in bounce rate.

They never come from Google, and the badly programmed ones just crawl several pages at a time, faster than a user could do.

Then there are the crazy spikes in visits from specific countries, pretty much scraping the entire content, often from pools of IPs. In some cases we had unexplained (meaning: it wasn't viral or a marketing campaign), sustained 30% increases in traffic.

There's also the fact they don't interact with the complicated widgets, so zero XHR requests other than analytics pings.

They also don't cause spikes in Google Analytics, so I assume it's blocked, but they show up in logs and in the internal analytics.

It's not enough to DDOS the website at all, but it's a lot of noise in statistics that we gotta learn to filter.


> They never come from Google, and the badly programmed ones just crawl several pages at a time, faster than a user could do.

I’ve triggered this kind of “bot protection” right here on Hacker News many times. I did that by having a bunch of Hacker News pages open and then closing and reopening my browser. I’ve also triggered it by opening a bunch of links in the background too quickly. I’ve also triggered it by reading the article, then clicking back and upvoting/favouriting too quickly. I’m also located in Singapore, which people have started to advocate for blocking here recently.

A single non-bot legitimate user can easily trigger these kinds of heuristics just by using the site in a way you don’t expect. This can affect some users disproportionately more than others, e.g. disabled people who need to use assistive technology.


Oh I also do this all the time.

What I mean by "too fast" is opening 50 pages in the span of two or three milliseconds.

Either way, I'm not blocking. The CDN is handling the traffic alright.


I hate that sort of thing - when I rolled my own proof-of-work bot protection (providers wanted $$$$), I set it up so that

A) you'd have to open >200 tabs, and B) if any tab solves the proof-of-work, any that are still waiting to do so reload in the background.
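The comment above does not say which proof-of-work scheme was used. A minimal hashcash-style sketch, assuming SHA-256 and a "leading zero bits" difficulty target, could look like the following. All names and the difficulty value are hypothetical, and the tab-count threshold and cross-tab reload behaviour described above are not modelled here.

```python
import hashlib
import secrets


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def make_challenge() -> str:
    """Server side: issue a random, unpredictable challenge string."""
    return secrets.token_hex(16)


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash is enough to check a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce (about 2**difficulty_bits hashes on average)."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty_bits=16)
    print("solved:", verify(challenge, nonce, 16))
```

The asymmetry is the point: solving costs the client many hashes, while checking costs the server one, so the difficulty can be raised without adding much server load.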


Yes, circumstantial is exactly the point; it's easy to use AI as a scapegoat because it's something popular to hate on.

It's circumstantial evidence, but Occam's Razor also applies.

It's not a hostile DOS in the traditional sense (I've mitigated a few of those) - no "pay us to make it stop", no pattern to the requests other than "fetch every unique URL a few times".

It wasn't happening until financial incentives to gather large datasets for AI training appeared.

Bad actors (using residential proxies & claiming to be a real browser) mostly showed up after folk started blocking ones that identified themselves as AI scrapers.

It's obvious to blame AI training because there's a shortage of better explanations. Who else would be paying for these (expensive) residential botnets, only to use them to (eg) web-scrape wikipedia (which offers free downloads of its content in a structured format)?

The simplest explanation of the technical behavior is "a bot coded to follow every link it sees & save the results", and the simplest explanation of the motive to run such a bot is "to train a large language model".


> no "pay us to make it stop"

"use Cloudflare to make it stop"


Or fastly, or akamai, or bunny, or any number of other providers.

Cloudflare are merely the cheapest of the bunch.


Exactly. They (and most of all, Big G) stand to profit greatly from this browser discrimination. What better than to make more sites use them by launching DDoS attacks in the name of "AI scraping".

A small, non-static e-commerce site focused on a single EU country, with proper robots.txt instructions (which worked perfectly well in the "era" of search-engine-only bots) and rate limiting on an nginx/php-fpm setup, is kinda struggling without CF to handle 15000 requests per 15 minutes coming from Chrome "users" on IPv6. Best so far was an average server load of 40 in htop on an 8-core server x_x

That's about 16.7 rps. A single guy holding the F5 key in Chrome can generate that much traffic and take down your website. That kind of performance was never acceptable.

People will always reframe their request numbers to avoid stating their pitiful requests per second numbers, it's hilarious. "This thing is handling hundreds of thousands of requests per day!" Like cool, you're barely making it double digit requests per second.

> handle 15000 requests per 15 minutes,

that's just ~17 req/sec

That's "cheap VPS running wordpress" level of traffic


Maybe a plain WordPress install. Run something like WooCommerce and install a bunch of plugins to get the functionality that WordPress and WooCommerce should have built-in, and suddenly a cheap VPS can only handle 2 or 3 requests per second.

It's phenomenal how inefficient the WordPress/WooCommerce stack is.

Though the main issue I'm seeing is credit card testing, not scraping.

And I'm ideologically opposed to using a CDN (because it shouldn't be needed for such a small site!) so it's somewhat a self-inflicted problem...


"Security" plugins are also a HUGE problem here. Most of them turn a "few cached DB SELECTs" (or a static file read if you use a caching plugin) into a bunch of INSERTs, just to log/analyze the "offender" IP and maybe block it, in many cases making "blocking the offender" more costly than serving the page without the security plugin would be.

You can calculate traffic stats for a day by IPs/subnets and probably bots will stand out. If they are using IPv6 you can figure out the ASN and block it completely.
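A rough sketch of that per-subnet tally, using only the standard library: it groups client addresses into /24 (IPv4) or /64 (IPv6) networks and lists the busiest ones. The prefix lengths and function names are assumptions chosen for illustration, log parsing is left out, and the ASN lookup mentioned above would need an external data source that is not shown.

```python
import ipaddress
from collections import Counter
from typing import Iterable


def subnet_of(ip: str, v4_prefix: int = 24, v6_prefix: int = 64) -> str:
    """Map an address to its enclosing network, e.g. 203.0.113.7 -> 203.0.113.0/24."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def top_subnets(client_ips: Iterable[str], n: int = 10) -> list[tuple[str, int]]:
    """Count requests per subnet and return the n busiest ones."""
    counts = Counter(subnet_of(ip) for ip in client_ips)
    return counts.most_common(n)


if __name__ == "__main__":
    # Documentation-range addresses standing in for one day of access-log entries.
    sample = ["203.0.113.7", "203.0.113.9", "198.51.100.1",
              "2001:db8::1", "2001:db8::2", "2001:db8::3"]
    for network, hits in top_subnets(sample, n=3):
        print(f"{hits:>6}  {network}")
```

This only helps when the traffic is concentrated; a crawler that spreads single requests across unrelated residential addresses will not stand out in such a tally.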

Block out IPv6 and see if that helps.

Why not block all odd v4 addresses while you're at it? I heard that that can reduce scraping volume by 50%!

That's harder to set up, and also unfair to people who have an odd IP address.

It's easier and better to just block 0.0.0.0/1 half of the time, and 128.0.0.0/1 for the other half of the time. Switch every day at noon.

Bot traffic will be cut by 50%, and humans are all treated equally! It's a total win!


And blocking ipv6 addresses isn't unfair to people who have an ipv6 address?

Yeah, I suppose you're right.

Just block it all.


Blocking Singapore reduces the AI load 90%.

You get downvoted for these opinions but I agree. Most people that complain that their servers get hammered by AI bots are those that run very unoptimized servers that can only handle like 100 rps. I've never had any issues with any of my moderately optimized websites. A $10 VPS can handle sooo much traffic.

I think people get annoyed when it's suggested they spend time optimising or even re-writing their websites to handle high traffic loads just to cater to AI bots ripping their content.

It's also not always easy to do. I run a small wiki which is fairly optimised, nearly every page manages at least ~3k rps on a small VPS. The only exception is the diff page which is ~150 rps. Optimising that while still giving good output isn't that easy, but the wiki doesn't have many users so that would be fine if it wasn't for the AI bots.

The AI bots ignore robots.txt and were initially hitting the site with ~1k rps crawling every combination. Even that would be manageable as there's currently ~150,000 combinations, except they kept re-crawling the whole lot each day. The server could manage it but it was a massive waste of resources.

They were using residential IPs and only sending 1 request from each IP, making it impossible to block. In the end I gave up and put a Cloudflare challenge in front of it. I don't want to use Cloudflare, but the alternative is forcing users to log in to view diffs or removing them entirely.


What I do is have more strict rate limits for non logged in users. You tell them to log in if they hit the rate limit. For non logged in users, you have a rate limit not just for IP, but also for /24 and /16. Forget about IPv6, IPv4 scarcity is a feature not a bug.
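A minimal sketch of that layered limit for anonymous IPv4 traffic: each request is counted against its own address, its /24 and its /16, and is refused if any of the three is over budget within a sliding window. The specific limits, the 60-second window and the class name are hypothetical, and the looser limits for logged-in users are not modelled.

```python
import ipaddress
import time
from collections import defaultdict, deque
from typing import Optional

# Hypothetical budgets: max requests per window at each prefix length.
DEFAULT_LIMITS = {32: 60, 24: 300, 16: 1500}


class LayeredRateLimiter:
    """Sliding-window limiter keyed on the IP, its /24 and its /16 (IPv4 only)."""

    def __init__(self, limits=None, window: float = 60.0):
        self.limits = dict(DEFAULT_LIMITS if limits is None else limits)
        self.window = window
        self.hits = defaultdict(deque)  # network string -> request timestamps

    def _networks(self, ip: str):
        for prefix in self.limits:
            net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
            yield prefix, str(net)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        networks = list(self._networks(ip))
        for prefix, key in networks:
            timestamps = self.hits[key]
            # Drop timestamps that have fallen out of the window.
            while timestamps and now - timestamps[0] >= self.window:
                timestamps.popleft()
            if len(timestamps) >= self.limits[prefix]:
                return False  # over budget at this granularity
        # Only accepted requests are counted.
        for _, key in networks:
            self.hits[key].append(now)
        return True
```

A caller would check `limiter.allow(client_ip)` before serving an anonymous request and respond with a "please log in" page when it returns False.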

The bot I had was using unique IPs for each request. Some were from cloud providers but most were just random residential ISPs. I couldn't see any obvious connections so rate limiting would've had to be a global rate limit.

Similar to the one SQLite had: https://www2.sqlite.org/forum/forumpost/7d3eb059f81ff694?t=h

Each IP only makes ~1 request though so easy to detect after the fact.

I guess they will run out of IPs at some point so maybe if I had logged each one forever and shown a challenge only to them, it would have fixed it eventually. Just depends how big their pool of IPs is.


You were getting 1k rps, and each request was from a unique IP? So after an hour you got hit by 3.6M different IPs? And all from uncorrelated /16s? That seems hard to believe. Not that I don't believe you, it's just hard for me to grasp that whoever was scraping you had such a large and distributed swarm.

This is called a rotating residential proxy service. You can buy it off grey market sites that are probably getting it from botnet operators. It costs about $2-$5 per GB.

Interesting, that definitely seems to be it.

There really isn't a good reason for a wiki (or git host) to provide diffs between arbitrary revisions to unauthenticated users. Limit it to diffs compared to previous (which can be cached) and this problem goes away.

In any case, such labyrinths of expensive dynamically generated pages are no excuse for subjecting people requesting the start page to bot checks.
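The restriction suggested above (anonymous users may only diff a revision against its immediate predecessor) can be sketched as a simple policy check. The function names and the representation of page history as an ordered list of revision IDs are assumptions for illustration, not how MediaWiki or any git host implements it.

```python
from typing import Sequence


def diff_allowed(history: Sequence[int], old_rev: int, new_rev: int,
                 authenticated: bool) -> bool:
    """history holds one page's revision IDs, oldest first."""
    if authenticated:
        return True  # logged-in users may compare arbitrary revisions
    try:
        index = history.index(old_rev)
    except ValueError:
        return False  # unknown revision ID
    # Anonymous users: new_rev must directly follow old_rev.
    return index + 1 < len(history) and history[index + 1] == new_rev


def diff_url_counts(revisions: int) -> tuple[int, int]:
    """Return (adjacent diffs, arbitrary-pair diffs) for a page with n revisions."""
    return revisions - 1, revisions * (revisions - 1) // 2


if __name__ == "__main__":
    adjacent, arbitrary = diff_url_counts(100)
    print(f"100 revisions: {adjacent} cacheable diffs vs {arbitrary} arbitrary pairs")
```

The count helper shows why this matters: the set of adjacent diffs grows linearly with the number of revisions and can be cached, while arbitrary pairs grow quadratically.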


I see many MediaWiki wikis (like the Arch Linux wiki) using Anubis successfully. It can be configured to only act on certain paths.

Curious, but how do the bots figure out the combinations? Or do you have links to the diffs from other sites? I assume the diff takes two files in query parameters or something.

I'm not 100% sure but I think links. There's a bunch on the history and revision pages. Yeah, the diff URL has two revision ID's as parameters.

I did try removing some of the links without success. I guess once they have them they just keep checking.


I managed to solve my scraper problems without optimizing much, but if I had to optimize I think the only option might be "don't use mediawiki" and that's an extremely obnoxious solution. Though maybe I could get there by throttling specific kinds of pages.

Same. Tritium and the blog have done stints on the front page here and on high-traffic subreddits, and that plus bots has never been a problem. UX could be improved through a CDN, but even that isn’t worth the trade-off for us at the moment.

If you're in any way semi-popular and a decent size, you're gonna get hammered. PortableApps.com was partially offline for weeks due to China-based AI scrapers. You block the useragent, they start hitting you with another one from the same IP in the same way. You block the IP, they switch to another. You block the subnet, they use another. At one point it was nearly a thousand different IPs from around China hammering away. For all intents and purposes, a DDoS. This wasn't a little "extra load", this was load that was thousands of times beyond what our legitimate userbase was using.

And if you're thinking about blocking all of China, while this particular AI bot didn't use them, a bunch of other ones I've encountered use VPNs and hacked clients worldwide.


> I understand there are sites for whom this causes problems, but I think these are rare and could be optimized not to do unreasonable things.

There are. They're not. They can't (without significant effort)


I don't think it's just privacy, it also increasingly turns the web itself into a walled garden. The end result is that websites can only ever be accessed by "approved" clients - the latest Chrome, Edge, Safari and if you're lucky Firefox - and nothing else.

> and if you're lucky Firefox

I haven't had any problems with Firefox so far. Why do you say this?


That was more a (gloomy) outlook into the future, given Chrome's market dominance and tendency for unilateral actions in web standards.

I haven't ever noticed Cloudflare having any issues on Firefox, so presumably that implies any unilateral actions in web standards have been worked around by CF to provide the service to Firefox as well.

I'm pretty frequently blocked by Cloudflare when I use Firefox on OpenBSD -- apparently it's too suspicious of a combination for their liking, or something. Even on Linux I've occasionally had issues. I've had to email site operators to ask them to change their configuration so I can actually be a customer of their business.

It's already a problem with Firefox + some essential web condom extensions.

I think there's some chance we get a "proof of purchase" system where there is some entity that takes a $10 payment to give out a unique identity token that you need to present to visit most sites. If you have a revocation process for ones used by bad actors, it seems like it would work pretty well.

That's called an IP address. You pay your ISP $50+ every month to get one. Has it worked so far?

If the bad guys also had to pay $50/month/IP it would probably work.

The bad guys don't pay that much. And sometimes the bad guys actually use the IPs of other people (botnets on residential IPs) and don't pay anything at all.


They pay something. You can get a few tens of cents per gigabyte for a voluntary proxy right now. I've never tried it long enough to get a minimum payout, so could be a scam for all I know (or maybe the minimum payout is the scam).

What would stop you offering someone a few tens of cents per GB to borrow any other token barrier you put up?


Except if your country is under sanctions.

> we all know what happens if you just open your site up without these, especially with AI bots which hammer servers and are in effect a legalized DDoS system

So delegalize it. Strip searching everyone to paper over the fact that the societal contract has been broken only delays that.


> AI bots which hammer servers

You can easily work out which IPs/networks bots are using by looking at where most traffic comes from and who requests a lot of pages at non-human speed.


The alternative is to not have that one choke point that can be hammered. Decentralize.

I use CF and I don’t enable these anti-bot measures. It’s up to the webmaster.

We have a few dozen websites, ranging from ones doing single-digit Mbit/s to ones doing a few Gbit/s.

Never needed it. Just put the worst offenders in penalty bucket and that's usually enough


Anubis is one alternative, kinda sucks that we need to slow down the web for everyone a little bit though.

The most plausible near-term path is probably micropayments embedded invisibly in AI agents. Your agent that has learned what you value and can make a reasonable decision to allow a micropayment for certain content pays on your behalf without requiring a conscious decision each time, eliminating the mental transaction cost problem entirely. It's the mental transaction cost that arguably led to the failure of the micro payment model back in the early 2000s.

Although the cynical part of me says that this will result in malicious actors trying to trick agents into giving out a bunch of micropayments. There are counter-defenses that can help detect and compensate for that, but perhaps the best we will be able to do is prompt the user with the agent's default recommendation.


I can no longer access any website that's "protected" by Cloudflare. As soon as a website enables that stuff… "Shoot, another one bites the dust." I wonder if the website owners realise at all how many actual users they lose by this sort of "protection."

Cloudflare will just tell them that 70% traffic drop is because 70% of their traffic was bots, and everything is working fine, and hey, don't you want to upgrade to a paid plan to block 50% of the remainder? Think about how many bots will be blocked with that upgrade!

Do you really stand by these words?

I'm one of those who have enabled Cloudflare on all of the sites I maintain. Additionally, I added Turnstile to every form.

I know some actual users get blocked. But the amount of spam we get without it, the amount of bot traffic simply overwhelming the server... It is just too much.

Recently I also hard blocked all IPs from China, Singapore, India, Pakistan, Russia, and the whole of Africa. Do I want to do it? No. But the amount of bot traffic and corresponding spam is a bigger problem :(


I also always block traffic from China, India, Pakistan, and Russia, after observing that 90%+ of the spam/scanning was coming from those countries.

At least for China, I imagine most of the real humans might use a VPN anyway


Yea, honest admins block entire regions because spam and bot traffic make it impossible to stay open

> I know some actual users get blocked. But the amount of spam we get without it, the amount of bot traffic simply overwhelming the server... It is just too much.

So why not just shut down the website? Or remove the form entirely? That will ensure that you get no spam, right?

One of the core tenets of system design is Availability. If your service is not available - if your forms are blocking legitimate users - then why are you pretending to have a form submission feature at all? Just to frustrate users?


> One of the core tenets of system design is Availability. If your service is not available

The service won't be available to anybody because of overwhelming unwanted traffic. Now it's available for most potential users. You're speaking econ 101 when everyone else has played out iterated prisoner's dilemmas.


> So why not just shut down the website? Or remove the form entirely? That will ensure that you get no spam, right?

Turns out that people have a tolerance for a non-zero amount of work, but still have a limit.

Suggesting "turn off your website" does not account for the desire to also provide some access.

Treat people who host content as humans, just as we must treat users as humans. There are tradeoffs, suggesting "shut down your website unless you provide access everywhere" is worse on all fronts for everyone.


> There are tradeoffs, suggesting "shut down your website unless you provide access everywhere" is worse on all fronts for everyone.

Maybe, maybe not.

If block-heavy websites shut down entirely, we lose some content, but other content moves to block-minimal sites and the average user might be able to access more.

Also if there's no blocking crutch, and people get pushed into shutdown and are mad about it, they might fight harder for anti-spam technology and legal enforcement, which could improve the situation.


Well I administer an ecommerce site, and for the checkout page I block everything besides Canada and USA.

Because those are the only two countries that we've ever, in the life of our business, had a legitimate order from.

It prevents the majority of credit card testing, but it is tempting to apply it to the whole site to reduce traffic and server load.
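A minimal sketch of that path-scoped allowlist: only requests to the checkout are restricted by country, and the rest of the site stays open. The `/checkout` prefix and function name are hypothetical, and the country code is assumed to come from somewhere else (for example a GeoIP database lookup or a header set by a CDN), which is not shown.

```python
from typing import Optional

ALLOWED_CHECKOUT_COUNTRIES = frozenset({"CA", "US"})
CHECKOUT_PREFIX = "/checkout"  # hypothetical path; adjust to the shop's routing


def is_request_allowed(path: str, country_code: Optional[str]) -> bool:
    """Restrict only the checkout; every other path is open to all countries."""
    if not path.startswith(CHECKOUT_PREFIX):
        return True
    # An unknown country (failed lookup) is treated as not allowed at checkout.
    if country_code is None:
        return False
    return country_code.upper() in ALLOWED_CHECKOUT_COUNTRIES


if __name__ == "__main__":
    for path, country in [("/products/42", "DE"), ("/checkout", "DE"), ("/checkout", "ca")]:
        print(path, country, "->", is_request_allowed(path, country))
```

Keeping the rule scoped to one path is what limits the collateral damage: visitors from elsewhere can still browse, they just cannot reach the form that card testers target.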


[flagged]


Actually you are. It's called living in country. Lawless countries don't get blocked. If you don't like it, clean up your country.

>I wonder if the website owners realise at all how many actual users they lose by this sort of "protection."

How many people do you think are browsing with a weird enough config (eg. custom browser like OP, or some weird config like Firefox with fingerprinting protection on a Raspberry Pi) to trip Cloudflare's protection?


Well… I know plenty of people in my circle affected by this. Just have a slightly outdated system you simply can't afford to update: it's way too easy to get cut off like this. IMHO, a rather systematic discrimination against poorer people.

I got locked out of some websites by Cloudflare Turnstile on some very standard configurations, like an iPhone on Safari, or a Windows 11 desktop with Firefox or Edge, neither with a VPN on. I never found out why.

it's probably because a scraper farm updated their services to latest, and there was a window where fingerprinting was unable to differentiate.

We had all of our devs' Pixels get blocked, and after talking to CF, it was because the Internet Archive had rebooted their scraping farm, all the devices stampeded and overwhelmed the known bot safeguards, and those tags were added across the board. CF gives sites the tools to tune what is getting blocked; we bumped the sensitivity down to 25 and haven't had many complaints (despite having a very vocal community).

The most common complaint is users' IP address getting blocked because of compromised devices


Does not have to be weird, at least once it happened to me that their strictest settings simply banned something like a major portion of internet users in my country - to the point that if you had FTTH you were likely blocked.

And no, it wasn't due to a country-based block selected by site operator.


I use a plain Firefox on a plain Windows 11 PC on a plain regular mass market ISP in a developed country and I get completely blocked by websites daily.

At least let me complete a "prove you are human" challenge or something, but don't outright ban my IP address?


Weird? I live in Thailand, use Firefox, and get half a dozen CF challenges per day.

It takes very little for CF to consider you "weird".


There are dozens of us :)

In my experience what really makes it loop every single time though is JShelter. CF doesn't like having your fingerprintable data bits messed with.

There are legitimate uses for non-intrusive, ethical and legal scraping, but some of us have had to resort to extreme measures:

https://roundproxies.com/blog/bypass-bot-detection/


Do you by chance have that installed? I don't use Cloudflare but I am curious if that code can scrape my silly blog? [1] Trying to pick the appropriate article... I'm guessing it can. I don't do the fancy javascript or TLS fingerprint inspections, just some janky hill-billy protections, silly redirects and Antarctic voodoo.

[1] - https://blawg.nochan.net/b/Internet-Crap/20260522-Maybe-AI-B...


>I wonder if the website owners realise at all how many actual users they lose by this sort of "protection."

Yesterday cloudflare blocked me from visiting the MX-Linux site ... including an old browser with -no- protections ...

I have to wonder - assuming these sites are paying CF for this 'service' - are they getting a list of all the rejected IPs?


I took the time to write to one on LinkedIn and they didn't reply

> with no second thoughts whatsoever

As someone responsible for mitigating card testing "attacks", account harvesting, and DDOS attacks..

It is unfortunate, but the ISP industries (from telco up to transit) and CC industries aren't providing a lot of great options. This idea that people are doing things "without a second thought" is usually false when it comes to businesses.


They sometimes have to comply with legal requests (which I understand), but at the same time they have a huge market share - which means that the internet is becoming less and less decentralized and more in their control. We've seen the effects of that in previous outages...

I think what gives me anxiety about the whole situation is:

1. If X% of the population gets wrongly branded with the scarlet letter B[ot], how do they appeal and get it fixed?

2. How will sites notice and know if their choice of "bot protection" is losing them X% of users/customers/job-seekers etc.? If it's a really robust system, they'll never even see the complaints either...

3. If everyone does detect that something is awry, will it be such a monopoly that there's no choice but to let it happen?


I use a cellphone internet provider, and there have been many sites I couldn't access because of Cloudflare or stupid reCAPTCHA. I know damn well what a bicycle, bus, traffic light or stairs is.

>I am extremely worried about how so many seem to have outsourced the control over who can access their websites to a company, with no second thoughts whatsoever.

I think the Web is on its last legs, anyway. Generative AI and LLM-instead-of-search has destroyed what little value remained.


Governments too. It's inevitable that the international network will fracture into multiple national networks with heavy filtering at the borders as each country scrambles to impose their laws on it.

I'm glad to have known the true internet before its demise. Truly one of the wonders of humanity.


It's just one more facet of the enshittoscene, the era where actual product quality is completely irrelevant. Put it in the same bucket as websites that lag when you scroll, apps that refuse to show you video without a huge play/pause button overlaid in the middle of it that never goes away, and the movie Melania. My hypothesis is that billion-dollar businesses no longer exist to sell things to customers, but only to impress other billionaires to get their investment money.

Similar age here. And I have similar thoughts, although not about AI specifically. AI helps me get more done and not spend time on trivia and yak shaving, which is great. I do get more projects done, but those are projects I always wanted to do, just never had the time (or, sometimes the motivation, because of yak shaving tasks).

I think the biggest difference is that I no longer care about what people think about me and how I am perceived, so the motivation to publish my work went down to near zero. I used to build open source stuff, I no longer want to spend time on preparing stuff for publishing, making it available, dealing with people who will inevitably want something of me eventually. There just isn't enough time.

I can still be baited into responding on HN for some reason, and I am trying to work on that, because that is the ultimate waste of time.


Interesting — I haven't seen that problem, and I do have a system that has different APIs, web clients, non-web clients and embedded clients.

Incidentally, I developed my own template for a markdown rendering pipeline: markdown -> pandoc -> typst, with mermaid diagrams.

This works very, very well. I get linked in-document references, diagrams, tables, table of contents — everything I need for my design documents (and consulting work).
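A rough sketch of the markdown -> pandoc -> typst pipeline mentioned above, expressed as the two command lines it would run. This is an assumption about how such a pipeline could be wired, not the commenter's actual template: the helper name is hypothetical, and the mermaid diagram step is left out because the comment does not say which filter is used.

```python
from pathlib import Path
from typing import List


def build_pipeline_commands(markdown_file: str) -> List[List[str]]:
    """Build the argv lists for a markdown -> Typst -> PDF conversion.

    Step 1 asks pandoc to write a standalone Typst document.
    Step 2 asks the typst CLI to compile that document to PDF.
    Nothing is executed here; the caller decides how to run the commands.
    """
    source = Path(markdown_file)
    typst_file = source.with_suffix(".typ")
    pdf_file = source.with_suffix(".pdf")
    return [
        ["pandoc", str(source), "--standalone", "--to", "typst",
         "--output", str(typst_file)],
        ["typst", "compile", str(typst_file), str(pdf_file)],
    ]
```

Each command could then be passed to `subprocess.run(cmd, check=True)` in order. Keeping the intermediate `.typ` file around makes it possible to apply a custom Typst template before compiling.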


Am I the only one worried about Cloudflare becoming too powerful?

We went through this with E-mail: we slept through the period when Google, Microsoft and AWS were growing, and we ended up with them dictating the terms. Today I get 90% of my spam from Google, Microsoft and AWS and they don't care: they can safely ignore spam reports, because at this point they are Too Big to Block.

I have a feeling we are moving towards the same problem with Cloudflare and the web. Tomorrow Cloudflare will start dictating what we can or cannot do and we will not be able to do anything about it. This has already begun: their arbitrary "bot-filtering" for example.


Cloudflare is already too powerful, their anti DDOS solution is just too good. But their serverless products/features don't really build on that, they are just another hosting company.

> Am I the only one worried about Cloudflare becoming too powerful?

No, it gets brought up in every single thread about Cloudflare. And if this wasn't a feature release that people seem to like, the top comment would probably be talking about how Cloudflare is terrible for the internet.


I'm also curious. I bought the cheap alternative: XReal One Pro and… it is kind of as expected. It is a cheap alternative. Don't expect to use it for coding for several hours a day, in spite of what people keep saying. The optics are not up to it: there are imperfections in the lenses resulting in blurry areas of the screen, very visible as you move your head around.

They are great for watching video, make for a fantastic travel accessory, and one can use them for coding in a pinch, but I honestly couldn't find a good reason to, when I have a perfectly good MacBook Pro screen right in front of me.

I would definitely pay more for glasses that would allow me to have a better virtual computer display. Perhaps not $3500, but $2000? The main reason why I didn't even consider Apple Vision Pro is because of its humongous size, weight and complexity — I don't want another computer with another (locked down) OS requiring updates and maintenance. I want things that do not require anything of me. This is why XReal glasses are so nice: they are just a display. No battery, no OS, no maintenance.

EDIT: Just to clarify, I am very happy with the Xreal One Pro purchase. They provide excellent value. They are light, they are small, I can toss them in a bag to have a private display whenever I need. They are fantastic for travel and overall provide a great value. I would highly recommend them. Just don't expect them to be a better screen than your laptop screen for coding.


Fellow Xreal One Pro owner here. I agree 100%. The One Pros do make for a fantastic travel accessory but when it comes to coding (or reading text in general) for longer periods of time, my eyes usually start hurting after 2-3 hours because text is not 100% sharp and there's a slight blurriness / Moiré effect. (Which is a real bummer because, posture-wise, wearing the glasses puts a lot less strain on my neck than looking at a screen.)

That being said, there have been quite a few reports on Reddit lately from people that do use the glasses for coding all day every day. At the same time, my impression is that there have been fewer complaints about text blurriness than right after the One Pro got released. So I've started suspecting that Xreal might have fixed something about the hardware in recent batches. This is all very anecdotal, though. Maybe the hardware is the same and it's just my eyes.

Either way, I'm excited about future models with higher resolution. As many other people here in the thread said: This is definitely the future.


The blurry areas are 100% optics. I can turn the glasses OFF and look through them and see some areas waving. Not something software can fix.

But overall, agreed. I am looking forward to future models from Xreal. I especially enjoy the fact that they don't pretend to push a whole computer, which I don't want. They generally demand much less of me than most products do. I am tired of products dictating what I can or cannot do.


That's a very interesting comparison thanks. Hard to believe one is 87g and the other is ~800g and there's not much in-between!

I've been using the XReal One Pros for coding work for a few months now, and have had a great experience.

For me, the ergonomic benefits are the selling point, not the display quality. Not having to sit hunched over a laptop screen for several hours means I can work almost anywhere. Sometimes I'll use it in a cafe. Other times I just lie down in bed. I also make use of speech to text, so I just need to be able to press a hotkey and reach the track pad.

On the topic of display quality, it's important to use Better display to upscale the output to the XReals to high DPI - that gives noticeably better quality when it's downscaled to the (lower) native resolution of the XReals.


My problem was not the resolution, I could live with that. The problem is with optics: some areas of the screen are blurry. Depending on the particular unit you get (I had mine replaced once), the blurry areas are in different places. You might get a spot in the middle, or slightly to the side. If you fix your virtual display in place and move your head around, the blurriness will move across your virtual screen.

As I said, I had my glasses replaced because I thought I got a faulty unit, but the next one just had the blurry spots in different places. Then on a trip to Japan I visited several stores that had them on display and checked the display units — they all have the same problem.

I am not sure why, but not everyone is bothered by this. Perhaps some people don't care, or have had poor eyesight all along so they never saw the screen clearly. But for me there is a huge difference between seeing everything that's in front of me clearly and seeing blurry patches on my screen.


> it's important to use Better display to upscale the output to the XReals to high DPI

I got excited for a second but then read Better Display[0] is only available for MacOS? :(

[0]: https://betterdisplay.pro/


I think that response increasingly makes no sense (as time passes). Mozilla prevents people from building apps that access their devices because it might be possible to do something malicious.

I am so tired of being treated like a drooling idiot "for my own good".


The worry is real: there has historically not been a meaningful security barrier between a USB device and software running on the machine it's connected to. Firmware hasn't been developed with the assumption that the machine is malicious, there's probably lots of firmware which you can get RCE on by sending a weirdly formatted USB packet. Lots of devices have pretty unrestricted firmware update via USB functionality. And security is often fairly lax the other direction too; at least Linux implicitly assumes that hardware you connect is trusted, and there are lots of old, insecure drivers for USB devices out there.

Do users understand that by clicking "allow" on a website, an attacker can re-flash their mouse with firmware which causes the mouse to present itself as some obscure USB device which activates a vulnerable driver? That by clicking "allow" on a pop-up from a website, the website can abuse their keyboard to install a key logger or botnet? Should a user be expected to understand this?

I don't know how valid this fear is in practice. Has anyone done a study?


But that isn't how it works, it's not a prompt like asking permission to use the camera allow/deny. The user gets presented with a list of compatible devices and they have to select one themselves.

An attacker could try to convince users to select something specific but that depends on the actual devices that are present and the "default" option to a confused non-technical person is to just cancel out of the list.


I know it works like that, the part about "clicking allow" was a slight oversimplification which doesn't change the point. Do users understand the security implications of giving access to a device in the pop-up? I don't think so.

I had a chance to fly a simulator of the Beta Technologies VTOL airplane (they're a PartsBox customer). I went from horizontal flight into hover, and my guide said "oh, by the way, you are consuming a megawatt right now".

A megawatt. To hover.

That really opened my eyes to the reality: unless we have unlimited, clean and nearly free fusion power, flying cars are not going to be a thing.


Two things here: one, hovering is actually much more energy intensive than horizontal flight. Two, a megawatt isn't that much power in the context of aerospace. A 737 engine produces nearly 100 megawatts at peak output (the engines are rated in terms of pounds of force, so the conversion is a bit wonky).

This conclusion is... kinda absurd.

In any reasonable setup, hovering would be a rare, rare operation (like 30-60 seconds during takeoff and landing), with most of the time spent in wing-borne forward flight – which'd be _wildly_ lower power usage, more like 200-250kW tops. About ~par with staying in continuous acceleration in an EV. More for sure, but not nearly as insane as what you're pointing to.

... and this is exactly where better batteries would help – being able to hold that power level for longer so you could actually go places in earnest without untenable mass.
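To put the numbers from this exchange side by side, here is a small back-of-the-envelope sketch. The 1 MW hover figure and the 200-250 kW cruise figure come from the comments above; the 30-minute cruise duration and the function names are assumptions made only for illustration.

```python
def energy_kwh(power_kw: float, duration_s: float) -> float:
    """Energy in kWh used when drawing power_kw for duration_s seconds."""
    return power_kw * duration_s / 3600.0


def trip_energy_kwh(hover_kw: float, hover_s: float,
                    cruise_kw: float, cruise_s: float) -> float:
    """Total energy for a trip made of one hover budget and one cruise leg."""
    return energy_kwh(hover_kw, hover_s) + energy_kwh(cruise_kw, cruise_s)


# Figures quoted in the thread: 1 MW while hovering, about 60 s of hover
# at each end of the trip, and up to 250 kW in wing-borne flight.
# The 30-minute (1800 s) cruise leg is an assumed value.
hover_share = energy_kwh(1000.0, 120.0)
total = trip_energy_kwh(1000.0, 120.0, 250.0, 1800.0)
```

Under these assumptions the hover phases account for roughly a fifth of the trip energy, which is why the argument hinges on how long a vehicle actually has to hover.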


Is it? If we're talking about a future where EVTOL takes over for passenger cars, there will be air traffic jams with delays that require extended circling and likely hovering.

There's a reason all the EVTOL startups show individual vehicles landing in pristine fields, and it's the same reason car advertisements show one car on a closed course instead of I-95 at 3pm on a Friday


... air traffic jams? The air is _much_ bigger than the corresponding ground.

Certainly there'd be density _at_ take-off and landing, but even that's manageable by having e.g. arrival/departure locations at multiple heights.

It also seems vanishingly unlikely (at this point) that we'd have EVTOL that's not fully autonomous, further reducing the odds of this - ~perfect and coordinated driving, as well as foreknowledge of what's happening between you and the arrival location drastically reduces traffic.


Do you know how planes land at an airport? They circle waiting for their turn. Why would that problem vanish?

... because the entire point of VTOL (which is what the parent commentary was about) is that you can take off and land vertically and therefore don't need one of a few, scarce, super-long runways? ... and the waiting you're talking about is entirely because of those?

On top of that, small VTOL craft that can hover and would be at lower speeds closer in (esp. autonomously flown) would just need less mutual clearance compared to jets, which also have an altitude band they have to stay in, as well as no ability to slow to a crawl and coordinate finely.


Gotcha, just spitballing - my mistake taking it seriously

You asked me why the problem of circling waiting for your turn would vanish when using VTOL aircraft. I don't know how to respond to that with anything other than, "That's the entire point of VTOL. It doesn't need one of those scarce runways that planes circle waiting for."

That's fine sir, you don't need to know everything about a subject to spitball. On you go

My bad! You do list that you're an aeronautics person. I would genuinely genuinely love to understand what I'm missing – I'm sure there's some context here that I'm lacking!

If you want many things to land approximately at the same time and place, you need a little bit of play to schedule the arrivals/departures and ensure that you don't have collisions. There is a limit to the number of aircraft you can safely cram in any amount of space.

Any aircraft you imagine will circle at landing and possibly loiter for minutes while waiting for their turn at using the airspace. (Edit0: See helicopters)

Building an open skyscraper for aircraft to land on will not save you since craft will lock down a large part of the building to land/depart safely. And it's not clear to me that it would be profitable.

Then there are many other problems about energy density and aircraft weight limiting the whole scope of who would possibly use those craft.

Have a good one!

Edit1: I don't know about you, but my city doesn't have enough parking for cars. I'd be surprised if there were enough parking for EVTOL everywhere - you could very well need to loiter waiting for a spot to open, could need an emergency landing if you run out of power, many many imperfect things that make the house of cards fall apart
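The scheduling argument in the comment above (arrivals at one spot need some separation, so late arrivals loiter) can be illustrated with a toy model. This is only a sketch under strong assumptions: a single landing pad, a fixed minimum separation, and first-come-first-served ordering. The function name and parameters are hypothetical.

```python
from typing import List


def loiter_times(arrivals_s: List[float], separation_s: float) -> List[float]:
    """Return how long each aircraft waits before it can land.

    arrivals_s: arrival times at the pad, in seconds.
    separation_s: minimum time between two consecutive landings.
    Aircraft are served in order of arrival, and the returned waits
    follow that sorted order.
    """
    waits = []
    next_free = float("-inf")  # the pad is free before the first arrival
    for arrival in sorted(arrivals_s):
        landing = max(arrival, next_free)
        waits.append(landing - arrival)
        next_free = landing + separation_s
    return waits
```

With arrivals well spread out nobody waits, but when several aircraft show up inside one separation window the waits stack up linearly, which is the loitering cost the comment is pointing at.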

