Have you tried including the reCAPTCHA v3 library and looking at the distribution of scores? -- https://developers.google.com/recaptcha/docs/v3 -- "reCAPTCHA v3 returns a score for each request without user friction"
It obviously depends on how motivated the scrapers are (e.g. whether their headless browsers are actually headless, and/or doing everything they can to not appear headless, whether Google has caught on to their latest tricks, etc.), but it would at least be interesting to look at the score distribution and then see whether you can cut off or slow down requests scoring below 0.3 (or redirect them to your API docs).
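For what it's worth, here is a minimal sketch of that server-side flow, assuming a Flask backend where the client sends its v3 token in an `X-Recaptcha-Token` header (the header name, route names, and the 0.3 cutoff are all my own placeholders; the `siteverify` endpoint and the `score` field are from Google's documented API):

```python
# Sketch only: log scores first to see the distribution, then enforce a cutoff.
import os
import requests
from flask import Flask, request, jsonify, redirect

app = Flask(__name__)
RECAPTCHA_SECRET = os.environ["RECAPTCHA_SECRET"]  # v3 secret key
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"  # documented verify endpoint
SCORE_CUTOFF = 0.3  # tune after looking at your own score distribution

def recaptcha_score(token: str, remote_ip: str) -> float:
    """Verify a v3 token with Google and return the 0.0-1.0 score."""
    resp = requests.post(
        VERIFY_URL,
        data={"secret": RECAPTCHA_SECRET, "response": token, "remoteip": remote_ip},
        timeout=5,
    )
    data = resp.json()
    if not data.get("success"):
        return 0.0  # invalid or expired token: treat as bot-like
    return float(data.get("score", 0.0))

@app.route("/search")  # hypothetical endpoint worth protecting
def search():
    token = request.headers.get("X-Recaptcha-Token", "")
    score = recaptcha_score(token, request.remote_addr)
    # Log every score so you can plot the distribution before enforcing anything.
    app.logger.info("recaptcha score=%.2f path=%s", score, request.path)
    if score < SCORE_CUTOFF:
        # Low-scoring traffic: redirect to the API docs rather than hard-blocking.
        return redirect("/api-docs", code=302)
    return jsonify(results=[])  # ... normal handler
```

You'd probably want to run it in log-only mode for a while and only turn on the redirect once the distribution shows a clear cluster of low-scoring requests.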
For web scraping specifically, I’ve developed key parts of commercial systems to automatically bypass reCAPTCHA, Arkose Labs (Fun Captcha), etc.
If someone dedicated themselves to it, there’s a lot more that these solutions could be doing to distinguish between humans and bots, but it requires true specialized talent and larger expenses.
Also, for a handful of the companies which make the most popular captcha solutions, I don’t think the incentives align properly to fully segregate human and bot traffic at this time.
I think we’re still very much picking at the lowest-hanging fruit, both for anti-bot countermeasures and anti-anti-bot counter-countermeasures.
Personally I believe this will finally accelerate once AIs can play computer games via a camera, keyboard, and mouse, and when successors to GPT-3 / PaLM can participate well in niche discussion forums like Hacker News or the Rust Discord server.
Until then it’s mainly a cost filter or confidence modification. As long as enough bots are blocked so that the ones which remain are technically competent enough to not stress the servers, most companies don’t care. And as long as the businesses deploying reCAPTCHA are reasonably confident that most of the views they get are humans (even if that belief is false), Google doesn’t have a strong incentive to improve the system.
Reddit doesn’t seem to care much either. As long as the bots which participate are “good enough”, it drives engagement metrics and increases revenue.
Scrapers can pay a commercial service to Mechanical Turk their way through reCAPTCHA. It adds meaningfully to scraping costs at scale, but scraping can still be profitable.
It sounds great, until you have Chinese customers. That’s when you’ll find out that reCAPTCHA just doesn’t really work in China, and have to begrudgingly ditch it altogether…