
I think you only need something like `jsdom` to have the core API available. The DOM itself is just a tree structure with special nodes. Most APIs are optional, and you can provide stubs if you're targeting a specific website. It's not POSIX-level.
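
To make the "tree plus stubs" idea concrete, here is a rough sketch of a hand-rolled stand-in. It is not `jsdom`'s API: the names `StubNode`, `StubDocument`, `stubWindow`, and `pageScript` are all made up for illustration, and a real site would likely need more surface than this.

```javascript
// Hypothetical minimal DOM stub: a plain tree of nodes plus the few
// methods one particular page script happens to call. Not jsdom's API.
class StubNode {
  constructor(tagName) {
    this.tagName = tagName;
    this.children = [];
    this.attributes = new Map();
    this.textContent = "";
    this.parentNode = null;
  }

  appendChild(child) {
    child.parentNode = this;
    this.children.push(child);
    return child;
  }

  setAttribute(name, value) {
    this.attributes.set(name, String(value));
  }

  getAttribute(name) {
    return this.attributes.has(name) ? this.attributes.get(name) : null;
  }
}

class StubDocument {
  constructor() {
    this.body = new StubNode("BODY");
  }

  createElement(tagName) {
    return new StubNode(tagName.toUpperCase());
  }

  // Walk the tree under body and return the first node with a matching id.
  getElementById(id) {
    const stack = [this.body];
    while (stack.length > 0) {
      const node = stack.pop();
      if (node.getAttribute("id") === id) return node;
      stack.push(...node.children);
    }
    return null;
  }
}

// APIs the script touches but whose behavior we don't care about
// can be no-op stubs.
const stubWindow = {
  document: new StubDocument(),
  navigator: { userAgent: "stub" },
  addEventListener() {},
};

// Stand-in for a script lifted from the target page (hypothetical).
function pageScript(window) {
  const el = window.document.createElement("div");
  el.setAttribute("id", "price");
  el.textContent = "42";
  window.document.body.appendChild(el);
  window.addEventListener("load", () => {});
}

pageScript(stubWindow);
console.log(stubWindow.document.getElementById("price").textContent);
```

The point is only that the script runs against whatever object graph you hand it; anything it never calls doesn't need to exist.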


I would like to know more about this. I had some web scrapers in Perl but they no longer work. :(


The state of the art now is to remote-control a real browser. That defeats all not-a-real-browser checks. You can even click on the Cloudflare CAPTCHAs.



