
We're about to announce a new Python scraping toolkit, memorious: https://github.com/alephdata/memorious. It's pretty lightweight and uses YAML config files to glue together pre-built and custom-made components into flexible, distributed pipelines. A simple web UI helps track errors, and execution can be scheduled via celery.
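To give a rough idea of what "config glues components into a pipeline" can mean, here is a toy sketch. It is not memorious' actual config schema or API: the `CONFIG` dict stands in for a YAML file, and the stage names, the `next` key, and the `seed`/`tag` functions are all made up for illustration.

```python
from collections import deque

# Hypothetical config, standing in for a YAML file. NOT memorious' real schema.
CONFIG = {
    "start": "seed",
    "stages": {
        "seed": {
            "method": "seed",
            "params": {"urls": ["https://example.com/a", "https://example.com/b"]},
            "next": "tag",
        },
        "tag": {"method": "tag", "params": {"label": "demo"}, "next": None},
    },
}


def seed(data, params):
    """Pre-built style component: emit one item per configured URL."""
    for url in params["urls"]:
        yield {"url": url}


def tag(data, params):
    """Custom style component: attach a label to each incoming item."""
    yield {**data, "label": params["label"]}


# Registry mapping the method names used in the config to Python callables.
METHODS = {"seed": seed, "tag": tag}


def run(config):
    """Walk the stages in FIFO order; items leaving the last stage are returned."""
    output = []
    queue = deque([(config["start"], {})])
    while queue:
        name, data = queue.popleft()
        stage = config["stages"][name]
        for item in METHODS[stage["method"]](data, stage["params"]):
            if stage["next"] is None:
                output.append(item)
            else:
                queue.append((stage["next"], item))
    return output
```

Calling `run(CONFIG)` pushes each seeded URL through the `tag` stage. In a distributed setup the in-memory queue would presumably be replaced by a task queue, but that part is left out here.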

We looked at scrapy, but its framing seemed wrong for the kind of scrapers we build: requests, some HTML/XML parser, and output into a service API or a SQL store.
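For anyone unfamiliar with that shape of scraper, a minimal sketch follows. It is only an illustration under assumptions, not code from memorious: the `links` table, the function names, and the choice of the standard-library `html.parser` and `sqlite3` are all hypothetical stand-ins for "some parser" and "a SQL store".

```python
import sqlite3
from html.parser import HTMLParser


def fetch(url):
    """Download a page. Assumes the third-party `requests` package is installed."""
    import requests  # imported lazily so the rest works without it

    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.text


class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def parse_links(html):
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def store_links(conn, page_url, links):
    """Write the links into a hypothetical `links` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS links (page TEXT, href TEXT)")
    conn.executemany(
        "INSERT INTO links (page, href) VALUES (?, ?)",
        [(page_url, href) for href in links],
    )
    conn.commit()


def scrape(conn, page_url, html=None):
    """Fetch (unless html is supplied), parse, and store; return the links found."""
    if html is None:
        html = fetch(page_url)
    links = parse_links(html)
    store_links(conn, page_url, links)
    return links
```

Passing `html` directly makes the parse and store steps easy to try without network access, e.g. `scrape(sqlite3.connect(":memory:"), "https://example.com", "<a href='/a'>A</a>")`.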

Maybe some people will enjoy it.


