I've always found dealing with background workers a pain in the butt. Not only do you have to set up additional servers (queue and worker servers), but you also run into problems caused by your own naivety, such as not using a connection pool for workers so the startup time is too slow, or using a library that doesn't handle timeouts. I also find that I have little visibility into queuing throughput, and if I chain jobs (the result of one worker goes to another worker), I have no idea whether the data made it all the way through.
Does anyone else find it a pain, or just me?
Also, has anyone ever tried workers as a service, like iron.io? Are there others, and was it worth it?
Personally, I use queue_classic + postgres triggers to keep track of each job's history. So I have a record of all the jobs, when they started, when they finished, the data in each job, etc.
But yes, it's a pain.
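For anyone wondering what trigger-based job history looks like, here is a minimal sketch of the idea. It is not the setup described above: it swaps in SQLite (via Python's built-in sqlite3 module) for Postgres so it runs without a server, and the table names (jobs, job_history), column names, and helper functions are all made up for illustration. queue_classic's real schema is different.

```python
import sqlite3

# Hypothetical schema: a jobs table plus an append-only history table.
SCHEMA = """
CREATE TABLE jobs (
    id     INTEGER PRIMARY KEY,
    data   TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued'
);

CREATE TABLE job_history (
    job_id      INTEGER NOT NULL,
    status      TEXT NOT NULL,
    data        TEXT NOT NULL,
    recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Record every newly enqueued job.
CREATE TRIGGER jobs_enqueued AFTER INSERT ON jobs
BEGIN
    INSERT INTO job_history (job_id, status, data)
    VALUES (NEW.id, NEW.status, NEW.data);
END;

-- Record every write to the status column (started, finished, ...).
CREATE TRIGGER jobs_status_changed AFTER UPDATE OF status ON jobs
BEGIN
    INSERT INTO job_history (job_id, status, data)
    VALUES (NEW.id, NEW.status, NEW.data);
END;
"""


def open_db():
    """Create an in-memory database with the schema and triggers installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def enqueue(conn, data):
    """Insert a job; the INSERT trigger writes the first history row."""
    cur = conn.execute("INSERT INTO jobs (data) VALUES (?)", (data,))
    conn.commit()
    return cur.lastrowid


def set_status(conn, job_id, status):
    """Change a job's status; the UPDATE trigger writes a history row."""
    conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    conn.commit()


def history(conn, job_id):
    """Return the statuses a job has passed through, oldest first."""
    rows = conn.execute(
        "SELECT status FROM job_history WHERE job_id = ? ORDER BY rowid",
        (job_id,),
    )
    return [status for (status,) in rows]


if __name__ == "__main__":
    conn = open_db()
    job_id = enqueue(conn, '{"url": "http://example.com"}')
    set_status(conn, job_id, "started")
    set_status(conn, job_id, "finished")
    print(history(conn, job_id))
```

The point of doing it in triggers is that workers don't need to know about the history table at all: any code path that touches a job leaves a trail, which also makes it possible to see where a chained job stopped.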
http://torquebox.org/ is supposed to help with the pain of managing background jobs (and lots of other things) for Rails applications.
Man, I wish I had time to help you complete it. I have crawled the web thoroughly, harvesting all the worthy message queues, but my bookmarks are a mess because I used to keep all the important stuff in Chrome, which really has ZERO support for organizing things. Once you bookmark something, it's gone, unless it's in your bookmark bar.
Then I switched to Firefox, which has better support for organizing things, starting with Tab Groups, tags for bookmarks, etc. Unfortunately, that doesn't help with the bookmarks I already have: importing all my Chrome bookmarks leaves me with the unmanageable task of sorting them.
I had the same problem, and that's why I created this page. If you find something useful in your bookmarks, I'm open to any contribution: just send a PR or create an issue. Thanks!
I like the idea of having a page listing all libraries on a certain topic. What advantage does a GitHub + Rails solution have over a wiki page?
More useful than a comprehensive list of all queuing libraries would be a comparison of the best ones with tl;dr summaries of how each works, pros & cons of each approach, and some consistent benchmarks.
I've always liked beanstalkd. Unfortunately, it still has no authentication support, so you're unlikely to see it catch on in PaaS environments any time soon (unless someone hooks SASL into it).
Celery is more of a distributed processing framework than a queue in and of itself. It usually plugs into a queueing system, but you can have it use mongo or redis for its backend if you want, and no one would call those queueing systems.
I would add my implementations and resources, even though they are for educational purposes:
i) Persisting Native Python Queues: http://blog.databigbang.com/persisting-native-python-queues/
ii) Adding Acknowledgement Semantics to a Persistent Queue: http://blog.databigbang.com/adding-acknowledgement-semantics...
iii) Esoteric Queue Scheduling Disciplines (an essay about a new kind of queue): http://blog.databigbang.com/esoteric-queue-scheduling-discip...
iv) Using Queues in Web Crawling and Analysis Infrastructure (just informative): http://blog.databigbang.com/using-queues-in-web-crawling-and...
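To give a rough idea of what "acknowledgement semantics" means in (ii), here is a minimal in-memory sketch. It is not the code from the linked posts, and it is not persistent: the class name AckQueue, the visibility_timeout parameter, and the injectable clock are all assumptions made for illustration. The idea is that get() does not delete a message; the message stays "in flight" until the consumer calls ack(), and if no ack arrives before the timeout it is delivered again.

```python
import time
from collections import deque
from itertools import count


class AckQueue:
    """In-memory FIFO queue where a message is removed only once acknowledged."""

    def __init__(self, visibility_timeout=30.0, clock=time.monotonic):
        self._ready = deque()    # (message_id, payload) pairs waiting for delivery
        self._in_flight = {}     # message_id -> (payload, redelivery deadline)
        self._ids = count(1)     # message ids start at 1
        self._timeout = visibility_timeout
        self._clock = clock      # injectable so tests can control time

    def put(self, payload):
        """Enqueue a payload and return its message id."""
        message_id = next(self._ids)
        self._ready.append((message_id, payload))
        return message_id

    def get(self):
        """Deliver the next (message_id, payload), or None if nothing is ready."""
        self._requeue_expired()
        if not self._ready:
            return None
        message_id, payload = self._ready.popleft()
        deadline = self._clock() + self._timeout
        self._in_flight[message_id] = (payload, deadline)
        return message_id, payload

    def ack(self, message_id):
        """Confirm processing; returns False if the id is not in flight."""
        return self._in_flight.pop(message_id, None) is not None

    def _requeue_expired(self):
        """Move unacknowledged messages whose deadline has passed back to ready."""
        now = self._clock()
        expired = [
            message_id
            for message_id, (_, deadline) in self._in_flight.items()
            if deadline <= now
        ]
        for message_id in expired:
            payload, _ = self._in_flight.pop(message_id)
            self._ready.append((message_id, payload))


if __name__ == "__main__":
    queue = AckQueue(visibility_timeout=5.0)
    queue.put("crawl http://example.com")
    message_id, payload = queue.get()
    # ... process payload here; if the worker dies before ack(),
    # the message becomes deliverable again after the timeout.
    queue.ack(message_id)
```

This gives at-least-once delivery: a slow worker that finishes after the timeout can cause the same message to be processed twice, so consumers should be idempotent. A persistent version would also have to store the in-flight set on disk.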