bgentry's comments | Hacker News

This is largely because LISTEN/NOTIFY is implemented with a global lock, which obviously breaks down at high volume: https://www.recall.ai/blog/postgres-listen-notify-does-not-s...

None of that means Oban or similar queues don't/can't scale—it just means a high volume of NOTIFY doesn't scale, hence the alternative notifiers and the fact that most of its job processing doesn't depend on notifications at all.

There are other reasons Oban recommends a different notifier per the doc link above:

> That keeps notifications out of the db, reduces total queries, and allows larger messages, with the tradeoff that notifications from within a database transaction may be sent even if the transaction is rolled back


> None of that means Oban or similar queues don't/can't scale—it just means a high volume of NOTIFY doesn't scale

Given the context of this post, it really does mean the same thing though?


No, I don't think so. Oban does not rely on a large volume of NOTIFY in order to process a large volume of jobs. The insert notifications are simply a latency optimization for lower-volume environments, and they can be fully disabled for inserts, in which case notifications are mainly used for control flow (canceling jobs, pausing queues, etc.) and gossip among workers.

River for example also uses LISTEN/NOTIFY for some stuff, but we definitely do not emit a NOTIFY for every single job that's inserted; instead there's a debouncing setup where each client notifies at most once per fetch period, and you don't need notifications at all in order to process with extremely high throughput.
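
Roughly, the debouncing looks like this (a heavily simplified sketch, not River's actual code; the channel name and types here are made up for illustration):

    // Sketch: emit at most one NOTIFY per fetch period, no matter how many
    // jobs were inserted in the meantime.
    package notifier

    import (
        "context"
        "sync/atomic"
        "time"

        "github.com/jackc/pgx/v5/pgxpool"
    )

    type debouncedNotifier struct {
        pool     *pgxpool.Pool
        interval time.Duration
        pending  atomic.Bool
    }

    // MarkInsert records that at least one job was inserted since the last notify.
    func (n *debouncedNotifier) MarkInsert() { n.pending.Store(true) }

    // Run wakes once per fetch period and notifies only if something changed.
    func (n *debouncedNotifier) Run(ctx context.Context) {
        ticker := time.NewTicker(n.interval)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                if n.pending.Swap(false) {
                    // Payload doesn't matter; listeners just wake up and fetch a batch.
                    _, _ = n.pool.Exec(ctx, `SELECT pg_notify('job_insert', '')`)
                }
            }
        }
    }

That way a burst of tens of thousands of inserts still produces a single wakeup per client per period, and workers that only poll and never LISTEN are unaffected.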

In short, the fact that high volume NOTIFY is a bottleneck does not mean these systems cannot scale, because they do not rely on a high volume of NOTIFY or even require it at all.


Does River without any extra configuration run into scaling issues at a certain point? If the answer is yes, then River doesn’t scale without optimization (Redis/Clustering in Oban’s case).

While the root cause might not be River/Oban themselves, the claim that they aren't scalable still holds. It's of extra importance given the context of this post is moving away from Redis and to strictly a database for a queue system.


Yup. I wasn’t talking about notify in particular but about using Postgres in general.

Yeah, River generally recommends this pattern as well (River co-author here :)

To get the benefits of transactional enqueueing you generally need to commit the jobs transactionally with other database changes. https://riverqueue.com/docs/transactional-enqueueing
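
In Go the shape of it looks roughly like this (a simplified sketch; the OrderArgs type and orders table are made up for illustration):

    // Sketch: the job row commits or rolls back together with the order row,
    // so a job can never exist for an order that was never created (or vice versa).
    package enqueue

    import (
        "context"

        "github.com/jackc/pgx/v5"
        "github.com/jackc/pgx/v5/pgxpool"
        "github.com/riverqueue/river"
    )

    type OrderArgs struct {
        OrderID int64 `json:"order_id"`
    }

    func (OrderArgs) Kind() string { return "order_created" }

    func createOrder(ctx context.Context, pool *pgxpool.Pool, client *river.Client[pgx.Tx]) error {
        tx, err := pool.Begin(ctx)
        if err != nil {
            return err
        }
        defer tx.Rollback(ctx)

        var orderID int64
        if err := tx.QueryRow(ctx,
            `INSERT INTO orders (status) VALUES ('pending') RETURNING id`,
        ).Scan(&orderID); err != nil {
            return err
        }

        // Enqueue the job in the same transaction as the business write.
        if _, err := client.InsertTx(ctx, tx, OrderArgs{OrderID: orderID}, nil); err != nil {
            return err
        }

        return tx.Commit(ctx)
    }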

It does not scale forever, and as you grow in throughput and job table size you will probably need to do some tuning to keep things running smoothly. But after the amount of time I've spent in my career tracking down those numerous distributed systems issues arising from a non-transactional queue, I've come to believe this model is the right starting point for the vast majority of applications. That's especially true given how high the performance ceiling is on newer / more modern job queues and hardware relative to where things were 10+ years ago.

If you are lucky enough to grow into the range of many thousands of jobs per second then you can start thinking about putting in all that extra work to build a robust multi-datastore queueing system, or even just move specific high-volume jobs into a dedicated system. Most apps will never hit this point, but if you do you'll have deferred a ton of complexity and pain until it's truly justified.


State machines to the rescue; i.e., I think the nature of asynchronous processing requires that we design for good/safe intermediate states.

I get the temptation to attribute the popularity of these systems to lazy police with nothing better to do, but from personal experience there’s more to it.

I live in a medium sized residential development about 15 minutes outside Austin. A few years ago we started getting multiple incidents per month of attempted car theft where the thieves would go driveway to driveway checking for unlocked doors. Sometimes the resident footage revealed the thieves were armed while doing so. In a couple of cases they did actually steal a car.

The sheriffs couldn’t really do much about it because a) it was happening to most of the neighborhoods around us, b) the timing was unpredictable, and c) the manpower required to camp out to attempt to catch these in progress would be pretty high.

Our neighborhood installed Flock cameras at the sole entrance in response to growing resident concerns. We also put in a strict policy controlling access by anyone other than law enforcement. In the ~two years since they were installed, we've had two or three incidents total, whereas immediately prior it was at least that many each month. And in those cases the sheriffs could easily figure out which vehicles had entered or left during that time. I continue to see stories of attempted car thefts from adjacent neighborhoods several times per month.

I totally get the privacy concerns around this and am inherently suspicious of any new surveillance. I also get the reflexive dismissal of their value. In this case it has been a clear win for our community through the obvious deterrent factor and the much higher likelihood of having evidence if anything does happen.

Our Flock cameras do not show on the map here, btw.


Aaron did RT the post here which likely indicates some agreement with the sentiment in it: https://x.com/searls/status/1972293469193351558

Also he shared it directly while saying it was good here: https://x.com/tenderlove/status/1972370330892321197


Take my money! I have been looking for a good way to get Claude to stop telling me I'm right in every damn reply. There must be people who actually enjoy this "personality" but I'm sure not one of them.


As one of River's creators, I'm aware of many companies using it, including names you've heard of :) One cool example I've seen recently is GoAlert, an open source pager / on-call system from Target: https://github.com/target/goalert


Modern Postgres in particular can take you really far with this mindset. There are tons of use cases where you can use it for pretty much everything, including as a fairly high throughput transactional job queue, and may not outgrow that setup for years if ever. Meanwhile, features are easier to develop, ops are simpler, and you’re not going to risk wasting lots of time debugging and fixing common distributed systems edge cases from having multiple primary datastores.
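
For the job queue case in particular, the core pattern is just a jobs table plus FOR UPDATE SKIP LOCKED so many workers can dequeue concurrently without blocking one another. A minimal sketch (hypothetical table and column names):

    // Sketch: claim one available job; SKIP LOCKED makes competing workers
    // skip rows that are already claimed instead of waiting on them.
    // (Scan returns pgx.ErrNoRows when the queue is empty.)
    package jobqueue

    import (
        "context"

        "github.com/jackc/pgx/v5"
        "github.com/jackc/pgx/v5/pgxpool"
    )

    func dequeueOne(ctx context.Context, pool *pgxpool.Pool) (id int64, payload []byte, err error) {
        err = pgx.BeginFunc(ctx, pool, func(tx pgx.Tx) error {
            row := tx.QueryRow(ctx, `
                SELECT id, payload
                  FROM jobs
                 WHERE state = 'available'
                 ORDER BY id
                 LIMIT 1
                   FOR UPDATE SKIP LOCKED`)
            if err := row.Scan(&id, &payload); err != nil {
                return err
            }
            _, err := tx.Exec(ctx, `UPDATE jobs SET state = 'running' WHERE id = $1`, id)
            return err
        })
        return id, payload, err
    }

Dedicated Postgres-backed queues like River or Oban are essentially this pattern plus retries, scheduling, uniqueness, and cleanup.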

If you really do outgrow it, only then do you have to pay the cost to move parts of your system to something more specialized. Hopefully by then you’ve achieved enough success & traction to justify doing so.

Should be the default mindset for any new project if you don’t have very demanding performance & throughput needs, IMO.


PostgreSQL is quite terrible at OLAP, though. We got a few orders of magnitude performance improvement in some aggregation queries by rewriting them with ClickHouse. It's incredible at it.

My rule of thumb is: PG for transactional data consistency, ClickHouse for OLAP. Maybe Elasticsearch if full-text search is really needed.


This. Don't lose your time and sanity trying to optimize complex queries for PG's non-deterministic query planner; you have no guarantee your indexes will be used (even running the same query again with different arguments). Push your data to ClickHouse and enjoy good performance without even attempting to optimize. If even more performance is needed, denormalize here and there.

Keep postgres as the source of truth.


I find that the Postgres query planner is quite satisfactory for very difficult use cases. I was able to get 5 years into a startup (one that wasn't trying to be the next Twitter) on a $300 Postgres tier with Heroku. The reduced complexity was so huge we didn't need a team of 10. The cost savings were yuge, and I got really good at debugging slow queries, to the point where I could tell when Postgres would cough at one.

My point isn't that this will scale. It's that you can get really, really far without complexity and then tack it on as needed. This is just another bit of complexity removal for early tech. I'd use this in a heartbeat.


It’s simple high throughput queries that often bite you with the PG query planner.


Redis, PG, and ClickHouse sound like a good combo for workloads that require both OLTP and OLAP at scale (?).


You can get pretty high job throughput while maintaining transactional integrity, but maybe not with Ruby and ActiveRecord :) https://riverqueue.com/docs/benchmarks

That River example has a MacBook Air doing about 2x the throughput of the Sidekiq benchmarks while still using Postgres via Go.

Can't find any indication of whether those Sidekiq benchmarks used Postgres or MySQL/Maria; that may be a difference.


This is really cool and I'd love to use it, but it seems they only support workers written in Go, if I'm not mistaken? My workers can be remote, not written in Go, and behind a NAT too. I want them to periodically pull from the queue; that way I don't need to worry about network topology. I guess I could simply interface with PG atomically via a simple API endpoint for the workers to connect to, but I'd love to have the UI of riverqueue.


Very interesting, and TIL about the project, thanks for sharing. What DB does River Queue use for the 46k jobs/sec (https://riverqueue.com/docs/benchmarks)?

UPDATE: I see it's PG - https://riverqueue.com/docs/transactional-enqueueing


IIUC, does this mean that if I'm making a network call, or performing some expensive task from within the job, the transaction against the database is held open for the duration of the job?


Definitely not! Jobs in River are enqueued and fetched by worker clients transactionally, but the jobs themselves execute outside a transaction. I’m guessing you’re aware of the risks of holding open long transactions in Postgres, and we definitely didn’t want to limit users to short-lived background jobs.

There is a super handy transactional completion API that lets you put some or all of a job in a transaction if you want to. Works great for making other database side effects atomic with the job’s completion. https://riverqueue.com/docs/transactional-job-completion
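
From inside a worker it looks roughly like this (a sketch from memory, so treat the exact signature as approximate and check the docs above; EmailArgs and the deliveries table are made up):

    // Sketch: record our own side effect and mark the job completed in one commit.
    package workers

    import (
        "context"

        "github.com/jackc/pgx/v5/pgxpool"
        "github.com/riverqueue/river"
        "github.com/riverqueue/river/riverdriver/riverpgxv5"
    )

    type EmailArgs struct {
        To string `json:"to"`
    }

    func (EmailArgs) Kind() string { return "email" }

    type EmailWorker struct {
        river.WorkerDefaults[EmailArgs]
        pool *pgxpool.Pool
    }

    func (w *EmailWorker) Work(ctx context.Context, job *river.Job[EmailArgs]) error {
        // ...do the non-transactional part (e.g. call the email API) first...

        tx, err := w.pool.Begin(ctx)
        if err != nil {
            return err
        }
        defer tx.Rollback(ctx)

        if _, err := tx.Exec(ctx,
            `INSERT INTO deliveries (recipient) VALUES ($1)`, job.Args.To); err != nil {
            return err
        }
        // Completing the job inside the same transaction makes it atomic with
        // the insert above.
        if _, err := river.JobCompleteTx[*riverpgxv5.Driver](ctx, tx, job); err != nil {
            return err
        }
        return tx.Commit(ctx)
    }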


very nice! cool project


Developer of River here ( https://riverqueue.com ). I'm curious if you ran into actual performance limitations based on specific testing and use cases, or if it's more of a hypothetical concern. Modern Postgres running on modern hardware and with well-written software can handle many thousands or tens of thousands of jobs per second (even without partitioning), though that depends on your workload, your tuning / autovacuum settings, and your job retention time.


Perceived only at this stage, though the kind of volume we’re looking at is 10s to 100s of millions of jobs per day. https://github.com/riverqueue/river/issues/746 talks about some of the same things you mention.

To be clear, I really like the model of riverqueue and will keep going at a leisurely pace since this is a personal time interest at the moment. I’m sick of celery and believe a service is a better model for background tasks than a language-specific tool.

If you guys were to build http ingestion and http targets I’d try and deploy it right away.


Ah, so that issue is specifically related to a statistics/count query used by the UI and not by River itself. I think it's something we'll build a more efficient solution for in the future because counting large quantities of records in Postgres tends to be slow no matter what, but hopefully it won't get in the way of regular usage.
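
For what it's worth, a common workaround (not necessarily what we'll end up shipping) is to lean on the planner's statistics for an approximate count instead of scanning the table; a sketch, assuming River's default river_job table name:

    // Sketch: reltuples is maintained by VACUUM/ANALYZE, so it's approximate
    // but returns instantly even on tables with hundreds of millions of rows.
    package metrics

    import (
        "context"

        "github.com/jackc/pgx/v5/pgxpool"
    )

    func estimatedJobCount(ctx context.Context, pool *pgxpool.Pool) (int64, error) {
        var estimate int64
        err := pool.QueryRow(ctx,
            `SELECT reltuples::bigint FROM pg_class WHERE relname = 'river_job'`,
        ).Scan(&estimate)
        return estimate, err
    }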

> Perceived only at this stage, though the kind of volume we’re looking at is 10s to 100s of millions of jobs per day.

Yeah, that's a little over 100 jobs/sec sustained at the low end (10M/day ≈ 115/sec; 100M/day ≈ 1,150/sec) :) Shouldn't be much of an issue on appropriate hardware and with a little tuning, in particular to keep your jobs table from growing to more than a few million rows and to vacuum frequently. Definitely hit us up if you try it and start having any trouble!


You could make that argument for lots of services that have external side effects, but that’s about what happens after the service has been asked to do a thing (to send an email in this case).

However, the fact that an action may be duplicated after the provider has been asked to do the thing does not eliminate the value of the provider being able to deduplicate that incoming request and avoid multiple identical tasks on their end. Without API-level idempotency, a single email on the client's end could turn into many redundant emails at the service provider's side, each of which could then be subject to those same subsequent duplications at the SMTP layer. And even then, providers can use the Message-Id header to provide idempotency in delivery, as many do.

This is an unavoidable consequence of distributed systems where the client may not know if the server ever received or processed the request, and it may also occur due to client-side bugs or retries within their own software.

In other words, API-level idempotency can help eliminate all duplication prior to the API; depending on the service, the provider may be able to eliminate duplication afterward as well. So it's strictly better than not having it, really not that difficult to implement, and makes it easier for integrators to build a robust integration with you.
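
For a concrete picture of what the provider side can look like (a hedged sketch; the table, handler, and names are all hypothetical): the client supplies an Idempotency-Key header, and a unique index turns retries into no-ops:

    // Sketch: a unique index on the idempotency key makes retried requests
    // collapse into the original one instead of producing duplicate sends.
    package api

    import (
        "context"
        "errors"
        "net/http"

        "github.com/jackc/pgx/v5/pgconn"
        "github.com/jackc/pgx/v5/pgxpool"
    )

    // Requires: CREATE UNIQUE INDEX ON email_requests (idempotency_key);
    func recordSend(ctx context.Context, pool *pgxpool.Pool, key, recipient string) (duplicate bool, err error) {
        _, err = pool.Exec(ctx,
            `INSERT INTO email_requests (idempotency_key, recipient) VALUES ($1, $2)`,
            key, recipient)
        var pgErr *pgconn.PgError
        if errors.As(err, &pgErr) && pgErr.Code == "23505" { // unique_violation
            return true, nil // already accepted; don't enqueue a second send
        }
        return false, err
    }

    func sendHandler(pool *pgxpool.Pool) http.HandlerFunc {
        return func(w http.ResponseWriter, r *http.Request) {
            key := r.Header.Get("Idempotency-Key")
            if key == "" {
                http.Error(w, "missing Idempotency-Key", http.StatusBadRequest)
                return
            }
            dup, err := recordSend(r.Context(), pool, key, r.FormValue("to"))
            if err != nil {
                http.Error(w, "internal error", http.StatusInternalServerError)
                return
            }
            if dup {
                w.WriteHeader(http.StatusOK) // same outcome as the first accepted request
                return
            }
            // ...enqueue the actual send here...
            w.WriteHeader(http.StatusAccepted)
        }
    }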


> makes it easier for integrators to build a robust integration with you

No, don't say 'easier'. It makes it possible to build a robust integration. We need to stop with this notion that omitting idempotency from an API just makes things "more difficult" to develop. Without idempotency, you guarantee that the resulting system is "difficult" to use and full of nasty issues that are waiting for the right conditions to collapse the entire house of cards you've built.

So many SaaS providers have never even heard of idempotency, let alone designed it into their APIs. Many people believe you can just sprinkle it on as a library without having to think about it.

All APIs with multiple distributed servers must support idempotency. Refuse to do business with any organisations who do not design this into their APIs!


Hah, I agree, "easier" is too soft :)


> Without API level idempotency, a single email on the client’s end could turn into many redundant emails at the service provider’s side, each of which could then be subject to those same subsequent duplications at the SMTP layer.

Ok so now there’s a 1/100 million chance that the client gets 3 duplicate emails.

I’m not arguing that idempotency is never important. The most popular blog post I’ve ever written is about the 2 generals problem and how idempotency can help.

I’m arguing in this specific instance it doesn’t matter.

As far as I'm aware, duplicate Message-Id headers aren't deduped by every client, but if they are being used for deduplication, just expose that in your API and let the caller set it.

