
Before I proceed, let me just state that I have no particular knowledge of either RDBMS' internals or Kafka, so this is nothing but my own amateur musings on the subject.

Now, I wholeheartedly agree that Kafka is more scalable, but I think the key point here is that there is no particular law of nature as to why that is the case. It may just be a historical accident of how RDBMSs - and PostgreSQL in particular - have evolved. Further: many of the properties of Kafka are in fact also desirable properties for the PostgreSQL transaction log.

My take on inopinatus's observation, together with the Samza article [1] mentioned in this discussion, is as follows. You can think of Postgres as two "products" (bounded contexts, if you like):

- a stream-based, possibly replicated, transaction log;

- a projection of that transaction log into relational calculus, plus all of the associated machinery.

Thus far there has been no need to think of these as clearly separate "products", but Kafka makes it obvious that they are. In truth, the number of tools processing the WAL outside of PostgreSQL was already hinting in this direction; Kafka just made it explicit.

From this perspective, it seems a tad expensive to take the original transaction log, convert it to an RDBMS representation, convert that back into events and, in some cases, store those events as a stream in Kafka. It would be much more efficient to simply use the original transaction log directly - and this is why, to me, even Debezium [2] / Bottled Water [3] appear to be one layer too many. To the best of my understanding, this line of reasoning is also in line with the observations in the Samza article [1]. Where I believe I differ from the article is in thinking that the RDBMS representation also adds a lot of value to applications - I see both having a role (e.g. a streaming vs. batch processing sort of thing). I think this would derail the present discussion too much, so I won't go into it.
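To make "use the transaction log directly" concrete: PostgreSQL already exposes a decoded view of the WAL through logical decoding, which is the same facility Debezium builds on. Below is a minimal sketch in Python, assuming psycopg2, a server running with wal_level = logical, the built-in test_decoding plugin, and made-up database/slot names:

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical database
    conn.autocommit = True
    cur = conn.cursor()

    # Create a replication slot backed by the built-in test_decoding plugin.
    cur.execute("SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding')")

    # ... writes happen elsewhere ...

    # Read decoded WAL changes straight from the log - no trigger tables,
    # no polling of application rows.
    cur.execute("SELECT lsn, xid, data FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL)")
    for lsn, xid, data in cur.fetchall():
        print(lsn, xid, data)  # e.g. "table public.users: INSERT: id[integer]:1 ..."

A non-relational "client" of the log would consume something like this stream, skipping the round-trip through tables entirely.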

In conclusion: to the untrained eye, it seems that the right thing to do is to extract the transaction log out of PostgreSQL and make it as scalable as Kafka. Then, allow it to log "things" which are not necessarily "projectable" into the relational plane. PostgreSQL then becomes just a client of the transaction log, alongside other "kinds" of clients. I suspect that this is what will ultimately happen, but the engineering work required will probably span a decade or more.

My 2 Angolan Kwanzas, at any rate.

[1] https://www.confluent.io/blog/turning-the-database-inside-ou...

[2] https://debezium.io/

[3] https://github.com/confluentinc/bottledwater-pg



Kafka partitions individual streams (topics) and distributes those partitions across the cluster. That is, the entire "database" is spread across nodes. With postgres, your unit of scalability is the entire database: you can't natively have some tables on one node and some tables on another, for example.
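To see what that unit of scalability looks like from the client side, here is a sketch using confluent-kafka's admin API, with made-up broker and topic names:

    from confluent_kafka.admin import AdminClient, NewTopic

    admin = AdminClient({"bootstrap.servers": "broker1:9092"})

    # One "stream", split into 12 partitions, each replicated 3 times.
    # The cluster spreads those partitions (and their replicas) across
    # brokers, so one topic's reads and writes are served by many nodes.
    futures = admin.create_topics(
        [NewTopic("events", num_partitions=12, replication_factor=3)])
    for topic, future in futures.items():
        future.result()  # raises if creation failed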

Kafka is also just more optimized for what it does. Postgres does a superset of what kafka does, so kafka is unsurprisingly better able to optimize for its use case. It has a zero-copy path that can shuttle data between disk and network without copying it through user space (using the sendfile syscall). It doesn't wait for disk flushes when doing writes, because it achieves durability via replication instead.
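To make the zero-copy point concrete, here is roughly what that path looks like at the syscall level. This is illustrative Python, not Kafka's actual code (the broker is JVM code using FileChannel.transferTo, which maps to sendfile on Linux); serve_segment and its arguments are made up:

    import os
    import socket

    def serve_segment(path: str, conn: socket.socket) -> None:
        # sendfile() moves bytes from the file to the socket inside the
        # kernel (via the page cache), never copying them through user
        # space - which is how a broker can serve log segments cheaply.
        fd = os.open(path, os.O_RDONLY)
        try:
            size = os.fstat(fd).st_size
            offset = 0
            while offset < size:
                sent = os.sendfile(conn.fileno(), fd, offset, size - offset)
                if sent == 0:  # nothing left to send
                    break
                offset += sent
        finally:
            os.close(fd)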

Also, don't forget the things you'd have to do when implementing consumers. How will you load-balance a stream between consumers? Meaning, if you have a stream and you want multiple consumers working through it in parallel, how can you make sure they aren't duplicating work, and that they can handle the consumer group growing/shrinking? How will you handle checkpointing, where each consumer tracks what it has processed so far? What about stream data rolling off (retention)?
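For contrast, this is roughly what the Kafka client gives you out of the box. A sketch assuming confluent-kafka, with made-up topic/group names; handle() is a hypothetical application callback:

    from confluent_kafka import Consumer

    # Every process running this with the same group.id is assigned a
    # disjoint subset of the topic's partitions, and the group is
    # rebalanced automatically as consumers join or leave.
    c = Consumer({
        "bootstrap.servers": "broker1:9092",
        "group.id": "billing-workers",   # hypothetical group
        "auto.offset.reset": "earliest",
        "enable.auto.commit": False,
    })
    c.subscribe(["events"])

    while True:
        msg = c.poll(1.0)
        if msg is None or msg.error():
            continue
        handle(msg.value())        # hypothetical handler
        c.commit(message=msg)      # checkpoint, stored broker-side per group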

All of this is doable with pg, but you'd have to implement it yourself. With kafka and its client drivers, this is handled for you.
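To illustrate the "implement it yourself" part: a common hand-rolled pattern on the pg side uses FOR UPDATE SKIP LOCKED so that concurrent workers claim disjoint rows. A sketch assuming psycopg2 and a made-up events(id, payload) table; process() is a hypothetical handler:

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical database

    def consume_one() -> bool:
        # One transaction per message: SKIP LOCKED lets concurrent
        # workers grab different rows, and the DELETE doubles as the
        # "offset commit" - logic you must write and test yourself.
        with conn, conn.cursor() as cur:
            cur.execute("""
                DELETE FROM events
                WHERE id = (SELECT id FROM events
                            ORDER BY id
                            LIMIT 1
                            FOR UPDATE SKIP LOCKED)
                RETURNING id, payload
            """)
            row = cur.fetchone()
            if row is not None:
                process(row)  # hypothetical handler
            return row is not None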


As I said, I haven't given a lot of thought to this, so please take my opinion with a grain of salt - but I believe that once you split the log out of PostgreSQL, a lot of functionality of this ilk could start to be considered. If and when it is added, I think it would make for a stronger PostgreSQL in the end. However, I do understand this is an insane amount of work. In a way, it bears some similarities to splitting GTK out of GIMP: an extremely difficult thing to do, but ultimately a massive win for both projects. This would be even harder, but ultimately, greatly advantageous.


Musing similarly... Or just replace PostgreSQL's transaction log with Kafka, given all of Kafka's advantages. It seems to me that traditional RDBMSs cram too many seemingly independent subsystems into a single system, each of which, if separated behind a well-defined interface, could be made substantially better. I think there's more room for software akin to SQLite - a library replacing RDBMS functionality within an application, thus allowing for composability at the language-linking level as opposed to the process-linking level (done with an orchestrator).



