Scaling out relational data models, and SQL, through co-location (citusdata.com)
52 points by ozgune on Dec 23, 2016 | 16 comments


I do not get why giving each customer his own database would be so costly. Each postgresql instance can handle many smaller databases. Are shared buffers allocated per database?


First of all, every database comes with some metadata overhead (catalog tables), which amounts to ~30 MB. Keep in mind that this may (and should!) also be held in memory (shared_buffers), so it can become expensive.
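
To put the catalog overhead in perspective, here is a back-of-the-envelope sketch. The ~30 MB figure is the one quoted above; the tenant counts and the function name are hypothetical, and real overhead will vary with schema and PostgreSQL version.

```python
# Rough estimate only: the ~30 MB per-database figure comes from the comment
# above; the tenant counts below are hypothetical.
CATALOG_OVERHEAD_MB = 30


def catalog_overhead_gb(num_databases: int, per_db_mb: float = CATALOG_OVERHEAD_MB) -> float:
    """Total catalog overhead in GB (decimal, 1 GB = 1000 MB)."""
    return num_databases * per_db_mb / 1000


for tenants in (1_000, 100_000, 1_000_000):
    print(f"{tenants:>9,} databases -> ~{catalog_overhead_gb(tenants):,.0f} GB of catalog data")
```

Under these assumptions, a thousand databases cost about 30 GB of catalog data, which is already more than many servers would want to dedicate to metadata.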

But the most significant factor is whether you need to run queries that span multiple users/tenants. If you do, you will need postgres_fdw, and neither performance nor manageability will be good.


> need to do queries that span multiple users/tenants

If you need that, shouldn't you use one database?
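
A minimal sketch of what "one database" looks like when tenants share tables. It uses an in-memory SQLite database purely as a stand-in so the example runs anywhere; the `orders` table and its columns are hypothetical and not taken from the article.

```python
import sqlite3

# Hypothetical schema: one shared table keyed by tenant_id (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (tenant_id INTEGER NOT NULL, amount REAL NOT NULL)")
conn.executemany(
    "INSERT INTO orders (tenant_id, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)

# Per-tenant query: filter on tenant_id.
tenant_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE tenant_id = ?", (1,)
).fetchone()[0]

# Cross-tenant query: a plain GROUP BY, with no foreign data wrapper involved.
totals = dict(conn.execute(
    "SELECT tenant_id, SUM(amount) FROM orders GROUP BY tenant_id"
))
conn.close()
```

With a shared table, the per-tenant and cross-tenant cases are the same kind of query; the price is that every table and every query has to carry the tenant key.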


If you use it for multi-tenancy, then you probably won't need that.

But 30MB is a lot, yes.


Yeah, kinda (not shared buffers, but something else). Each table/index has overhead.

Question: Do you know of ANY db where you can do 1 db/customer (for 1 million customers)? Just like I thought, no.


This obviously depends on the workload / amount of data per customer, but for some use-cases this could be done with SQLite.

EDIT: Well, looks like someone beat me to it


I assume he wants a proper database, and to be honest, if you have zero interaction between the things you are partitioning on, it's very easy to scale with SQLite or any other database.

There's no reason you can't either run more servers or multiple instances of postgres per server, with docker or otherwise.

I'm probably going to try using Citus Data instead of moving to Cassandra if I ever get big. We'll see.


I honestly don't see why you wouldn't consider SQLite to be a "proper database." I think it has a reasonably competitive feature set.

In a use case where you consider independent databases, with few interactions between them, for millions of users (and a few MBs + binary blobs of data per user), I'd certainly consider it as a possible solution.
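
A minimal sketch of the one-SQLite-file-per-tenant idea, using Python's standard `sqlite3` module. The directory layout, the file naming, and the `notes` table are all hypothetical; a real deployment would also need to think about backups, migrations across many files, and open-file limits.

```python
import sqlite3
import tempfile
from pathlib import Path


def tenant_db_path(root: Path, tenant_id: int) -> Path:
    # Hypothetical layout: one file per tenant under a shared root directory.
    return root / f"tenant_{tenant_id}.sqlite3"


def open_tenant_db(root: Path, tenant_id: int) -> sqlite3.Connection:
    # Opening the connection creates the file on first use.
    conn = sqlite3.connect(str(tenant_db_path(root, tenant_id)))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


root = Path(tempfile.mkdtemp())

# Each write touches only the file that belongs to that tenant.
for tenant_id, body in [(1, "hello"), (2, "world"), (1, "again")]:
    conn = open_tenant_db(root, tenant_id)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    conn.close()

# Reading tenant 1 never sees tenant 2's rows.
conn = open_tenant_db(root, 1)
tenant_one_notes = [row[0] for row in conn.execute("SELECT body FROM notes ORDER BY id")]
conn.close()
```

Isolation here comes from the file system rather than from a tenant column, which is why queries that span tenants become the hard case.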


Cassandra is a completely different beast. Like, it has none of PostgreSQL's features, and vice versa.


They both have selects and inserts


SQLite ;-)


Actordb does it in a distributed way.


Hey that's an interesting link. Thank you for that.


This is an OK article for someone using this product but there isn't anything interesting here unless you haven't considered partitioning a dataset to be spatially local to the users.


This comment is a tautology. "Article about subject not relevant to people not interested in subject".


That is not tautological. Interest and relevance are different things.



