Scaling out relational data models, and SQL, through co-location

raarts · on Dec 24, 2016

I do not get why giving each customer his own database would be so costly. Each PostgreSQL instance can handle many smaller databases. Are shared buffers allocated per database?

ahachete · on Dec 24, 2016

First of all, every database comes with some metadata overhead (catalog tables), which amounts to ~30MB. Take into account that this metadata may (and should!) also be held in memory (shared_buffers), so it can become expensive.
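
To put that figure in perspective, here is a back-of-the-envelope sketch. The ~30MB per database is the estimate quoted above, not a measured value; the tenant counts and the function name are made up for illustration.

```python
# Rough estimate of catalog overhead when every tenant gets its own database.
# ASSUMPTION: ~30 MB of catalog metadata per database (the figure quoted above).
# The tenant counts below are hypothetical.
CATALOG_MB_PER_DB = 30


def catalog_overhead_gb(num_tenants: int, mb_per_db: int = CATALOG_MB_PER_DB) -> float:
    """Total catalog overhead in GB (decimal, 1 GB = 1000 MB)."""
    return num_tenants * mb_per_db / 1000


for tenants in (100, 10_000, 1_000_000):
    print(f"{tenants:>9} tenants -> {catalog_overhead_gb(tenants):,.1f} GB of catalogs")
```

Under that assumption the overhead is negligible for a hundred tenants but reaches tens of terabytes at a million, before any user data is stored.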

But the most significant factor is whether you need to run queries that span multiple users/tenants. If you do, you will require postgres_fdw, and neither performance nor manageability will be good.

paulddraper · on Dec 25, 2016

> need to do queries that span multiple users/tenants

If you need that, shouldn't you use one database?
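
For reference, a minimal sketch of what "one database" can look like in the shared-schema case: every row carries a tenant key, so a cross-tenant query is an ordinary GROUP BY. SQLite (in memory) stands in for PostgreSQL here only so the snippet runs anywhere; the table and column names are hypothetical.

```python
import sqlite3

# ASSUMPTION: a hypothetical "orders" table shared by all tenants,
# with tenant_id as the leading column of the primary key.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        tenant_id INTEGER NOT NULL,
        order_id  INTEGER NOT NULL,
        amount    REAL    NOT NULL,
        PRIMARY KEY (tenant_id, order_id)
    )
    """
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.5)],
)

# Per-tenant query: always filter on tenant_id.
tenant_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE tenant_id = ?", (1,)
).fetchone()[0]

# Cross-tenant query: a single statement, no foreign tables involved.
totals = conn.execute(
    "SELECT tenant_id, SUM(amount) FROM orders GROUP BY tenant_id ORDER BY tenant_id"
).fetchall()

print(tenant_total, totals)
conn.close()
```

The trade-off is that isolation between tenants now depends on every query filtering on tenant_id, rather than on separate databases.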

raarts · on Dec 24, 2016

If you use it for multi-tenancy, then you probably won't need that.

But 30MB is a lot, yes.

ddorian43 · on Dec 24, 2016

Yeah, kinda (not shared buffers, but something else). Each table/index has overhead.

Question: Do you know of ANY DB where you can do 1 DB per customer (for 1 million customers)? Just like I thought, no.

unsoundInput · on Dec 24, 2016

This obviously depends on the workload / amount of data per customer, but for some use-cases this could be done with SQLite.

EDIT: Well, looks like someone beat me to it

andy_ppp · on Dec 24, 2016

I assume he wants a proper database and, to be honest, if you have zero interaction between the things you are partitioning on, it's very easy to scale with SQLite or any other database.

There's no reason you can't either run more servers or multiple instances of postgres per server, with docker or otherwise.

I'm probably going to try using Citus Data instead of moving to Cassandra if I ever get big. We'll see.

unsoundInput · on Dec 24, 2016

I honestly don't see why you wouldn't consider SQLite to be a "proper database." I think it has a reasonably competitive feature set.

In a use case where you consider independent databases, with few interactions between them, for millions of users (and a few MBs + binary blobs of data per user), I'd certainly consider it as a possible solution.
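
A minimal sketch of that layout, assuming one SQLite file per user in a data directory. The file naming scheme, the "notes" table, and the helper names are all hypothetical, and a real deployment would need to validate user IDs before using them in file names.

```python
import sqlite3
import tempfile
from pathlib import Path


# ASSUMPTION: hypothetical layout with one SQLite file per tenant,
# named <tenant_id>.db inside a data directory. tenant_id is trusted here;
# real code must sanitize it before building a path.
def open_tenant_db(data_dir: Path, tenant_id: str) -> sqlite3.Connection:
    """Open (and lazily create) the database file for a single tenant."""
    conn = sqlite3.connect(str(data_dir / f"{tenant_id}.db"))
    conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
    return conn


def add_note(data_dir: Path, tenant_id: str, body: str) -> None:
    conn = open_tenant_db(data_dir, tenant_id)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    finally:
        conn.close()


def list_notes(data_dir: Path, tenant_id: str) -> list:
    conn = open_tenant_db(data_dir, tenant_id)
    try:
        return [row[0] for row in conn.execute("SELECT body FROM notes ORDER BY id")]
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    add_note(data_dir, "alice", "hello")
    add_note(data_dir, "bob", "hi there")
    alice_notes = list_notes(data_dir, "alice")
    bob_notes = list_notes(data_dir, "bob")
    db_files = sorted(p.name for p in data_dir.glob("*.db"))

print(alice_notes, bob_notes, db_files)
```

Each tenant's data lives in its own file, so tenants can be moved between servers by copying files; anything that needs to look across tenants has to open the files one by one.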

ddorian43 · on Dec 24, 2016

Cassandra is a completely different beast. Like, it has no features of PostgreSQL and vice versa.

brianwawok · on Dec 24, 2016

They both have selects and inserts

andy_ppp · on Dec 24, 2016

SQLite ;-)

ddorian43 · on Dec 24, 2016

ActorDB does it in a distributed way.

raarts · on Dec 24, 2016

Hey that's an interesting link. Thank you for that.

xemdetia · on Dec 23, 2016

This is an OK article for someone using this product but there isn't anything interesting here unless you haven't considered partitioning a dataset to be spatially local to the users.

andy_ppp · on Dec 24, 2016

This comment is a tautology. "Article about subject not relevant to people not interested in subject".

duaneb · on Dec 24, 2016

That is not tautological. Interest and relevance are different things.