Postgres Health Check and SQL Performance Analysis

craigg · on Nov 7, 2019

As an Engineering Manager for the recently formed Database team here at GitLab, this tool has proven to be incredibly valuable. We floated the idea of running this on a regular basis back in April https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues... and now we have daily reports that we review https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues...

samokhvalov helped us to get this set up

sciurus · on Nov 7, 2019

That's a huge wall of text to digest every time the report is run. It would be nice if there was a diff mode that could summarize what changed between two runs.

NicoJuicy · on Nov 7, 2019

CTRL + F: Recommendations

Find next occurence 30 times and look when it's filled in.

If all is good, i think you need a minute or so.

evadne · on Nov 6, 2019

Belated. https://github.com/jfcoz/postgresqltuner

whycombagator · on Nov 6, 2019

Related. https://github.com/ankane/pghero

matthewaveryusa · on Nov 6, 2019

> tested on real-world databases containing 500,000+ tables and 1,000,000+ indexes.

Would love to hear about this use case out of curiosity

tomnipotent · on Nov 6, 2019

I'm guessing those numbers are cumulative across all the databases scanned, not within one.

samokhvalov · on Nov 6, 2019

No, it was a single real-world database.

Just in case if it looks surprisingly that Postgres can deal with such numbers of objects, I suggest checking this out: https://www.pgcon.org/2013/schedule/attachments/283_Billion_...

bdibs · on Nov 6, 2019

I don’t think they were questioning the possibility, just the practical need of it.

ahachete · on Nov 6, 2019

I think you should include the BTP database in the CI test suite ;)

KoenDG · on Nov 6, 2019

half a million tables...

In how many databases? Spread across how many machines? Using hardware that is how old?

It may well be a good project, but statements like that aren't inspiring of confidence.

samokhvalov · on Nov 6, 2019

One single database. Yes, quite unusual case of multi-tenant database. We needed to use checkup there a while ago so made some optimizations.

I agree with you, the statement is not 100% clear. We will double-check it (and maybe will simulate the case in CI), and will improve the statement. Thank you.

ahachete · on Nov 6, 2019

We have been using postgres-checkup for quite a while. Indeed, we have standardized on it as the basis on which we elaborate the health check and performance report analysis that we provide to our support customers.

It is a great project, constantly improving. Keep up with the work!

nbrempel · on Nov 6, 2019

This seems like a great idea. I used to have a collection of magic Postgres queries that would give me metrics for optimizing data models and indices. I’m surprised there isn’t a visual tool you can bolt into your Postgres install and get valuable metrics out.

1996 · on Nov 6, 2019

Interesting, but how heavy is it on the database server?

samokhvalov · on Nov 6, 2019

Hi, the founder of Postgres.ai here.

The footprint is very minimalistic. I have 15+ years of Postgres DBA experience, and what this tool does basically is what I usually do myself with performing Postgres health checks under heavy load. But in automated fashion :)

We very carefully choose approaches and queries that we run on production servers. And it is used under heavy loads (dozens of thousands of TPS) daily.

There are certain places that can be heavy. For example, if you have 1 million indexes (yes, it happens, sometimes), the SELECT query for bloat analysis will be slow. Actually, with default settings, the tool limits itself setting `statement_timeout = '30s'` (can be adjusted using CLI option `--statement-timeout`). So, in databases with a huge number of objects, you should expect that F004 and F005 reports will be missing.

mewwts · on Nov 7, 2019

That Joe tool of yours looks really cool!

zlepper · on Nov 6, 2019

This looks really really nice. Does it also work with something like Citus?