Hacker News | wuputah's comments

I think it will be straightforward to expose time_bucket in pg_duckdb. Feel free to open an issue for the feature.
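For reference, DuckDB already ships a `time_bucket(bucket_width, timestamp)` function, so once exposed through pg_duckdb the usage from Postgres would presumably look something like this (a sketch with an illustrative `events` table; the exact pg_duckdb behavior may differ):

```sql
-- Hypothetical: assumes pg_duckdb exposes DuckDB's time_bucket to Postgres.
SELECT time_bucket(INTERVAL '15 minutes', created_at) AS bucket,
       count(*) AS events
FROM events
GROUP BY bucket
ORDER BY bucket;
```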


That is an incorrect and baseless accusation; we had nothing to do with "Postgres (tuned)". My commits are only in the `hydra` folder. There are no restrictions on how you set up the benchmark in ClickBench, and the settings we use there are analogous to what we use on our cloud service for a similarly sized instance.

As the linked post points out, the main 'advantage' of the "tuned" benchmark is the indexes, which are tuned specifically to the queries in the benchmark. We do not use indexes in our version of the benchmark, aside from the primary key (which actually provides no performance advantage).


I apologize for falsely claiming that it was the Hydra devs who committed the Postgres config.

However, I think the problem stands: your main marketing pitch is a comparison of HydraDB to undertuned Postgres, right on the landing page of your project.

> the main 'advantage' of the "tuned" benchmark is the indexes

I am not sure which post you are referring to, but unless you or someone else has analyzed the execution plans for all of the Postgres queries in that benchmark and verified that the indexes are actually used, that is just speculation without evidence.
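Verifying that would be straightforward, for what it's worth: run each benchmark query under `EXPLAIN` and check whether the plan actually contains an index scan node (the query below is one from ClickBench; any of them would do):

```sql
-- If the index is used, the plan shows an "Index Scan" or
-- "Bitmap Index Scan" node; a "Seq Scan" means it was ignored.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM hits WHERE "AdvEngineID" <> 0;
```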

Another issue with this comparison is that ClickBench is a toy micro-benchmark with just 100M records. Increasing the data size may or may not be beneficial for HydraDB.


> does it work with the existing postgres apt/yum repos?

We only support apt for now but plan to support other package managers in the future. It works with existing Postgres apt packages; we recommend using PGDG, but the default system packages on Debian/Ubuntu work as well.
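For anyone unfamiliar, PGDG is the official Postgres apt repository; a typical setup on Debian/Ubuntu looks roughly like this (commands as documented on apt.postgresql.org; adjust the major version to taste):

```shell
# Add the PGDG repository via the helper script shipped in
# postgresql-common, then install Postgres from it.
sudo apt install -y postgresql-common
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt install -y postgresql-15
```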

> Does it work with the postgres Docker image?

Yes; in fact, this is how our `container` feature works: https://docs.pgxman.com/container


We plan on accomplishing this by using a container; it's not quite something we have today, but this is good feedback. :)

On Ubuntu/Debian, Postgres doesn't typically work this way, so it's not the way that pgxman works. pgxman works on top of the existing `postgresql` packages and with the existing package manager (apt) in order to install extensions -- which is also how it handles runtime dependencies, whether libraries or even other extensions.

That said, we have a container feature that I could see using to effectively isolate Postgres for a single project. Right now there is only a single "global" container (per Postgres version) that pgxman will manage for you, but this is just an MVP of the feature. I could definitely see something like `pgxman c dev` or similar which would read a local pgxman pack file (pgxman.yaml) in your project and boot a "local" Postgres just for that project.

The pgxman pack is already a thing and is how the local container config is maintained, but we haven't tied it together in the way described above... yet. For more on both pgxman pack and the container feature, check out our docs.


Thanks for calling these out, as these are just misunderstandings. We will certainly tweak the language around these.

- Installing the extension itself does not change the default table type, this is only the case on Hydra Cloud and our Docker image.

- "Hydra is not a fork" refers to the fact that Hydra did not fork Postgres; it is an extension. We have put in a lot of effort since forking Citus, but it's not our intent to hide that fact.

- Yes, "Hydra External Tables" is a productization around FDWs, there's more we want to do with it but it hasn't been our focus lately.


> - Installing the extension itself does not change the default table type, this is only the case on Hydra Cloud and our Docker image.

Ah cool, thanks! How would I go about adding the extension to my own "FROM postgres:15" Dockerfile?
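(For context, I'm imagining something along these lines — the repository layout and build steps are guesses on my part, not from Hydra's docs:)

```dockerfile
FROM postgres:15
# Hypothetical: build and install the columnar extension from source.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        build-essential git postgresql-server-dev-15 \
    && git clone https://github.com/HydrasDB/hydra /tmp/hydra \
    && make -C /tmp/hydra/columnar install \
    && rm -rf /tmp/hydra
```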


First, we added a bitmask to mark rows as deleted; these rows are filtered out on read. Then updates are implemented as deletions + inserts. We have also added vacuum functions to remove/rewrite stripes that have >20% deleted rows in order to reclaim space and optimize those stripes.


Of course :) Drop by our Discord if there's something you'd like to contribute and want to chat about it beforehand, if you need help, or if you have questions getting started: https://hydra.so/discord


Yes, Hydra Columnar and PostGIS get along just fine. We've not looked into any PostGIS-specific optimizations yet, but if users run into issues, we'll be happy to investigate.


Based on https://github.com/HydrasDB/hydra/blob/main/columnar/src/bac...

the columnar store "does not support gist, gin, spgist and brin indexes." :-(

And "gist" is important for spatial indexing: http://postgis.net/workshops/postgis-intro/indexing.html


Of course, that's the power of Postgres. You can join between columnar tables or between columnar and heap (row-based) tables. The performance of joins hasn't been a specific focus of our engineering work yet, but I made a little test of enriching an analytical query with user data here: https://gist.github.com/wuputah/e62b83f86880bd7e6623809afe4c...
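As a sketch, assuming a columnar `events` fact table and an ordinary heap `users` table (both names illustrative), such an enrichment query is just a regular join:

```sql
-- Hypothetical schema: a columnar fact table joined to a heap
-- dimension table; the planner treats both as ordinary tables.
SELECT u.country, count(*) AS event_count
FROM events e
JOIN users u ON u.id = e.user_id
GROUP BY u.country
ORDER BY event_count DESC;
```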


Yeah, I agree! However, ClickBench has used 500GB GP2 as the "standard" for some time, so I stuck with it for consistency. We use GP3 for our hosted service, and I also tested on GP3 with settings identical to GP2; the results are very similar.


But something that is variable in time cannot be "consistent" by definition ;)

Great to know this is a known issue. My recommendation still holds: publish results with GP3. Whatever others do (potentially wrongly) shouldn't prevent you from doing it right.

I'll be giving the project a deeper look.

