Clickhouse is a really cool and stupidly fast columnar database

schoolornot · on June 17, 2023

I understand why OLAP writes are faster but is there any reason why OLTPs can't achieve similar read performance with denormalized and sharded data?

rscrawfo · on June 18, 2023

Aggregation is a huge reason. For rolling up data, something can’t like Clickhouse can’t be beat by oltp

tempest_ · on June 17, 2023

What problems can I solve with a columnar database?

What type of data benefits from that type of Database?

Exuma · on June 17, 2023

Imagine you have a small business that tracks in the order of 10's - 100's of millions of events (pageviews, clicks, whatever), and you have reporting you want to run. Trying to do this in PG/MySQL would likely need to use materialized views so your reports don't take a long time to run. You could store your event data in CH directly, or use ELT/ETL process to sync/copy it into clickhouse just for reporting. Then, your queries would be very fast. It's must faster (for certain types of queries, mainly timeseries queries or queries involving aggregation of many rows). It's faster because of how the data is stored on disk. It's NOT good for fetching/updating/deleting single rows however.

It's originally designed to handle hundreds of columns, and billions of rows, but I think it can still apply to much smaller use cases that value performance. I'm implementing it currently in a similar scenario, and I'm using AirByte OSS version to ELT from postgres. Then I'm using tableau or some other BI tool to analyze that data much more effectively (I will be trying to perform complex aggregations/group by reports on 100mm rows)

datatrashfire · on June 17, 2023

Row based databases are optimized for accessing compete rows and joins. Columnar storage is optimized for accessing all, or many column values across rows. This makes aggregates and applying transformation logic faster with columnar storage than row based storage. Ie they are great for data warehouses and other analytical workloads.

Ps, great and still highly relevant resource covering all the major database system designs, their advantages and drawbacks: https://www.oreilly.com/library/view/designing-data-intensiv...

FridgeSeal · on June 17, 2023

Less about the data itself and more about the specific operations you want to do on it.

Large aggregations, massive datasets, large joins, and workloads that are ready heavy and eschew row-level mutations.

They get used for data analysis frequently, time series data and associated analysis meshes quite nicely too. ClickHouse itself was originally built to support arbitrary analytical queries on clickstream data at pretty massive scale. Cloudflare uses it for live analytics, Uber uses it for logs.

pjot · on June 17, 2023

An over simplification:

Columnar stores are optimized for reads. Row stores are optimized for writes.

anonacct37 · on June 17, 2023

This is an overly simplistic but also correct answer: clickhouse was developed for analytics on clickstreams.

Technically the overall idea is that if you have lots of queries that only read certain columns and your database stores rows contiguously it's a waste to read a whole row and then discard columns.

Also compression (such as run length or delta or even ztsd) often works better if you give it a block of data that's from one column (such as a timestamp or tag value).

linuxdude314 · on June 17, 2023

That’s a longer subject that fits in a comment here.

If you are _actually_ interested I suggest using google search to find some good sites that go over what a column oriented database does/is used for.

This isn’t hard; I’ll get you started:

https://www.kdnuggets.com/2021/02/understanding-nosql-databa...

Exuma · on June 17, 2023

Or he, you know, could just ask, because that is the spirit of discussion.

Dachande663 · on June 17, 2023

Cloudflare use it to ingest 6M/s

https://blog.cloudflare.com/http-analytics-for-6m-requests-p...

jgrahamc · on June 17, 2023

Way more than that now.

cplli · on June 17, 2023

Personally tried it, it can handle logs nicely. And from their page, many more things

https://clickhouse.com/use-cases

craigching · on June 17, 2023

Uber wrote a blog on using Clickhouse to store logs: https://www.uber.com/blog/logging/

esafak · on June 17, 2023

Columnar databases let you do fast aggregations and read only the columns you are interested in. They are for analyzing data.