
Columnar storage systems rarely store the raw value at a fixed position. They store values run-length encoded, dictionary encoded, delta encoded, etc., and then store metadata about chunks of values for pruning at query time. So you can rarely seek to an offset and update a value. The compression achieved means less data to read from disk on large scans and lower storage costs for very large datasets that are largely immutable, which are some of the important benefits of columnar storage.
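
A minimal sketch of the problem in plain Python (no real column-store code): once a column is run-length encoded there is no fixed byte offset per value, and a single point update means finding the run that covers the row, splitting it, and rewriting the encoding.

    # Run-length encode a column as (value, run_length) pairs.
    def rle_encode(values):
        runs = []
        for v in values:
            if runs and runs[-1][0] == v:
                runs[-1][1] += 1
            else:
                runs.append([v, 1])
        return runs

    # Point-update logical row `index`: one run may split into three.
    def rle_update(runs, index, new_value):
        out, pos = [], 0
        for value, length in runs:
            if pos <= index < pos + length:
                before = index - pos
                after = length - before - 1
                if before:
                    out.append((value, before))
                out.append((new_value, 1))
                if after:
                    out.append((value, after))
            else:
                out.append((value, length))
            pos += length
        return out

    runs = rle_encode(["a"] * 1000 + ["b"] * 1000)
    print(runs)                        # [['a', 1000], ['b', 1000]]
    print(rle_update(runs, 500, "z"))  # one update turns 2 runs into 4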

Also, many applications that require updates do so conditionally (update a where b = c). This requires re-synthesizing (at least some of) each candidate row to make the comparison, another relatively expensive operation for a column store.
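
Roughly what that looks like against dictionary-encoded columns; a hedged sketch, not any real engine's API. The predicate on b yields row positions, which then force decode/patch/re-encode work on a:

    # Dictionary-encode a column as (dictionary, codes).
    def dict_encode(values):
        dictionary = sorted(set(values))
        index = {v: i for i, v in enumerate(dictionary)}
        return dictionary, [index[v] for v in values]

    def dict_decode(dictionary, codes):
        return [dictionary[c] for c in codes]

    a_dict, a_codes = dict_encode(["red", "red", "blue", "red"])
    b_dict, b_codes = dict_encode(["x", "y", "x", "x"])

    # WHERE b = 'y': scan b's codes for the dictionary id of 'y'.
    target = b_dict.index("y")
    matches = [i for i, c in enumerate(b_codes) if c == target]

    # SET a = 'green': decode a, patch matching rows, re-encode the column.
    a_values = dict_decode(a_dict, a_codes)
    for i in matches:
        a_values[i] = "green"
    a_dict, a_codes = dict_encode(a_values)
    print(a_dict, a_codes)  # ['blue', 'green', 'red'] [2, 1, 0, 2]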



Also typically stored with general-purpose binary compression (snappy, zlib) applied after the semantic encoding. In-memory formats might only be semantically encoded, e.g., Arrow.
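
With pyarrow/Parquet the two layers show up as separate knobs: use_dictionary controls the semantic encoding and compression the byte-level compressor applied on top, while the Arrow table in memory carries only the semantic layer. A small sketch:

    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({"user": ["alice", "bob", "alice"] * 1000,
                      "clicks": [1, 2, 3] * 1000})

    # On disk: dictionary-encode the columns, then snappy-compress the pages.
    pq.write_table(table, "events.parquet",
                   use_dictionary=True, compression="snappy")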

But it's... fine? Batch the writes and rewrite the dirty parts. Most of our cases are either appending events or enriching with new columns, both of which can be modeled columnarly. It is a bit more painful in GPU land because we like big chunks (250MB-1GB) to saturate reads, but CPU land is generally fine for us.
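
A minimal copy-on-write sketch of that pattern with pyarrow; the paths and columns here are made up. Rather than seeking into a file, read the dirty partition, patch it in memory, and write the whole thing back:

    import pyarrow.parquet as pq
    import pyarrow.compute as pc

    # Read the one dirty partition.
    part = pq.read_table("events/date=2024-01-01/part-0.parquet")

    # Enriching with a new column is cheap to model columnarly.
    part = part.append_column("clicks_x2", pc.multiply(part["clicks"], 2))

    # Rewrite the partition wholesale.
    pq.write_table(part, "events/date=2024-01-01/part-0.parquet")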

We have been eyeing Iceberg and friends as a way to automate that, so I've been curious how much of that optimization, if any, they handle for us.



