Clog: Making a homegrown ClickHouse log for $20 a month

komuW · on Feb 13, 2023

My favorite technique for reducing the cost of logging is the same technique that was employed in The Apollo Guidance Computer(though I'm not sure if they did it for cost purposes).

To quote from Annotations to Eldon Hall's Journey to the Moon[1]: "The Coroner recorded every instruction executed, with its inputs and results, writing over the oldest record when it filled up. When a program crashed, you could punch out a full record of what it was doing in most of its last second and analyze the problem at your ease. I have often wished that PCs offered such an advanced feature."

So essentially buffer all logs into an in-memory circular buffer of capacity N. If a log record is emitted that has a certain severity/level; flush all records from the buffer to disk/clickHouse/grafana/whatever.

The python MemoryHandler[2] almost implements the said technique, except that it also flushes when buffer is full; which is not particularly what I would want.

I also wrote a blogpost[3] about how to log without losing money or context, ~3yrs ago.

1. https://authors.library.caltech.edu/5456/1/hrst.mit.edu/hrs/...

2. https://github.com/python/cpython/blob/v3.11.1/Lib/logging/h...

3. https://www.komu.engineer/blogs/09/log-without-losing-contex...

twic · on Feb 13, 2023

I had a program which occasionally segfaulted (and even raised a SIGILL once, i forget how). By the time it segfaults, it's too late to get logging out (easily, at least). But i didn't want to write an ever-growing log of everything.

So, i did something a bit like the coroner. When the program started, it created a fresh log file, extended it to a certain size, and memory-mapped it. It then logged into this buffer, with new logging overwriting old (it wasn't actually a circular buffer; the program dropped a big blob of logging into the buffer at the top of its main loop).

While alive, the process never closed or msynced the mapping, and it was fixed size, so the kernel was under no particular pressure to write the contents to disk. But when the process crashed, the kernel would preserve the contents.

I admit i never benchmarked this, so i don't know whether it actually avoided excessive writes. But it seemed like a neat idea in principle!

kevin_nisbet · on Feb 13, 2023

It's been quite a few years, but when I started in telco, we had a vendor product that I think sort of worked like this. They were using a "micro-services" architecture within their nodes before it became popular. They also used a crash-only approach to writing software. So lots of asserts for unhandled / unexpected cases.

As I remember it, they wrote their crash handler to include the ring buffer of recent messages sent to the services. So whenever they'd get into an unexpected state, they'd just crash the process, and collect the ring buffer of recent messages along with the other normal things in a mini core. Made it so easy to track down those unexpected / corner cases in that platform.

foobiekr · on Feb 13, 2023

This is a very common practice in embedded code, generally three things:

1. A ring of log-like objects (obviously not rendered strings, since that is a waste of CPU) that can be optionally included in a crash report in structured form that can be dissected later.

2. Compiler-generated enter/exit counters and corresponding table per module, modules linking themselves as init time to the master table, for performance counters [invocations or time spent]; dumpable on demand; lightweight and always on

3. a ring of logs - these actually being rendered logs plus indices into (1) - that have been otherwise rendered, so the retention cost is minimal and you can map back to log files otherwise provided.

The distinction between (1) and (3) should be obvious, but in case it is not, short circuiting log rendering for logs that should otherwise be dropped is a very important practice to avoid debug-level logs consuming the majority of CPU time.

Traditionally, all of these are trivially inspectable in a core dump, but usually you'd like a reduced crash report instead: less wear and tear on the flash and easier for users [and bug management systems] to juggle. Crash reports and cores obviously need to include an unambiguous version [typically a hash of the code rather than a manually managed version #, for dynamically linked ELFs, fingerprints of all libraries as well]; for cores you just make sure to compute this at start and keep it in memory reachable from a pointer out of main().

pmalynin · on Feb 13, 2023

Modern CPUs do actually offer this for the most part, it’s called time travel debugging. Intel’s offering is called Intel Processor Trace. Although it’s not full input output logging.

foobiekr · on Feb 13, 2023

As a rule, it is better to have developers learn one way to work, rather than N.

A problem with time travel debugging is that you generally can't use it in production [of course, there are people who think devs should have direct access to prod, for them there is no help], and you 100% cannot use it for anything deployed at a customer (so for embedded, devices, actual non-SAAS software etc. etc.).

It's better to shore up your tools so that the workflow is very straightforward and leave stuff like time travel for people doing work on a very narrow subset of very hard to understand bugs.

e12e · on Feb 13, 2023

Isn't qryn (nee cLoki) just as good as rolling your own?

https://github.com/metrico/qryn

Ed: given: "... for two years now" - I think clog/qryn was a bit rougher around the edges at the time - so my question is more for today - would qryn do 80% to 200% of your home grown solution today, with 10%-50% of the effort?

qxip · on Feb 16, 2023

Thank you for mentioning qryn! And yes, thanks to our growing community the rough edges should all be well rounded and we now support logs, metrics and telemetry formats with very decent results at a fraction of the complexity, without reinventing the wheel or adding new protocols and formats for people to adopt and learn. It's good if it just works right? And now with https://qryn.cloud we continue our mission towards a lightweight polyglot stack for core and edge observability - on top of the growing ClickHouse superpowers, as always!

Disclaimer: I work on qryn

BoorishBears · on Feb 13, 2023

On one hand I appreciate the motivation behind this and feel like engineering for today is an underappreciated skill... but on the other, I can't help but wonder if ELK stack would have been costlier for the load described (< 100 messages a second)

In my experience you're at the sweet spot where ES scaling isn't really an issue, and you get to enable a lot of really useful tooling and exploration for a growing team through Kibana

-

Also for anyone doing this today, ClickHouse now has experimental improved JSON handling: https://clickhouse.com/docs/en/guides/developer/working-with...

The main reason I bring up ELK stack is using materialized views in the style mentioned sounds fairly unergonomic if you have a lot of event types, so with the improved JSON handling in CH you'd at least have a nicer query interface

Dachande663 · on Feb 13, 2023

Author here. The limit we hit with ES was the memory required to have anything approaching decent response times for queries. I've always tried to keep ES in memory but that becomes a pain, and honestly the defaults for CH just... work whereas ES ends up needing a hundred dials slightly tweaked to make it work (and keep working).

The new JSON type looks amazing though! It's probably time I update the code, but honestly it's just been silently chugging along in the background with zero maintenance for now.

tylerhannan · on Feb 13, 2023

As someone who has worked at both companies (and is at ClickHouse presently), this -- to me -- is the best line I've read this week:

"and honestly the defaults for CH just... work whereas ES ends up needing a hundred dials slightly tweaked to make it work (and keep working)."

And yeah, that requires an immense amount of focus and effort. Glad you noticed. <3

systemz · on Feb 13, 2023

Maybe this alternative to ES could help with memory usage? https://github.com/manticoresoftware/manticoresearch

https://news.ycombinator.com/item?id=34741018

jabart · on Feb 13, 2023

How is the compression for these logs for CH?

victor106 · on Feb 13, 2023

Vector.dev might be a good replacement for filebeat. Built on Rust and very efficient

godber · on Feb 13, 2023

I came here to say something similar. It has also occurred to me that vectordev hasn’t gotten much attention here on HN. It’s a great tool. You can listen on a TCP socket, write to Kafka in an upstream instance, then read from Kafka, transform data, and write to ES or whatever downstream. I’ve even recently completed tests where I could write from vector to vector at over 6 million records a second.

I don’t really mean to discourage the author of the post, I just realize this can be done with two simple vector configs.

tveita · on Feb 14, 2023

In particular Vector can do simple transforms on the messages (or quite complex transforms using Lua script), and insert directly into Clickhouse.

I respect the simplicity of Redis here, but personally I would consider Kafka if I wanted to scale this setup further, maybe one of:

Vector -> Kafka -> Clickhouse (using Clickhouse's built-in Kafka support)

Vector -> Kafka -> Vector -> Clickhouse

This also lets you stream the logs live if you want to.

potamic · on Feb 13, 2023

How good are they at opentelemetry support?

rickette · on Feb 13, 2023

I like FluentBit, written in C and very light on resources.

chrnola · on Feb 13, 2023

This sounds fairly similar to self-hosting SigNoz[1] except they use the OpenTelemetry Collector[2] in the place where the author has a custom Redis queue consumer.

1. https://signoz.io

2. https://opentelemetry.io/docs/collector/

sdairs · on Feb 13, 2023

This is awesome! We did something similar for our internal logging at Tinybird (which itself is built on ClickHouse) and I recently turned into it a (very simplified) Starter Kit that others can fork and use in their projects. https://github.com/tinybirdco/log-analytics-starter-kit

Rather than writing to files and using an agent like beats to tail them, it sends logs directly from the application code with a basic POST request. Obviously, you could just as happily tail the file and forward the logs, but this approach reduced the footprint of tools and made it work in serverless environments.

gnfargbl · on Feb 13, 2023

Homegrown isn't necessarily cheaper at the bottom end of the scale: Grafana Cloud includes 100GB/month of logs (a few million records a day, depending on your average log size) at $29/mo [1]. You also get 20k Prometheus metrics at that price.

The price scaling of hosted solutions is less attractive, e.g. in this case a 10x increase in log volume translates to a ~20x increase in cost.

[1] https://grafana.com/pricing/

numbsafari · on Feb 13, 2023

Something the article leaves out is the volume of the logs ingested (GB) vs. just the number of log entries. He says "between 750K and 1M logs a day", but doesn't indicate how large those logs are.

Most commercial solutions are priced based on volume/size rather than qps or API calls.

dilyevsky · on Feb 13, 2023

If you use CH with s3 storage engine it will be considerably cheaper for same amount of logs

samsquire · on Feb 13, 2023

I feel any solution to this problem would be useful engineering for application development in a backend Microservice architecture. Being capable of querying and storing lots of data is similar to the problem of being internet scalable.

I am surprised there is no default implementation that just scales to internet levels by default, for lots of user traffic.

Would be nice if there was an infrastructure in a box that handled this for you. It's the kind of thing enterprise consortiums could work to provide

jatins · on Feb 13, 2023

Is it possible to `tail` a centralised logging solution like this?

10987654321 · on Feb 13, 2023

It depends on the solution, Loki for instance has support for it: https://grafana.com/blog/2019/08/13/lokis-path-to-ga-live-ta...

With that said unless you tune your query it's very easy to get overwhelmed once you scale a bit and start getting larger log volumes.

darkstar_16 · on Feb 13, 2023

Not really, but then this isn't used for single service apps. It'd be difficult to tail anyway when your app grows beyond a few services.

summarity · on Feb 13, 2023

Not really, that’s exactly what Papertrail does.

dilyevsky · on Feb 13, 2023

Clickhouse has experimental feature enabling a Watch https://clickhouse.com/docs/en/sql-reference/statements/watc...

numbsafari · on Feb 13, 2023

Yes, but YMMV...

https://cloud.google.com/logging/docs/view/streaming-live-ta...

memset · on Feb 13, 2023

I recently built something similar. I have a web app which logs json data to a file. Then I use vector.dev to tail the file. Vector has a clickhouse integration so it automatically pushes logs into the DB. I also have it automatically push data to S3 as a backup. The system works remarkably well!

manishsharan · on Feb 13, 2023

A long time ago , I hooked up all my docker images to send logs to AWS Cloudwatch. Now, regardless of which cloud or VPS I am using, all my logs are at AWS Cloudwatch. So that's one less headache for me.

fathyb · on Feb 13, 2023

My biggest worry with such a solution is a program going crazy and blowing up my bill at the end of the month.

Less a concern for enterprise projects where you can setup cost monitoring, but terrifying for hobby projects running on credits. I can't find the article, but I remember reading something similar happening to someone using GCP not very long ago.

manishsharan · on Feb 13, 2023

I share your fear ; I am running this on my bootstrap budget. I have taken a couple of precautions -- my debug logs expire within 2 days and others within a month. and I have set alarms to trigger in case of increased or unexpected metrics and I have be tested those alarms . It is a very good idea to first write and test your alarms for AWS before you deploy to AWS . My docker images run on Hetzner cloud and Vultr and they have no network egress costs and there is no AWS network charge for data ingress.

bks · on Feb 13, 2023

Is there a front end to Cloudwatch that can visualize the results in a way so that we don't need to give AWS account access to the viewer?

manishsharan · on Feb 13, 2023

I don't think so but there is a solution: you can create a locked down role for user who can only see Cloudwatch metrics and visualization for your log group. I have not implemented this but AWS IAM gives you this ability.

Finding_sj · on Feb 13, 2023

Do people use compression on logs? I come across a log processor algorithm that compresses logs and retain their searchability. So it is not generalize compression.