
> time-series aggregated metrics

That's exactly what we're using TrailDB for. Works great.



Does TrailDB pre-aggregate metrics? AFAIK it stores the raw data bucketed by an actor id (usually a server or visitor) and applies compression and some columnar storage optimizations to each actor's events.


I'm curious: what kind of write speeds are you using TrailDB at?


We trace every server request with it, across all stages of the request—approx. 10-12 events/timestamps per request—across multiple processes (think Zipkin). The per-request event streams are independent "trails" per process, and then we merge them together later and compute aggregated metrics using hdr_histogram.

Individual events are typically hundreds to thousands of nanoseconds long, with about 20 nanoseconds of overhead to grab a timestamp using the rdtscp instruction.
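
For reference, grabbing a timestamp looks roughly like this with the compiler intrinsic; the ticks-per-nanosecond constant below is illustrative and has to be calibrated against a wall clock on each machine, so treat it as a sketch rather than our exact code:

    #include <x86intrin.h>   // __rdtscp (GCC/Clang on x86-64)
    #include <cstdint>

    // Assumed, machine-specific calibration: TSC ticks per nanosecond.
    static double g_ticks_per_ns = 3.0;

    inline uint64_t timestamp_ticks() {
        unsigned int aux;            // CPU/node id reported by rdtscp; unused here
        return __rdtscp(&aux);       // read the time-stamp counter
    }

    inline uint64_t ticks_to_ns(uint64_t ticks) {
        return static_cast<uint64_t>(ticks / g_ticks_per_ns);
    }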

Hope that helps!


If you don't mind me asking: how do you uniquely identify the requests? Is it a composite field made up of `client_ip:timestamp`, or a random UUID? What does a typical payload sent across to the TSDB look like? Also, I'm assuming the services are written in a language like C or C++?


• random UUID

• we track function entry/exit/throw, offset (in nanoseconds) from the initial timestamp, and in the case of a throw, we also capture a stack trace (which is not stored in TrailDB); see the sketch after this list for how such an event can be appended

• our server is written in C++
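
A rough sketch of appending one such event with the public TrailDB C API; the field names and the idea of carrying the nanosecond offset in the timestamp argument are illustrative assumptions, not our exact schema:

    #include <traildb.h>
    #include <cstring>

    // Hypothetical field layout.
    static const char *fields[] = {"event", "function"};

    void write_trail_example(const uint8_t request_uuid[16]) {
        tdb_cons *cons = tdb_cons_init();
        if (tdb_cons_open(cons, "traces", fields, 2) != TDB_ERR_OK)
            return;

        // One event: function entry, keyed by the request's random UUID.
        // The timestamp argument carries the nanosecond offset from the
        // start of the request (an assumption about how offsets are stored).
        const char *values[] = {"entry", "handle_request"};
        uint64_t lengths[] = {strlen(values[0]), strlen(values[1])};
        tdb_cons_add(cons, request_uuid, /*timestamp=*/1200, values, lengths);

        tdb_cons_finalize(cons);   // writes traces.tdb to disk
        tdb_cons_close(cons);
    }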


I'm not sure I follow everything you wrote. I'm interested in how many inserts you are doing per second to the DB. Is it in the 1000s/sec or millions/sec?


Millions, not thousands, of inserts per second. TrailDB is hella fast (that's why we use it).


So you have a batching mechanism for creating TrailDB files, right? Then you're able to query historical metrics by processing these TrailDB files.


Correct. We mostly use hdr_histogram right now for the post-processing, since stable latency is what we care about.

We also do some hdr_histogram processing for our dashboard in parallel with storing traces in TrailDB, and we retain the individual traces for longer-term processing or for tracking down issues in production.
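
For reference, a minimal sketch of that kind of replay-and-histogram pass, using the TrailDB cursor API and HdrHistogram_c; the value range and the per-trail duration computation here are illustrative, not our exact pipeline:

    #include <traildb.h>
    #include <hdr/hdr_histogram.h>   // HdrHistogram_c
    #include <cstdint>
    #include <cstdio>

    // Replay one TrailDB file and feed per-trail durations (ns) into a histogram.
    void summarize(const char *tdb_root) {
        tdb *db = tdb_init();
        if (tdb_open(db, tdb_root) != TDB_ERR_OK) return;

        struct hdr_histogram *hist = nullptr;
        // Track 1 ns .. 10 s with 3 significant figures (assumed range).
        hdr_init(1, 10000000000LL, 3, &hist);

        tdb_cursor *cursor = tdb_cursor_new(db);
        for (uint64_t trail = 0; trail < tdb_num_trails(db); ++trail) {
            tdb_get_trail(cursor, trail);
            uint64_t first = 0, last = 0;
            bool seen = false;
            const tdb_event *ev;
            while ((ev = tdb_cursor_next(cursor)) != nullptr) {
                if (!seen) { first = ev->timestamp; seen = true; }
                last = ev->timestamp;
            }
            if (seen) hdr_record_value(hist, (int64_t)(last - first));
        }

        printf("p50=%lld ns  p99=%lld ns  max=%lld ns\n",
               (long long)hdr_value_at_percentile(hist, 50.0),
               (long long)hdr_value_at_percentile(hist, 99.0),
               (long long)hdr_max(hist));

        hdr_close(hist);
        tdb_cursor_free(cursor);
        tdb_close(db);
    }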


Thanks. I will check it out.



