Does TrailDB pre-aggregate metrics? AFAIK it stores the raw data bucketed by an actor id (usually a server or visitor) and applies compression and some columnar-storage optimizations to each actor's events.
We trace every server request with it, across all stages of the request (approx. 10-12 events/timestamps per request) and across multiple processes (think Zipkin). Each process records its per-request event stream as an independent "trail"; we merge the trails later and compute aggregated metrics using hdr_histogram.
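The later merge step is essentially a merge of offset-sorted streams. A minimal two-way sketch, where the `event` struct and its field names are my own invention for illustration (not the actual on-disk schema):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-process event: offset from the request's first
 * timestamp, plus which process emitted it. Illustrative only. */
struct event {
    uint64_t offset_ns;
    int      process_id;
};

/* Merge two offset-sorted trails for the same request into one
 * combined, still-sorted trail. Returns the number of merged events.
 * `out` must have room for na + nb entries. */
static size_t merge_trails(const struct event *a, size_t na,
                           const struct event *b, size_t nb,
                           struct event *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i].offset_ns <= b[j].offset_ns) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    return k;
}
```

With more than two processes this generalizes to a k-way merge (e.g. over a min-heap keyed on offset), but the idea is the same.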
Individual events are typically hundreds to thousands of nanoseconds long, with about 20 nanoseconds of overhead to grab a timestamp using the rdtscp instruction.
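For reference, grabbing a timestamp via rdtscp looks roughly like this; the guard and the clock_gettime fallback are my additions so the sketch compiles off x86 too:

```c
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Cheap timestamp read. On x86 this is rdtscp (the ~20 ns path
 * mentioned above); note it returns raw TSC cycles, not nanoseconds,
 * so a real implementation calibrates against the TSC frequency.
 * Elsewhere we fall back to clock_gettime. */
static inline uint64_t grab_timestamp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int aux;  /* receives IA32_TSC_AUX (typically the core id) */
    return __rdtscp(&aux);
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}
```

rdtscp (unlike plain rdtsc) waits for prior instructions to retire, which is what makes it usable for ordering events this fine-grained.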
If you don't mind me asking: how do you uniquely identify the requests? Is it a composite field made up of `client_ip:timestamp` or a random UUID? What does a typical payload sent across to the TSDB look like? Also, I'm assuming the services are written in a language like C or C++?
• we track function entry/exit/throw, offset (in nanoseconds) from the initial timestamp, and in the case of a throw, we also capture a stack trace (which is not stored in TrailDB)
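A guess at what one such event record might look like in C; the names and field widths here are hypothetical, not the actual layout:

```c
#include <stdint.h>

/* Hypothetical shape of one traced event, matching the fields listed
 * above: what happened, and when relative to the request start. */
enum event_kind { EV_ENTRY, EV_EXIT, EV_THROW };

struct trace_event {
    uint32_t function_id;  /* interned id of the traced function */
    uint8_t  kind;         /* entry / exit / throw */
    uint64_t offset_ns;    /* ns since the request's initial timestamp */
    /* On EV_THROW, a stack trace is captured separately and kept
     * outside TrailDB, as described above. */
};
```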
I'm not sure I follow everything you wrote. I'm interested in how many inserts you're doing per second to the DB. Is it in the thousands per second, or millions?
Correct. We mostly use hdr_histogram right now for the post-processing, since stable latency is what we care about.
We also do some hdr_histogram processing for our dashboard in parallel with storing traces in TrailDB, and we retain the individual traces for longer-term processing and for tracking down issues in production.
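The idea behind that histogram step, sketched with a deliberately simplified stand-in (linear 1 µs buckets; the real hdr_histogram uses log-scaled buckets with bounded relative error, so don't take this as its API):

```c
#include <stdint.h>

/* Simplified latency histogram: linear 1 us buckets up to 10 ms.
 * Enough to show how raw latencies become percentile metrics. */
#define NBUCKETS 10000

struct latency_hist {
    uint64_t counts[NBUCKETS];
    uint64_t total;
};

static void hist_record(struct latency_hist *h, uint64_t latency_ns)
{
    uint64_t b = latency_ns / 1000;       /* 1 us resolution */
    if (b >= NBUCKETS) b = NBUCKETS - 1;  /* clamp overflows */
    h->counts[b]++;
    h->total++;
}

/* Upper bound (ns) of the smallest bucket covering `pct` percent of
 * recorded samples. */
static uint64_t hist_percentile(const struct latency_hist *h, double pct)
{
    uint64_t target = (uint64_t)(h->total * pct / 100.0 + 0.5);
    uint64_t seen = 0;
    for (uint64_t b = 0; b < NBUCKETS; b++) {
        seen += h->counts[b];
        if (seen >= target)
            return (b + 1) * 1000;
    }
    return (uint64_t)NBUCKETS * 1000;
}
```

The appeal for stable-latency work is that histograms merge trivially (just add the counts), so per-process results roll up into fleet-wide p99s without keeping every sample around.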
That's exactly what we're using TrailDB for. Works great.