Does TrailDB pre-aggregate metrics? AFAIK it stores the raw data bucketed by an actor id (usually a server or visitor) and applies compression and some columnar-storage optimizations to each actor's events.
We trace every server request with it, across all stages of the request (approx. 10-12 events/timestamps per request) and across multiple processes (think Zipkin). Each process records its per-request event stream as an independent "trail"; we merge the trails later and compute aggregated metrics using hdr_histogram.
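The later merge step is essentially a merge of offset-sorted streams. A minimal two-way sketch, where the `event` struct and its field names are my own invention for illustration (not the actual on-disk schema):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-process event: offset from the request's first
 * timestamp, plus which process emitted it. Illustrative only. */
struct event {
    uint64_t offset_ns;
    int      process_id;
};

/* Merge two offset-sorted trails for the same request into one
 * combined, still-sorted trail. Returns the number of merged events.
 * `out` must have room for na + nb entries. */
static size_t merge_trails(const struct event *a, size_t na,
                           const struct event *b, size_t nb,
                           struct event *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i].offset_ns <= b[j].offset_ns) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    return k;
}
```

With more than two processes this generalizes to a k-way merge (e.g. over a min-heap keyed on offset), but the idea is the same.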
Individual events are typically hundreds to thousands of nanoseconds long, with about 20 nanoseconds of overhead to grab a timestamp using the rdtscp instruction.
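For reference, grabbing a timestamp via rdtscp looks roughly like this; the guard and the clock_gettime fallback are my additions so the sketch compiles off x86 too:

```c
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Cheap timestamp read. On x86 this is rdtscp (the ~20 ns path
 * mentioned above); note it returns raw TSC cycles, not nanoseconds,
 * so a real implementation calibrates against the TSC frequency.
 * Elsewhere we fall back to clock_gettime. */
static inline uint64_t grab_timestamp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int aux;  /* receives IA32_TSC_AUX (typically the core id) */
    return __rdtscp(&aux);
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}
```

rdtscp (unlike plain rdtsc) waits for prior instructions to retire, which is what makes it usable for ordering events this fine-grained.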
If you don't mind me asking: how do you uniquely identify the requests? Is it a composite field made up of `client_ip:timestamp` or a random UUID? What does a typical payload sent across to the TSDB look like? Also, I'm assuming the services are written in a language like C or C++?
• we track function entry/exit/throw, offset (in nanoseconds) from the initial timestamp, and in the case of a throw, we also capture a stack trace (which is not stored in TrailDB)
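A guess at what one such event record might look like in C; the names and field widths here are hypothetical, not the actual layout:

```c
#include <stdint.h>

/* Hypothetical shape of one traced event, matching the fields listed
 * above: what happened, and when relative to the request start. */
enum event_kind { EV_ENTRY, EV_EXIT, EV_THROW };

struct trace_event {
    uint32_t function_id;  /* interned id of the traced function */
    uint8_t  kind;         /* entry / exit / throw */
    uint64_t offset_ns;    /* ns since the request's initial timestamp */
    /* On EV_THROW, a stack trace is captured separately and kept
     * outside TrailDB, as described above. */
};
```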
I'm not sure I follow everything you wrote. I'm interested in how many inserts you're doing per second to the DB. Is it in the thousands per second, or millions?
Correct. We mostly use hdr_histogram right now for the post-processing, since stable latency is what we care about.
We also do some hdr_histogram processing for our dashboard in parallel with storing traces in TrailDB, and we retain the individual traces for longer-term processing and for tracking down issues in production.
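The idea behind that histogram step, sketched with a deliberately simplified stand-in (linear 1 µs buckets; the real hdr_histogram uses log-scaled buckets with bounded relative error, so don't take this as its API):

```c
#include <stdint.h>

/* Simplified latency histogram: linear 1 us buckets up to 10 ms.
 * Enough to show how raw latencies become percentile metrics. */
#define NBUCKETS 10000

struct latency_hist {
    uint64_t counts[NBUCKETS];
    uint64_t total;
};

static void hist_record(struct latency_hist *h, uint64_t latency_ns)
{
    uint64_t b = latency_ns / 1000;       /* 1 us resolution */
    if (b >= NBUCKETS) b = NBUCKETS - 1;  /* clamp overflows */
    h->counts[b]++;
    h->total++;
}

/* Upper bound (ns) of the smallest bucket covering `pct` percent of
 * recorded samples. */
static uint64_t hist_percentile(const struct latency_hist *h, double pct)
{
    uint64_t target = (uint64_t)(h->total * pct / 100.0 + 0.5);
    uint64_t seen = 0;
    for (uint64_t b = 0; b < NBUCKETS; b++) {
        seen += h->counts[b];
        if (seen >= target)
            return (b + 1) * 1000;
    }
    return (uint64_t)NBUCKETS * 1000;
}
```

The appeal for stable-latency work is that histograms merge trivially (just add the counts), so per-process results roll up into fleet-wide p99s without keeping every sample around.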
That's exactly what we're using TrailDB for. Works great.