From what I understand, this is essentially an extension to Snort that uses high-performance NICs with built-in FPGAs to do the heavy lifting.
Honestly, I don't really think this is the best way forward. Using Suricata (https://suricata-ids.org/) on cheap hardware with OSS rulesets, I can get _very close_ to gigabit throughput. Instead of relying on specialized hardware, Suricata can use CUDA, making it much more accessible.
I like Suricata, too, but am disappointed that they are dropping Snort's unified2 binary file support. Binary file formats are so much faster and more efficient than JSON. I have no idea why projects use JSON when they expect to produce tens or hundreds of GB of data. When monitoring 100 Gb links, JSON marshaling and unmarshaling are wasteful and costly, and are one reason these deployments need clusters of servers that scale out.
I don't think anybody disagrees that JSON is wasteful and costly; it's just that most people see it as the necessary evil that makes the data portable and easy to integrate with different SIEMs or databases.
I don't have a ton of experience with unified2, but it's certainly not as simple to bring into something like Elasticsearch compared to JSON, where you basically just point fluentd at it and let it do all the work. All of the common log management tools are built for text files, so why not play to their strengths?
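To give a sense of how little glue that takes, here's a rough fluentd sketch. It assumes Suricata's eve.json output and the fluent-plugin-elasticsearch output plugin; the paths, tag, and host are placeholders for your setup:

    # Tail the IDS's newline-delimited JSON log, parsing each line.
    <source>
      @type tail
      path /var/log/suricata/eve.json
      pos_file /var/log/td-agent/eve.json.pos
      tag suricata.eve
      <parse>
        @type json
      </parse>
    </source>

    # Ship every suricata.* event to Elasticsearch.
    <match suricata.**>
      @type elasticsearch
      host localhost
      port 9200
      logstash_format true
    </match>

That's the entire integration; doing the same with unified2 means finding or writing a dedicated decoder first.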
unified2 is not a great format and does not sound very flexible for further development. JSON marshaling is expensive, but it is not the first of my worries when doing line-rate inspection at those traffic rates.
I think most shops have a million integrations that they want to set up for their IDS alerts and JSON is pretty close to a lingua franca.
JSON is widely used; there is no arguing that. And for small to medium networks, JSON is fine. However, when you are monitoring multiple 100 Gb links and producing hundreds of GB of logs a day, JSON is the wrong format, especially if you want searches to finish in a reasonable period of time.
Zeek can output JSON as well (in addition to plaintext). I've done comparisons: using jq to search Zeek JSON is five times slower than using the C++ simdjson library (the fastest JSON parser known to humankind), and simdjson is three times slower than bro-cut on the plaintext logs. That may not sound like a lot, but it matters when you are monitoring multiple 100 Gb links, producing 40 GB of JSON conn logs every hour, and need queries to run in a reasonable period of time.
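For reference, this is the kind of extraction being compared (field names are from Zeek's conn log; jq needs the quoted form because the keys contain literal dots):

    # JSON logs: pull two fields with jq (@tsv renders nulls as empty)
    jq -r '[."id.orig_h", ."id.resp_h"] | @tsv' conn.log

    # Plaintext TSV logs: the same fields with bro-cut
    bro-cut id.orig_h id.resp_h < conn.log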
jq is just slow. Using github.com/buger/jsonparser to slice a few fields out of a large JSON log is about 10x faster than using jq.
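A minimal sketch of the jsonparser pattern, assuming newline-delimited Zeek JSON on stdin (the field names and buffer size are illustrative):

    package main

    import (
        "bufio"
        "fmt"
        "os"

        "github.com/buger/jsonparser"
    )

    func main() {
        sc := bufio.NewScanner(os.Stdin)
        sc.Buffer(make([]byte, 0, 1<<20), 1<<20) // allow long log lines
        for sc.Scan() {
            line := sc.Bytes()
            // Slice fields straight out of the raw bytes; no full
            // decode into a map like encoding/json would do. Zeek's
            // JSON logs use literal dots inside key names.
            src, err := jsonparser.GetString(line, "id.orig_h")
            if err != nil {
                continue // not a conn record; skip it
            }
            dst, _ := jsonparser.GetString(line, "id.resp_h")
            fmt.Println(src, dst)
        }
    }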
The most common mistake I see is people composing pipelines that put jq before grep, or worse, doing all the filtering with select() in jq. The grep should always come first, even if you need to do something like

    .. | fgrep value | jq ... | fgrep value

to pre-filter, and then filter again to rule out any false positives.
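Concretely, against a Suricata eve.json (the signature prefix here is just an example):

    fgrep '"ET SCAN' eve.json \
      | jq -c 'select((.alert.signature // "") | startswith("ET SCAN"))'

The fgrep is a cheap byte-level pre-filter, so jq's select() only ever runs on lines that already matched; the select() then throws out false positives, e.g. the string showing up inside some payload field.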
I have some tooling built around jsonparser and fastjson that I need to split out and open source. jq just makes me sad: very capable, but overkill for what most people use it for, and very slow for the common case.
source: wrote the C version of bro-cut that you like so much :)
If you're generating that much logging, though, you're definitely not using bro-cut/jq/etc.; you're going to be well into Kafka/Hadoop/ELK territory. See also Apache Metron.
As an aside from the paper, I'm a PM at Cloudflare and very interested in hearing from current IDS users on what you'd like to see out of the edge IDS we're building. Reach out to rustam @ cloudflare if you'd like to chat!
Yes, in theory. In reality, the premise is that they're looking for attacks in protocols that aren't encrypted, which is part of the reason the approach doesn't make much sense anymore: the signatures are all tuned for attacks that only really matter in internal networks, and internal networks are already insecure.
(You could also park this between a TLS-terminating reverse proxy and the app servers, I guess.)