I love tries for larger data sets, and have used them many times for longest prefix matching problems.
However, they inevitably involve pointer chasing, are quite branchy, and are not really amenable to SIMD optimization. They also have no mechanism for a fast-path negative match, which was important for me.
I think you'd really struggle to get anything based on pointer chasing and binary search to compete with the CPU cycle counts I was seeing toward the end for the assembly versions (e.g. 6 cycles for negative match, 14 for prefix match, 21 for worst-case false-positive negative match).
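For anyone unfamiliar with the term: a fast-path negative match means rejecting most non-matching inputs almost immediately, before doing any full string comparison. Here is a minimal sketch of one common way to get that effect, using a first-byte lookup table; this is purely illustrative (the `PREFIXES` contents are made up), not the hand-written assembly described above:

```python
# Hypothetical sketch: reject most non-matching strings cheaply by
# checking the first byte against a 256-entry table before doing any
# full prefix comparison. The real version discussed above is
# hand-tuned assembly; this only illustrates the control flow.

PREFIXES = ["int", "integer", "float", "string"]  # example table

# Precompute which first bytes can possibly start a match.
first_byte_ok = [False] * 256
for p in PREFIXES:
    first_byte_ok[ord(p[0])] = True

def longest_prefix_match(s: str):
    # Fast-path negative match: one table lookup and one branch.
    if not s or not first_byte_ok[ord(s[0])]:
        return None
    # Slow path: scan for the longest matching prefix.
    best = None
    for p in PREFIXES:
        if s.startswith(p) and (best is None or len(p) > len(best)):
            best = p
    return best
```

The point is that the vast majority of inputs that can't possibly match take the early-return branch and never reach the scan.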
> The other nice side-effect is that it forces you to pick which table a given string should go in. I made this decision by looking at which types occurred most frequently, and simply put those in the first table. Less frequent types go in subsequent tables.
> I have a hunch there's a lot of mileage in that approach; that is, linear scanning an array of string tables until a match is found. There will be an inflection point where some form of a log(n) binary tree search will perform better overall, but it would be very interesting to see how many strings you need to potentially match against before that point is hit.
> Unless the likelihood of matching any given string in your set is completely random, by ordering the strings in your tables by how frequently they occur, the amortized cost of parsing a chunk of text would be very competitive using this approach, I would think.
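The scheme in the quote — scan frequency-ordered tables linearly and stop at the first hit — can be sketched as follows. The table contents and function name are my own illustrative choices, not anything from the original:

```python
# Illustrative sketch: tables ordered so the most frequently occurring
# strings come first; scan each table linearly and return on the first
# match. With a frequency-skewed workload the amortized cost stays low,
# even though the worst case is O(total number of strings).

TABLES = [
    ["the", "and", "for"],         # most frequent strings
    ["while", "return", "class"],  # less frequent
    ["synchronized", "volatile"],  # rare
]

def lookup(word: str):
    # Returns (table_index, position) of the first match, or None.
    for ti, table in enumerate(TABLES):
        for pi, s in enumerate(table):
            if s == word:
                return (ti, pi)
    return None
```

The inflection point the quote mentions is where the expected number of comparisons under this skewed linear scan exceeds the log(n) of a balanced search over all strings combined; where that falls depends entirely on how skewed the frequency distribution is.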
That was my thinking too: slick-looking and all, but you can pretty easily do it yourself if you're a web dev. Most people use truly amazing frameworks for free; I don't think they would donate for this.