Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The node packing format he describes sounds a bit like a LOUDS tree [1], which stores the structure of a tree as a bit array (each node as a '1' for each child, plus a '0'---for a total of 2n-1 bits for a tree of n nodes), and the data in a separate packed array. It can't represent the node-deduplication (nodes with multiple parents), but I think it gives comparable compression: for the full word list of 3,213,156 nodes, the tree structure is 6,426,311 bits (0.76MB), plus 3,213,156 bytes of character data---for 3.83MB total.

The downside is that traversing the tree is a series of linear bit-counting operations---which can be painfully show without a bit of pre-caching.

[1]: http://www.cs.cmu.edu/afs/cs.cmu.edu/project/aladdin/wwwloca...



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: