Use Stockfish to predict the next move, and only store the diff between the actual move and the prediction.
Or, better, get a ranked list of all possible moves from Stockfish, and use a variable-length integer to encode the position in the list. Then the best move takes ~1 bit, and worse moves take more bits. (And we can do fun things like compute how bad a player is by how large their games are after compression.)
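Roughly what I have in mind, as a runnable sketch. `ranked_moves` here is just a stand-in for the engine query (a real version would ask Stockfish, e.g. via python-chess's `engine.analyse(..., multipv=N)`); the rank is written with an Elias gamma code, so rank 1 (the predicted best move) costs a single bit:

```python
import chess  # pip install python-chess

def ranked_moves(board):
    """Stand-in for an engine query: return legal moves best-first.
    Here we just sort by UCI string so the example runs without an
    engine; a real version would rank by Stockfish's MultiPV output."""
    return sorted(board.legal_moves, key=lambda m: m.uci())

def gamma_encode(n):
    """Elias gamma code for n >= 1: rank 1 costs a single '1' bit,
    rank n costs about 2*log2(n) + 1 bits."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def gamma_decode_one(bits, i):
    """Read one gamma-coded integer starting at bit offset i."""
    zeros = 0
    while bits[i + zeros] == "0":
        zeros += 1
    n = int(bits[i + zeros : i + 2 * zeros + 1], 2)
    return n, i + 2 * zeros + 1

def encode_game(moves):
    board, bits = chess.Board(), []
    for move in moves:
        rank = ranked_moves(board).index(move) + 1  # 1-based rank
        bits.append(gamma_encode(rank))
        board.push(move)
    return "".join(bits)

def decode_game(bits):
    board, moves, i = chess.Board(), [], 0
    while i < len(bits):
        rank, i = gamma_decode_one(bits, i)
        move = ranked_moves(board)[rank - 1]  # same ranking => same move
        moves.append(move)
        board.push(move)
    return moves

game = [chess.Move.from_uci(u) for u in ["e2e4", "e7e5", "g1f3"]]
assert decode_game(encode_game(game)) == game
```

Note that decoding only works because both sides compute the identical ranking for every position, which is exactly the fragility pointed out in the reply below.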
The more advanced your predictor is, the slower your compressor gets. OP has 600 million moves to encode; how long does it take to ask Stockfish for its opinion on 600M board states? (And then again, every time you want to decompress.) (Not a rhetorical question btw, I know little about chess engines.)
I suspect the sweet spot here would be to use a much worse chess engine for the predictions, giving faster compression/decompression at the expense of the compression ratio.
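To make "much worse chess engine" concrete: the predictor can be as crude as a one-ply material count. This hypothetical ranker could replace `ranked_moves` in the sketch above, trading compression ratio (worse ranks, longer codes) for orders-of-magnitude faster encoding and decoding:

```python
import chess

# Rough piece values in centipawns.
PIECE_VALUE = {chess.PAWN: 100, chess.KNIGHT: 300, chess.BISHOP: 300,
               chess.ROOK: 500, chess.QUEEN: 900, chess.KING: 0}

def material(board, color):
    return sum(PIECE_VALUE[p.piece_type]
               for p in board.piece_map().values() if p.color == color)

def cheap_ranked_moves(board):
    """One-ply greedy ranking: prefer moves that win the most material."""
    def gain(move):
        board.push(move)
        mover = not board.turn  # push() flipped the side to move
        g = material(board, mover) - material(board, not mover)
        board.pop()
        return g
    # Tie-break on the UCI string: the ranking must be fully
    # deterministic, or the decompressor disagrees with the compressor.
    return sorted(board.legal_moves, key=lambda m: (-gain(m), m.uci()))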
It looks like a very interesting comparison. I'm still not sure how to transform Stockfish scores into probabilities. (Perhaps proportional to exp(-difference/temperature), where temperature is a magic number, or maybe something that scales with Elo? There are too many parameters to tweak.)
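One plausible mapping (an assumption on my part, not anything Stockfish defines) is exactly that softmax over centipawn scores, p_i ∝ exp(score_i / T), with T the magic temperature. The -log2(p) line shows what an arithmetic coder would then pay per move:

```python
import math

def move_probabilities(centipawn_scores, temperature=100.0):
    """Softmax over engine scores (mover's point of view, centipawns).
    temperature is the magic number from the comment above -- one could
    try fitting it per rating bucket (e.g. as a function of Elo)."""
    best = max(centipawn_scores)
    weights = [math.exp((s - best) / temperature) for s in centipawn_scores]
    total = sum(weights)
    return [w / total for w in weights]

# Example: best move, then alternatives 30cp and 200cp worse.
for p in move_probabilities([0, -30, -200]):
    # An arithmetic coder would spend -log2(p) bits on this move.
    print(f"p = {p:.3f}, cost = {-math.log2(p):.2f} bits")
```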
I was imagining downloading the Lichess database and trying different strategies to compress it. It's too much work for me, but I'd love to read a blog post if someone does that.
> Use Stockfish to predict the next move, and only store the diff between the actual move and the prediction.
This ties the algorithm down to one specific version of Stockfish, configured identically (hash table size, thread count, etc.), because all such factors have an impact on Stockfish's evaluations. If one factor changes, you can't decompress the backup.
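Concretely, the archive would need a header pinning the engine identity and every option that affects search, checked before decompression. A hypothetical sketch (the option names are real UCI options; the header format is made up):

```python
import json

# Stored alongside the compressed games. The decompressor must refuse
# to run unless its engine setup matches exactly, since any drift
# changes the rankings and silently corrupts every move that follows.
HEADER = {
    "engine": "Stockfish 16.1",
    "options": {"Hash": 256, "Threads": 1, "MultiPV": 500},
    # Threads must be 1: multithreaded search is nondeterministic.
    "search": {"depth": 12},
}

def check_header(stored, current):
    if stored != current:
        raise RuntimeError(
            "engine setup differs from the one used to compress; "
            f"expected {json.dumps(stored)}")
```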