
It looks like the conclusion is that the POPCNT instruction ("cpu") is faster than any of the SSE implementations. Only AVX2 outperforms POPCNT, and only for large enough bitstrings.
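For anyone who wants to poke at this themselves, here is a rough sketch of the two simplest contenders: a word-at-a-time loop over the compiler builtin (which GCC and Clang can lower to POPCNT when built with something like -mpopcnt) and a portable bit-parallel fallback. The function names are mine, not from the benchmark, and this is not the SSE or AVX2 code being measured there; it assumes a GCC-compatible compiler for the `__builtin_popcount*` calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable bit-parallel (SWAR) popcount of a single 64-bit word. */
static uint64_t popcount64_swar(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);                            /* 2-bit sums */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            /* 8-bit sums */
    return (x * 0x0101010101010101ULL) >> 56;                              /* add the bytes */
}

/* Hypothetical helper: count set bits in a buffer using the compiler builtin.
   Assumes GCC or Clang; whether this becomes POPCNT depends on target flags. */
static uint64_t popcount_buf_builtin(const uint8_t *data, size_t len) {
    uint64_t total = 0;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, data + i, sizeof w);   /* avoids unaligned-access issues */
        total += (uint64_t)__builtin_popcountll(w);
    }
    for (; i < len; i++)                  /* leftover tail bytes */
        total += (uint64_t)__builtin_popcount(data[i]);
    return total;
}

/* Same loop, but with the portable SWAR routine instead of the builtin. */
static uint64_t popcount_buf_swar(const uint8_t *data, size_t len) {
    uint64_t total = 0;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, data + i, sizeof w);
        total += popcount64_swar(w);
    }
    for (; i < len; i++)
        total += popcount64_swar(data[i]);
    return total;
}
```

Both functions should return the same count for any buffer, so timing them on inputs of different sizes is an easy way to see where the dedicated instruction stops paying off on your own machine.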


On i7:

> AVX2 code is faster than the dedicated instruction for input size 512 bytes and larger

The difference is indeed tiny. Still, it's very cool that generic AVX2 code can beat an instruction burned into the silicon!



