
I don’t think ARM does, but there might be some NEON thing that you can use to do something equivalent.



Huh, it looks like that only works on 1-byte values? That’s an interesting choice.


Worse, it's fertile ground for "interesting" bugs, because VADDV (which sum-reduces the result) reduces into an 8-bit uint. So if you e.g. accumulate two or more quadword VCNTs into a uint8x16_t and then VADDV it, you could end up with something other than the actual overall bit count (because 2 quadwords can have _256_ bits set, which doesn't fit in 8 bits). Same with accumulating 32 or more VCNTs, except now individual bytes could wrap around if you don't widen in between (each byte lane can gain up to 8 per VCNT).
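To make the wraparound concrete, here is a rough portable C model of the lane arithmetic. It is only a sketch: it uses plain arrays instead of NEON registers, and the names popcount8, count_bits_narrow and count_bits_wide are made up for illustration. On real hardware the corresponding intrinsics would presumably be vcntq_u8, vaddq_u8 and vaddvq_u8 for the narrow path, with something like the widening vaddlvq_u8 for the safe one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* a quadword has 16 byte lanes */

/* Per-byte popcount, modelling what VCNT produces for one lane (0..8). */
static uint8_t popcount8(uint8_t b) {
    uint8_t n = 0;
    while (b) {
        b = (uint8_t)(b & (b - 1)); /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Narrow path (hypothetical helper): add per-byte counts into 8-bit lanes,
   then sum the lanes into an 8-bit result. Both steps wrap modulo 256. */
static uint8_t count_bits_narrow(const uint8_t *data, size_t n_vectors) {
    assert(data != NULL || n_vectors == 0);
    uint8_t acc[LANES] = {0};
    for (size_t v = 0; v < n_vectors; v++)
        for (size_t i = 0; i < LANES; i++)
            acc[i] = (uint8_t)(acc[i] + popcount8(data[v * LANES + i]));
    uint8_t sum = 0;
    for (size_t i = 0; i < LANES; i++)
        sum = (uint8_t)(sum + acc[i]);
    return sum;
}

/* Widened path (hypothetical helper): same counts, 32-bit accumulator. */
static uint32_t count_bits_wide(const uint8_t *data, size_t n_vectors) {
    assert(data != NULL || n_vectors == 0);
    uint32_t total = 0;
    for (size_t v = 0; v < n_vectors; v++)
        for (size_t i = 0; i < LANES; i++)
            total += popcount8(data[v * LANES + i]);
    return total;
}
```

With 32 bytes of 0xFF (two full quadwords), count_bits_wide returns 256 while count_bits_narrow returns 0, because the 8-bit reduction wraps.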


Speaking of ARM, it has an rbit instruction which for some reason doesn't exist on x86.


Apparently x86 isn't "web scale" enough to have to deal with endianness ;)


It is not really about endianness. The instruction reverses bits, not bytes. IIRC it is needed as a step of the FFT (the bit-reversal permutation).
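For anyone who wants to see the difference, here is a small plain-C sketch of bit reversal next to a byte swap, plus the index calculation a radix-2 FFT's bit-reversal permutation would use. The function names are invented for this example and the code does not use the actual rbit instruction; it just computes the same kind of result in software.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse all 32 bits: bit 0 trades places with bit 31, bit 1 with bit 30,
   and so on. Done by swapping progressively larger groups of bits. */
static uint32_t reverse_bits32(uint32_t x) {
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* Reverse byte order only (an endianness swap). The bits inside each
   byte keep their order. */
static uint32_t swap_bytes32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Bit-reversed index for an FFT of size 2^log2n (hypothetical helper):
   reverse the full word, then keep only the top log2n bits. */
static uint32_t fft_bit_reversed_index(uint32_t i, unsigned log2n) {
    assert(log2n >= 1 && log2n <= 32);
    return reverse_bits32(i) >> (32 - log2n);
}
```

For example, reverse_bits32(0x12345678) gives 0x1E6A2C48 whereas swap_bytes32(0x12345678) gives 0x78563412, and for an 8-point FFT fft_bit_reversed_index(1, 3) is 4 (binary 001 becomes 100).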



