minimize's comments

minimize · on July 16, 2024

For maximum efficiency, you should work in binary instead of base 10. Handling carries becomes more straightforward with the right primitives, for example __builtin_addc with GCC: https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins...

You can also implement it in C if you want a more portable solution: https://github.com/983/bigint/blob/ee0834c65a27d18fa628e6c52...
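As a rough illustration of what a portable version can look like, here is a minimal sketch of multi-limb addition in plain C. It is not taken from the linked repository; the function name `limbs_add` and the little-endian `uint64_t` limb layout are assumptions made for this example. The carry is detected by checking whether an unsigned sum wrapped around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst = a + b for n-limb numbers stored least
   significant limb first. Returns the carry out of the top limb (0 or 1).
   dst may alias a or b, since each limb is read before it is written. */
static uint64_t limbs_add(uint64_t *dst, const uint64_t *a,
                          const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;   /* may wrap to 0 when carry is 1 */
        uint64_t c1 = s < carry;     /* 1 if that addition wrapped */
        uint64_t t = s + b[i];       /* may wrap as well */
        uint64_t c2 = t < s;         /* 1 if that addition wrapped */
        dst[i] = t;
        carry = c1 | c2;             /* at most one of the two can be set */
    }
    return carry;
}
```

Compilers can often turn this pattern into an add-with-carry instruction, but that depends on the compiler and the target.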

If you scroll around, you can also find my implementations for multiplication and such.

rurban · on July 16, 2024

For maximum efficiency you'd avoid addc like hell, because it blocks internal precomputation, and use guarantees like he did, avoiding overflow better. Better use int128 types though.
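One possible reading of the int128 suggestion, as a sketch only: accumulate each limb pair in a 128-bit integer and take the carry from its upper half. This assumes the GCC/Clang extension type `unsigned __int128`; the function name `limbs_add_u128` is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst = a + b for n little-endian 64-bit limbs,
   using a 128-bit accumulator. Returns the carry out of the top limb. */
static uint64_t limbs_add_u128(uint64_t *dst, const uint64_t *a,
                               const uint64_t *b, size_t n)
{
    unsigned __int128 acc = 0;       /* holds the carry between iterations */
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        acc += (unsigned __int128)a[i] + b[i];  /* stays below 2^65 */
        dst[i] = (uint64_t)acc;                 /* low 64 bits are the limb */
        acc >>= 64;                             /* high bits are the carry */
    }
    return (uint64_t)acc;
}
```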

I'd just use stack objects for small typical sizes. And especially do formal proofs rather than the random test joke.

minimize · on July 16, 2024

> For maximum efficiency you'd avoid addc like hell, because it blocks internal precomputation, and use guarantees like he did, avoiding overflow better.

Whether that is the case depends on the CPU. I hit memory bandwidth limits before adc becomes a problem. But I concede that you have a valid point and that there are definitely cases where leaving sufficient space for carries is the better strategy.

Anyway, we can probably both agree that "% 1000000000" is not the best way to do it.

koverstreet · on July 16, 2024

addc makes all your adds serialize on the flags register, which is really painful.

A more modern approach is to reserve some bits out of every word for carries, but that drastically complicates things.
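A minimal sketch of the reserved-bits idea, under assumptions chosen for this example: each `uint64_t` limb holds 56 payload bits, leaving 8 spare bits, so up to 255 normalized numbers can be added into a normalized accumulator before any carry has to be propagated. The names `LIMB_BITS`, `lazy_add` and `lazy_normalize` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 56
#define LIMB_MASK ((UINT64_C(1) << LIMB_BITS) - 1)

/* acc += x with no carry propagation. Every limb addition is independent,
   so there is no dependency chain through a carry flag. The caller must
   normalize before the spare bits overflow (after at most 255 adds of
   normalized values into a normalized accumulator). */
static void lazy_add(uint64_t *acc, const uint64_t *x, size_t n)
{
    assert(acc != NULL && x != NULL);
    for (size_t i = 0; i < n; i++)
        acc[i] += x[i];
}

/* Push the accumulated spare bits into the next limb so that every limb
   is below 2^56 again. Returns the carry out of the top limb. */
static uint64_t lazy_normalize(uint64_t *acc, size_t n)
{
    uint64_t carry = 0;
    assert(acc != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t v = acc[i] + carry;
        acc[i] = v & LIMB_MASK;
        carry = v >> LIMB_BITS;
    }
    return carry;
}
```

The complication mentioned above shows up in everything around this: numbers are no longer plain binary in memory, so conversion, comparison and multiplication all have to account for the limb width and for whether a value is currently normalized.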

haberman · on July 17, 2024

Is this really true? I would intuitively expect that register renaming would apply to eflags too, so that reads from flags don't truly need to be serialized despite nominally writing a bunch of things to the same register.

EDIT: this paper (linked in another comment) seems to indicate that this is possible:

> An out-of-order machine can look ahead and process the accumulation pass in parallel with the partial sum pass using a renamed eFlags register.

https://web.archive.org/web/20150131061304/http://www.intel....

phire · on July 17, 2024

EFLAGS is actually put in the same renaming register as the result, so you get renaming for free.

The renaming registers in the Intel Pentium Pro and Pentium II are actually over 80 bits wide. They need to hold a full 80bit float, or 64bit MMX result. The Pentium III extends this to 128bit wide renaming registers to support SSE.

This is despite the fact that the P6 architecture only had 32-bit integer registers until the Core 2 in 2006. So there is plenty of room to store EFLAGS in the same renaming register as the result. This also means that the branch uops point to the result of the most recent flag-modifying instruction.

It was only with Sandy Bridge (and the introduction of AVX) that the P6 lineage switched to a PRF (physical register file) design, with separate registers for floats and integers. Of course, Netburst also had a PRF design.

variadix · on July 17, 2024

Yes, it’s why https://en.m.wikipedia.org/wiki/Intel_ADX exists

JonChesterfield · on July 16, 2024

I remembered and ultimately found a source for a workaround for the serialising-on-flags problem: an Intel paper at https://web.archive.org/web/20150131061304/http://www.intel.... It amounts to new instructions with better behaviour for ILP.

minimize · on July 24, 2021

Not sure if it has been mentioned yet, but I think the main reason why this game failed is the title. Most people have probably never heard of "yerba mate", so to them, the title sounds like some kind of weird mating simulator.

You can verify this by comparing the sales by country to the number of searches for the term "yerba mate" by country: https://trends.google.com/trends/explore?q=yerba%20mate The distributions are very similar.

donislawdev · on July 24, 2021

Niche genre :-} I knew that the game would fail (before release); it's not a problem for me :-}