Yeah, this one as well: bool is_divisible_by_6(int x) { return x % 2 == 0 && x %...

Koffiepoeder · 2025-12-03T21:06:02 1764795962

Mhm, this is one of those cases where I'd want a benchmark to be sure. Checking % 2 is very cheap; it is really just a single-bit test. I can also imagine some CPUs having a special code path for % 3. In practice I would not be surprised if the two-check version were actually faster than the % 6. I'm on mobile at the moment, so I can't verify.

Bratmon · 2025-12-03T22:13:04 1764799984

But if % 2 && % 3 is better, then isn't there still a missed optimization in this example?

NobodyNada · 2025-12-04T06:14:34 1764828874

Let's throw this into godbolt: https://clang.godbolt.org/z/qW3qx13qT

    is_divisible_by_6(int):
        test    dil, 1
        jne     .LBB0_1
        imul    eax, edi, -1431655765
        add     eax, 715827882
        cmp     eax, 1431655765
        setb    al
        ret
    .LBB0_1:
        xor     eax, eax
        ret

    is_divisible_by_6_optimal(int):
        imul    eax, edi, -1431655765
        add     eax, 715827882
        ror     eax
        cmp     eax, 715827883
        setb    al
        ret

By themselves, the mod 6 and mod 3 operations are almost identical -- in both cases the compiler used the reciprocal trick to transform the modulo into an imul+add+cmp, the only practical difference being that the % 6 has one extra bit rotate (the ror).
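To make the reciprocal trick concrete, here is a rough C++ transliteration of the two listings above. The constants are read straight off the assembly (-1431655765 is 0xAAAAAAAB as an unsigned 32-bit value, which is the multiplicative inverse of 3 modulo 2^32). The function names and the rotr1 helper are mine, and I'm assuming a 32-bit int, so treat it as a sketch rather than what the compiler does internally.

```cpp
#include <cstdint>

// Rotate a 32-bit value right by one bit (what `ror eax` does).
static inline std::uint32_t rotr1(std::uint32_t v) {
    return (v >> 1) | (v << 31);
}

// Hand translation of the imul/add/cmp/setb sequence for x % 3 == 0.
// Hypothetical name; assumes int is 32 bits wide.
bool divisible_by_3_reciprocal(int x) {
    std::uint32_t u = static_cast<std::uint32_t>(x);
    // 0xAAAAAAAB * 3 == 1 (mod 2^32), so a multiple 3*k is mapped to k.
    std::uint32_t t = u * 0xAAAAAAABu + 715827882u;
    // After the bias, exactly the multiples of 3 land in [0, 1431655765).
    return t < 1431655765u;
}

// Hand translation of the imul/add/ror/cmp/setb sequence for x % 6 == 0.
bool divisible_by_6_reciprocal(int x) {
    std::uint32_t u = static_cast<std::uint32_t>(x);
    std::uint32_t t = u * 0xAAAAAAABu + 715827882u;
    // The rotate moves the low bit to the top, so odd values become
    // huge and fail the comparison; even values are simply halved.
    return rotr1(t) < 715827883u;
}
```

The multiply is a bijection on 32-bit values, which is why no non-multiple can sneak into the accepted range.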

But note the branch in the first function! The original code uses the && operator, which is short-circuiting -- so from the compiler's perspective, perhaps the programmer expects that x % 2 == 0 will usually be false, and so we can skip the expensive % 3 most of the time. The "suboptimal" version is potentially quite a bit faster in the best case, but also potentially quite a bit slower in the worst case (since that branch could be mispredicted). There's not really a way for the compiler to know which version is "better" without more context, so deferring to "what the programmer wrote" makes sense.
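For what it's worth, if you don't want short-circuit semantics you can say so in the source by combining the two bool results with the non-short-circuiting & operator. A minimal sketch follows; the function names are made up, the single-modulo body is my assumption of what the "optimal" version looks like, and whether a given compiler emits a branch for any of these is not guaranteed, so check the output yourself.

```cpp
// Short-circuiting form: the second test is only evaluated if the first passes.
bool div6_short_circuit(int x) {
    return x % 2 == 0 && x % 3 == 0;
}

// Both comparisons are always evaluated, then combined with bitwise AND.
// This drops the "skip the second check" semantics from the source,
// but the optimizer is still free to pick either code shape.
bool div6_no_short_circuit(int x) {
    return (x % 2 == 0) & (x % 3 == 0);
}

// Single-modulo form (assumed shape of the "optimal" version).
bool div6_single_mod(int x) {
    return x % 6 == 0;
}
```

All three return the same value for every int; only the generated code may differ.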

That being said, I don't know that this is really a case of "the compiler knows best" rather than just not having that kind of optimization implemented. If we write 'x % 6 && x % 3', the compiler pointlessly generates both operations. And GCC generates branchless code for 'is_divisible_by_6', which is just worse than 'is_divisible_by_6_optimal' in all cases.

senfiaj · 2025-12-04T09:27:38 1764840458

I also tried this:

  bool is_divisible_by_15(int x) {
      return x % 3 == 0 && x % 5 == 0;
  }

  bool is_divisible_by_15_optimal(int x) {
      return x % 15 == 0;
  }

is_divisible_by_15 still has a branch, while is_divisible_by_15_optimal does not:

  is_divisible_by_15(int):
        imul    eax, edi, -1431655765
        add     eax, 715827882
        cmp     eax, 1431655764
        jbe     .LBB0_2
        xor     eax, eax
        ret
  .LBB0_2:
        imul    eax, edi, -858993459
        add     eax, 429496729
        cmp     eax, 858993459
        setb    al
        ret

  is_divisible_by_15_optimal(int):
        imul    eax, edi, -286331153
        add     eax, 143165576
        cmp     eax, 286331153
        setb    al
        ret
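In case anyone wonders where these magic numbers come from: the multiplier is the inverse of the divisor modulo 2^32, and the add/cmp constants follow from floor(2^31 / d). Below is a small sketch that recomputes them for odd divisors. mod_inverse and divisible_by_odd are hypothetical helpers of mine, the Newton iteration is just one way to get the inverse, and I'm assuming a 32-bit int and C++14 or later for the constexpr loop.

```cpp
#include <cstdint>

// Multiplicative inverse of an odd d modulo 2^32 via Newton's iteration.
// d is its own inverse modulo 8; each step doubles the correct low bits.
constexpr std::uint32_t mod_inverse(std::uint32_t d) {
    std::uint32_t inv = d;
    for (int i = 0; i < 5; ++i) {
        inv *= 2u - d * inv;  // unsigned arithmetic wraps modulo 2^32
    }
    return inv;
}

// Divisibility test for a signed 32-bit x by an odd divisor d >= 3,
// mirroring the imul/add/cmp shape of the listings.
constexpr bool divisible_by_odd(int x, std::uint32_t d) {
    const std::uint32_t bias = 0x80000000u / d;  // floor(2^31 / d)
    const std::uint32_t t =
        static_cast<std::uint32_t>(x) * mod_inverse(d) + bias;
    return t < 2u * bias + 1u;
}

// The multipliers from the assembly, reinterpreted as unsigned.
static_assert(mod_inverse(3)  == 0xAAAAAAABu, "-1431655765");
static_assert(mod_inverse(5)  == 0xCCCCCCCDu, "-858993459");
static_assert(mod_inverse(15) == 0xEEEEEEEFu, "-286331153");

// The add and cmp constants for 15.
static_assert(0x80000000u / 15 == 143165576u, "add constant");
static_assert(2u * (0x80000000u / 15) + 1u == 286331153u, "cmp constant");
```

Even divisors need the extra rotate seen in the mod 6 listing, which this sketch leaves out.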

abainbridge · 2025-12-03T17:04:30 1764781470

Nice.

Is the best way to think of optimizing compilers, "I wonder if someone hand wrote a rule for the optimizer that fits this case"?

stouset · 2025-12-03T17:45:44 1764783944

Probably not, because a lot of the power of optimizing compilers comes from composing optimizations. Also a lot comes from being able to rule out undefined behavior.
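A classic small illustration of the undefined-behavior point (my own example, not from the article; the function names are made up): because signed overflow is undefined, the compiler is allowed to assume x + 1 never wraps and may fold the first function to a constant, while the unsigned version has defined wraparound and has to be computed for real.

```cpp
#include <climits>

// Signed overflow is undefined, so the compiler may assume x + 1 never
// wraps and is allowed to fold this to `return true`.
bool next_is_greater_signed(int x) {
    return x + 1 > x;
}

// Unsigned arithmetic wraps by definition, so this must really be
// evaluated: it is false when x == UINT_MAX.
bool next_is_greater_unsigned(unsigned x) {
    return x + 1 > x;
}
```

Calling the signed version with INT_MAX is itself undefined, so don't rely on any particular result there.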