> I have an idea that the "espresso" algorithm could be used to minimize not onl...

monocasa · on Jan 10, 2020

It's useful in the compiler. For instance one optimization I haven't seen but would like to is going from

  if( value1 < 0 || value2 < 0 || value3 < 0 ) {

to

  temp = value1 | value2 | value3;
  if( temp < 0 ) {

Taking a sort of hardware, Boolean logic view makes this possible, as VHDL generally does this when synthesed.

adrian17 · on Jan 10, 2020

It seems like Clang is capable of doing this optimization:

> https://godbolt.org/z/oCQNbq

monocasa · on Jan 10, 2020

Neat! It didn't a few years ago.

This is most of the inner loop of a barycentric coordinate rasterizer, so I was pretty concerned about micro-optimizations when I was checking.

ospr3y · on Jan 10, 2020

As a compiler writer I can tell you that this is a complex optimisation with very limited use.

N.B. the || (logical or) operator is an "early out" operator whereas the | (binary or) operator is NOT

The compiler would need to look for special cases of value1, value2 and value3... such as: are any of them volatile, perhaps one refers to a memory mapped I/O device, perhaps one if actually a function call, or array dereference, or pointer dereference, what about value1 being a "char" which needs to be promoted before it can be or'd, what about any of the "values" being a constant < 0 which would short circuit the "if" statement and remove the need for any tests.

As I said, this is not as simple as it first appears.

FullyFunctional · on Jan 10, 2020

Note, if the three comparisons are sufficiently predicable, then your transformation would be _less_ efficient than the simple branches.

A predicable branch is almost free in a modern processor.

ncmncm · on Jan 12, 2020

Three comparison/branch pairs are not free.

Clang will optimize

  bool swap_if(bool c, 
      int& a, int& b) {
    int ta = a, tb = b;
    a = (-c & tb)|((c-1) & ta);
    b = (-c & ta)|((c-1) & tb);
    return c;
  }

into cmov instructions. Used in a quicksort partition loop as

  l += swap_if(
    *r <= pv, *l, *r);

It makes quicksort fully 2x as fast as the usual branch.

FullyFunctional · on Jan 24, 2020

Quicksort comparisons are obviously not predictable

thaumasiotes · on Jan 10, 2020

That optimization fudges the difference between boolean or (comparing truth values) and bitwise or. It happens to work under the safe assumption that value1, value2, and value3 use sign bits (as all two's-complement integers do), but logically it's a disaster.

Thorrez · on Jan 10, 2020

monocasa isn't saying this should be in source code, but rather the compiler should create machine code that does this. Compilers know the types, so they will know whether they're signed (if they're not signed, then < 0 can be optimized out as impossible). Compilers have no goal of creating assembly that would look good in source code.

emmanueloga_ · on Jan 10, 2020

Good point, although the idea would be you get to choose one for or the other. I think sometimes the simplified expression makes it easier to see the complement (else branch).

btw, I started to think about "boolean refactoring" when I saw J. Edwards' schematic tables video [1].

1: https://vimeo.com/140738254