They don't compose because (as you hinted) the order in which locks are taken and released has to be the same for all callers, but if mutexes are being taken behind the scenes, then the order in which you call the API becomes critical, and it is trivial to screw that up.
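To make the ordering hazard concrete, here is a minimal Python sketch. The function name and locks are hypothetical. Two threads take the same two locks in opposite orders; barriers force the bad interleaving, and `acquire(timeout=...)` stands in for what would otherwise be a permanent hang:

```python
import threading

def opposite_order_demo(timeout=0.2):
    """Two threads take the same two locks in opposite orders.

    Barriers force the classic bad interleaving: each thread holds its first
    lock before either asks for its second. acquire(timeout=...) stands in
    for what would otherwise be a permanent deadlock.
    """
    lock_a, lock_b = threading.Lock(), threading.Lock()
    holding_first = threading.Barrier(2)
    done_trying = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            holding_first.wait()        # both threads now hold one lock each
            ok = second.acquire(timeout=timeout)
            if ok:
                second.release()
            got_second[name] = ok
            done_trying.wait()          # keep holding `first` until both gave up

    threads = [
        threading.Thread(target=worker, args=("t1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("t2", lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second
```

Neither thread ever gets its second lock. Having both threads take `lock_a` before `lock_b` removes the deadlock, but only if every caller knows and follows that order, which is exactly what's hard to guarantee when the locking is hidden inside an API.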
> Lockless algorithms may be slower or cause performance issues, but they don't malfunction because you got the release order backwards.
And they may scale better. Scalability is important.
For me the classic case is rw locks vs. RCU-ish schemes. I think there's no case where rw locks are ever appropriate if you have an RCU-ish alternative. For example, OpenSSL uses only rw locks -- nuts!
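For anyone who hasn't seen the RCU-ish pattern, here is a rough Python sketch of the idea, not a real RCU implementation. `RcuBox` is a hypothetical name. Readers take no lock and just grab the currently published snapshot; writers copy, modify, and publish a new snapshot with one reference assignment. The sketch assumes that reference assignment is atomic, which holds in CPython:

```python
import threading
from types import MappingProxyType

class RcuBox:
    """Hypothetical RCU-flavoured holder for read-mostly data.

    Readers never lock: they grab whatever snapshot is currently published.
    Writers copy the current snapshot, modify the copy, and publish it with a
    single reference assignment (assumed atomic, as in CPython). Writers still
    serialize among themselves with an ordinary mutex.
    """

    def __init__(self, initial):
        self._snapshot = MappingProxyType(dict(initial))
        self._writer_lock = threading.Lock()

    def read(self):
        # One reference load, no lock. The returned snapshot is read-only and
        # never changes, even if a writer publishes a newer one meanwhile.
        return self._snapshot

    def update(self, **changes):
        with self._writer_lock:
            new = dict(self._snapshot)              # copy...
            new.update(changes)                     # ...modify the copy...
            self._snapshot = MappingProxyType(new)  # ...publish
```

Python's garbage collector frees an old snapshot once the last reader drops it, which quietly stands in for RCU's grace period. In C, that deferred reclamation is the hard part, and it is what a library such as liburcu handles for you.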
I see your point, but as a counterpoint, consider that you have two or more distinct data structures, for example several separate linked lists. What if you need to update multiple lists in one atomic operation? With mutexes, you can do this rather easily: either you have one mutex that guards all lists, or you have one mutex for each list and you ensure you always take them in the same order. I would think that qualifies as composing. However, you can't easily compose a lockless algorithm that works on one list to make it work on multiple at the same time.
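The "one mutex per list, always taken in the same order" approach could look roughly like this minimal Python sketch. `GuardedList`, `locked()`, and `move()` are hypothetical names, and sorting by a unique list name is just one assumed choice of global order:

```python
import threading
from contextlib import ExitStack, contextmanager

class GuardedList:
    """Hypothetical helper: a plain list paired with its own mutex."""

    def __init__(self, name):
        self.name = name              # assumed unique; used as the lock-order key
        self.lock = threading.Lock()
        self.items = []

@contextmanager
def locked(*lists):
    """Hold the locks of all given lists, always acquired in name order.

    Because every caller goes through this one ordering rule, locked(a, b)
    and locked(b, a) cannot deadlock against each other.
    """
    with ExitStack() as stack:
        for gl in sorted(set(lists), key=lambda g: g.name):
            stack.enter_context(gl.lock)
        yield

def move(src, dst, item):
    """Move `item` from src to dst as one step, as seen by other locked() users."""
    with locked(src, dst):
        src.items.remove(item)
        dst.items.append(item)

def demo():
    a, b = GuardedList("a"), GuardedList("b")
    a.items.extend(range(100))

    def shuffle(items):
        for x in items:
            move(a, b, x)   # asks for (a, b)
            move(b, a, x)   # asks for (b, a); acquisition order is still a, b

    threads = [threading.Thread(target=shuffle, args=(range(0, 50),)),
               threading.Thread(target=shuffle, args=(range(50, 100),))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(a.items), b.items
```

This composes only as long as every caller goes through `locked()`; code that grabs a list's `lock` directly, in its own order, brings the deadlock risk right back.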
I would, in fact, give precisely the reverse advice for one reason:
Mutexes don't compose.
Mutexes are really good at putting subtle bugs into your code that are ridiculously difficult to figure out.
Lockless algorithms may be slower or cause performance issues, but they don't malfunction because you got the release order backwards.