Great read. It even prompted me to order "The Linux Programmer's Toolbox" from the author's affiliate link. For me there is no other way to REALLY learn about operating systems, or hacking in general, than by tackling problems like this one, which force you to dig deeper the further you look. This is the reason I prefer DTrace as my investigative tool of choice and FreeBSD as the base OS for my side projects. Together they let you see and investigate problems at a much deeper level without the heavy overhead of other tools, and understand the practical workings of a Unix-derived OS without being a kernel hacker.
It's certainly surprising that colouring the output would cause such different (and buggy) behaviour: the number of matches shouldn't depend on whether the output is coloured or not. I would look carefully at where the execution flow diverges depending on whether the --color option is present.
On your Mac, the stack trace hints that the bug is due to a bad pointer/offset/size being passed to fwrite(), but in the debug builds it looks like it's in the fastcmp() function. Is this the discrepancy you're talking about in the final paragraph?
If you'd like the immersive experience, try this in the Mac terminal:
echo i860 | grep --color -e i860 -e i86