There is no block to block prediction other than DC prediction, so is this effect due to your distortion function spanning multiple blocks? Same for cross YUV channels, because your metric is in RGB space?
edit: on a second read-through I found the paper [1], which explains it. The answer is basically "yes": the large-scale distortion function is basically activity masking. Normally this would be implemented with delta-QPs, but because JPEG doesn't have those, Guetzli uses runs of zeroes instead.
This comes from the internal use of butteraugli: Guetzli makes its quantization decisions depend on butteraugli.
Butteraugli uses an 8x8 FFT, but computes it every 3 pixels in each direction, so the windows overlap and cover the block boundaries. In later stages of the butteraugli calculation, values are aggregated from an even larger area. Block boundary artefacts are therefore taken into account and influence the quantization decisions.
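To make the coverage argument concrete, here is a minimal sketch (not butteraugli's actual code) that just counts which 8-pixel windows, placed every 3 pixels along one axis, span two 8-pixel JPEG blocks. The function names and the 32-pixel row are made up for illustration.

```python
def window_starts(length, window=8, stride=3):
    """Start offsets of every full window along one axis."""
    return list(range(0, length - window + 1, stride))


def straddles_block_boundary(start, window=8, block=8):
    """True if the window [start, start + window) covers pixels from two blocks."""
    return start // block != (start + window - 1) // block


# A 32-pixel row holds four 8-pixel JPEG blocks.
starts = window_starts(32)
straddling = [s for s in starts if straddles_block_boundary(s)]
print(len(straddling), "of", len(starts), "windows cross a block boundary")
```

Only windows whose start happens to be a multiple of 8 stay inside a single block; all the others see pixels from both sides of a boundary.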
Butteraugli operates in neither RGB nor YUV. It has a new color space that is a hybrid of trichromatic colors and opponent colors. Black-to-yellow and red-to-green are opponent, but blue is modeled closer to trichromatic. As a simpler explanation, you can think of it as follows: first apply inverse gamma correction, second apply a 3x4 transform to the RGB values, third apply gamma correction, fourth calculate r - g and r + g and keep blue separate.
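The four steps can be sketched roughly like this. This only shows the shape of the pipeline: the identity 3x4 matrix and the 2.2 power-law gamma are placeholders I picked, not butteraugli's real coefficients, and `to_opponent` is a hypothetical name.

```python
# Placeholder 3x4 affine transform (3x3 mix plus an offset column).
# These are NOT butteraugli's coefficients; identity is used for illustration.
IDENTITY_3X4 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]


def to_opponent(rgb, mix=IDENTITY_3X4, gamma=2.2):
    """Map gamma-encoded RGB in [0, 1] to (r - g, r + g, b) channels."""
    # 1. Inverse gamma correction (assumed simple power law).
    lin = [c ** gamma for c in rgb]
    # 2. 3x4 transform: a 3x3 mix of the linear channels plus an offset.
    mixed = [row[0] * lin[0] + row[1] * lin[1] + row[2] * lin[2] + row[3]
             for row in mix]
    # 3. Gamma correction again (clamped so the power is well defined).
    r, g, b = (max(m, 0.0) ** (1.0 / gamma) for m in mixed)
    # 4. Opponent channels for red/green, blue kept separate.
    return (r - g, r + g, b)


print(to_opponent((0.5, 0.5, 0.5)))  # neutral gray: first channel is zero
print(to_opponent((1.0, 0.0, 0.0)))  # pure red
```

With these placeholder values a neutral gray has no red-green signal, which is the point of the opponent channel.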
[1] https://arxiv.org/pdf/1703.04421