I did the same thing, but with gradient descent. You can create a soft, differentiable version of the Game of Life. Here is my messy Colab notebook: [1]
I didn't work on it long enough to be able to draw any conclusions, but I can speculate.
I had the gradients flowing through the soft Life approximation (i.e. it was part of the model), rather than simply training a normal CNN with Life boards as the inputs and outputs. But I suspect the approximation may not provide good enough gradient signals. A rough sketch of the idea is below.
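For reference, here's roughly what I mean in PyTorch. This is a from-memory sketch, not the actual notebook code: the sigmoid-window soft rule, the sharpness constant, and the optimizer settings are all my assumptions, and the real notebook differs in details.

    # Sketch: one differentiable ("soft") Game of Life step, plus a
    # gradient-descent search for a predecessor board. Not the notebook's
    # exact code; the soft rule and hyperparameters are assumptions.
    import torch
    import torch.nn.functional as F

    # 3x3 kernel that counts the 8 neighbors (center weight is 0).
    NEIGHBOR_KERNEL = torch.tensor([[1., 1., 1.],
                                    [1., 0., 1.],
                                    [1., 1., 1.]]).view(1, 1, 3, 3)

    def soft_life_step(board, sharpness=10.0):
        """One soft Life step. `board` holds values in [0, 1], shape (H, W)."""
        x = board.view(1, 1, *board.shape)
        neighbors = F.conv2d(x, NEIGHBOR_KERNEL, padding=1).view(*board.shape)
        # Smooth stand-ins for the discrete rules:
        #   birth   ~ 1 when neighbor count is ~3
        #   survive ~ 1 when neighbor count is ~2 or ~3
        birth = torch.sigmoid(sharpness * (neighbors - 2.5)) * \
                torch.sigmoid(sharpness * (3.5 - neighbors))
        survive = torch.sigmoid(sharpness * (neighbors - 1.5)) * \
                  torch.sigmoid(sharpness * (3.5 - neighbors))
        return (1 - board) * birth + board * survive

    def search_predecessor(target, steps=2000, lr=0.1):
        """Optimize logits of a candidate previous board so that one soft
        step reproduces `target` (a 0/1 float tensor)."""
        logits = torch.zeros_like(target, requires_grad=True)
        opt = torch.optim.Adam([logits], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            candidate = torch.sigmoid(logits)        # keep cells in [0, 1]
            pred = soft_life_step(candidate)
            loss = F.binary_cross_entropy(pred.clamp(1e-6, 1 - 1e-6), target)
            loss.backward()
            opt.step()
        return (torch.sigmoid(logits) > 0.5).float() # harden to a 0/1 board

Stepping further back would just mean applying soft_life_step several times before computing the loss, but in my experiments that chained version didn't converge.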
[1] https://colab.research.google.com/drive/12CO3Y0JgCd3DVnQeNSB...
Edit: this was only 1 step back though, not 4 as in the OP. I couldn't get my differentiable version to converge more than 1 step into the past.