I didn't work on it long enough to be able to draw any conclusions, but I can speculate.
I had the gradients going through the soft Life approximation (i.e. it was part of the model), rather than simply training a normal CNN with Life boards as the inputs and outputs. But I suspect the approximation doesn't give good enough gradient signals.
Do you have any idea why that might be? It seems like convolution would be a natural fit for this problem.
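
For anyone following along, here is a rough sketch of what a soft Life approximation could look like, and one plausible way its gradients might end up weak. This is not the code from the experiment above. The sigmoid window, the sharpness `K`, the bounds `LO`/`HI`, and all function names are assumptions made up for illustration. The neighbour sum is just a 3x3 convolution with wrap-around edges, written out in plain Python.

```python
import math

K = 20.0             # assumed sharpness; larger values approach the hard rule
LO, HI = 2.25, 3.75  # assumed window bounds on s = neighbours + 0.5 * cell


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def window(s, k=K):
    # Close to 1 when LO < s < HI, close to 0 otherwise.
    return sigmoid(k * (s - LO)) * sigmoid(k * (HI - s))


def window_grad(s, k=K):
    # Analytic derivative of window() with respect to s.
    a = sigmoid(k * (s - LO))
    b = sigmoid(k * (HI - s))
    return k * a * b * (b - a)


def soft_life_step(board, k=K):
    # board: list of rows with values in [0, 1]; edges wrap around.
    h, w = len(board), len(board[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            # Sum of the 8 neighbours (a 3x3 convolution without the centre).
            n = sum(board[(r + dr) % h][(c + dc) % w]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            # On binary boards, a cell lives next step iff
            # n + 0.5 * cell is 2.5, 3 or 3.5.
            row.append(window(n + 0.5 * board[r][c], k))
        out.append(row)
    return out


def blinker():
    # Horizontal blinker on a 5x5 board (hypothetical test pattern).
    b = [[0.0] * 5 for _ in range(5)]
    for c in (1, 2, 3):
        b[2][c] = 1.0
    return b
```

Rounding `soft_life_step(blinker())` gives the vertical blinker, so on binary boards the sketch matches the hard rule. The catch is `window_grad`: it is only noticeably non-zero near the two thresholds. With a sharp `K`, most cells sit deep in a saturated region (for example `s = 0` in empty space), where the derivative is vanishingly small, so very little signal would flow back through each step. Lowering `K` helps the gradients but makes the step a worse approximation of Life, which may be the trade-off behind the weak training signal.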