> Running ~1000 iterations for a 483px wide Mona Lisa on the Google Colab GPU runtime only takes around 40 seconds! Compared to the CPU version, which takes several hours to do the same for a smaller image.
Isn't this excruciatingly slow? I remember my old 486 computer ran Life full-screen at full resolution in real time (probably 25 fps at 800x600?). How has it come to this after 20 years? How can a 20-year-old CPU outperform a modern GPU by an order of magnitude? Is it all the fault of lots of useless intermediary language layers?
> Looking into speed of execution was not a primary goal, but in case people are interested: an initial study suggests this code will do about 10 billion cells/second on the Google Colab GPU runtime.
For comparison, this simple numba-optimized GoL propagation function achieves roughly one billion cells per second on a CPU (i7-9700K):
import numba
import numpy as np

@numba.jit
def propagate_jit(arr, result):
    h, w = arr.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            result[y, x] = arr[y, x]
            # Count the 8 neighbours (centre cell excluded).
            num_neighbors = (
                arr[y-1, x-1] + arr[y-1, x] + arr[y-1, x+1] +
                arr[y,   x-1] +               arr[y,   x+1] +
                arr[y+1, x-1] + arr[y+1, x] + arr[y+1, x+1]
            )
            if num_neighbors == 3:
                result[y, x] = 1
            elif num_neighbors < 2 or num_neighbors > 3:
                result[y, x] = 0

arr = np.random.randint(0, 2, (700, 480))
arr2 = arr.copy()

%timeit propagate_jit(arr, arr2)
# 333 µs ± 2.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# 700*480 cells per step / 333 µs per step ≈ 1e9 cells per second
700 * 480 * 1e6 / 333
# 1009009009.009009
Btw: the non-numba-optimized implementation runs in 806 ms on my system, so the JIT compilation gives a speedup of over 2400x.
Not entirely sure what @numba.jit is doing here, as I haven't used numba - is it compiling to native CPU code? Another approach to speeding up GoL would be a lookup table with 2^9 entries, one per 3x3 neighbourhood - it's kind of like having the sum pre-calculated. Combining the lookup-table approach with @numba.jit would probably be even faster yet.
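For what it's worth, here is a minimal sketch of that lookup-table idea, assuming the same 0/1 numpy grid as in the code above. The names build_lut and propagate_lut are made up for illustration; the 512-entry table simply stores the next state of the centre cell for every possible 3x3 neighbourhood:

import numba
import numpy as np

def build_lut():
    # One entry per 3x3 neighbourhood pattern: 2^9 = 512 possibilities.
    lut = np.zeros(512, dtype=np.uint8)
    for pattern in range(512):
        bits = [(pattern >> i) & 1 for i in range(9)]
        center = bits[4]                # bit 4 holds the centre cell
        neighbors = sum(bits) - center  # the other 8 bits are the neighbours
        if neighbors == 3:
            lut[pattern] = 1
        elif neighbors == 2:
            lut[pattern] = center
        else:
            lut[pattern] = 0
    return lut

@numba.jit
def propagate_lut(arr, result, lut):
    h, w = arr.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Pack the 3x3 neighbourhood into a 9-bit index, row by row,
            # so the centre cell lands in bit 4 (matching build_lut).
            idx = 0
            for dy in range(-1, 2):
                for dx in range(-1, 2):
                    idx = (idx << 1) | arr[y + dy, x + dx]
            result[y, x] = lut[idx]

lut = build_lut()
arr = np.random.randint(0, 2, (700, 480)).astype(np.uint8)
arr2 = arr.copy()
propagate_lut(arr, arr2, lut)

Whether this actually beats the straight neighbour sum depends on how the two inner loops compile, so it would need benchmarking rather than taking my word for it.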