
> Running ~1000 iterations for a 483px wide Mona Lisa on the google colab GPU runtime only takes around 40 seconds! Compared to the CPU version, which takes several hours to do the same for a smaller image

Isn't this excruciatingly slow? I remember my old 486 running Life full-screen at full resolution in real time (probably 25 fps at 800x600?). How has it come to this after 20 years? How can a 20-year-old CPU outperform a modern GPU by an order of magnitude? Is it all the fault of lots of useless intermediary language layers?



Regarding the speed of GoL itself:

The JAX implementation of GoL he used should be able to do 10 billion cells / second

http://www.bnikolic.co.uk/blog/python/jax/2020/04/19/game-of...

> Speed of execution

> Looking into speed of execution was not a primary goal, but in case people are interested: an initial study suggests this code will do about 10 billion cells / second on the Google Colab GPU runtime.
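
For a rough idea of what such a vectorized GoL step can look like in JAX, here is a minimal sketch (my own, not the code from the link): it sums the eight neighbours with jnp.roll and lets @jax.jit hand the whole step to XLA:

    import jax
    import jax.numpy as jnp

    @jax.jit
    def step(grid):
        # sum the eight neighbours, with wrap-around boundaries
        neighbors = sum(
            jnp.roll(grid, (dy, dx), axis=(0, 1))
            for dy in (-1, 0, 1)
            for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0)
        )
        # a cell is alive next step if it has 3 neighbours,
        # or 2 neighbours and is currently alive
        return ((neighbors == 3) | ((neighbors == 2) & (grid == 1))).astype(grid.dtype)

    grid = jnp.zeros((512, 512), dtype=jnp.int32).at[10, 10:13].set(1)  # a blinker
    grid = step(grid)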

For comparison, this simple numba-optimized GoL propagation function achieves roughly one billion cells per second on a CPU (i7-9700K):

    import numba
    import numpy as np

    @numba.jit
    def propagate_jit(arr, result):
        h, w = arr.shape

        for y in range(1,h-1):
            for x in range(1,w-1):
                result[y,x] = arr[y,x]
                # count the live cells among the eight neighbours
                num_neighbors = (
                    arr[y-1,x-1] + arr[y-1,x] + arr[y-1,x+1] +
                    arr[y,x-1] + arr[y,x+1] +
                    arr[y+1,x-1] + arr[y+1,x] + arr[y+1,x+1]
                )
                if num_neighbors == 3:
                    result[y,x] = 1
                elif num_neighbors < 2 or num_neighbors > 3:
                    result[y,x] = 0

    arr = np.random.randint(0,2,(700,480))
    arr2 = arr.copy()
    %timeit propagate_jit(arr, arr2)
    # 333 µs ± 2.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    700*480*1e6/333
    # 1009009009.009009
Btw: the non-numba-optimized implementation runs in 806 ms on my system, so the optimization gives a speedup of over 2400x.


Not entirely sure what @numba.jit is doing here, as I haven't used numba; is it compiling to native CPU code? Another approach to speeding up GoL would be a lookup table with 2^9 entries, which is kind of like having the neighbour sum pre-calculated. Using the lookup-table approach together with @numba.jit would probably be faster still.
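
To illustrate the idea, here is a rough, unbenchmarked sketch (hypothetical names, not code from the thread): build a 512-entry table mapping every possible 3x3 neighbourhood, packed into 9 bits, to the next state of the centre cell, so the inner loop does a single table lookup per cell:

    import numba
    import numpy as np

    # table[pattern] -> next state of the centre cell, where bit k of `pattern`
    # is cell k of the 3x3 neighbourhood in row-major order (bit 4 is the centre)
    table = np.zeros(512, dtype=np.uint8)
    for pattern in range(512):
        bits = [(pattern >> k) & 1 for k in range(9)]
        centre = bits[4]
        neighbors = sum(bits) - centre
        table[pattern] = 1 if neighbors == 3 or (centre and neighbors == 2) else 0

    @numba.jit
    def propagate_lut(arr, result, table):
        h, w = arr.shape
        for y in range(1, h-1):
            for x in range(1, w-1):
                # pack the 3x3 neighbourhood into a 9-bit table index
                pattern = 0
                k = 0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        pattern |= arr[y+dy, x+dx] << k
                        k += 1
                result[y, x] = table[pattern]

A faster variant would reuse most of the pattern from the previous column and only shift in the three new bits per step, but this shows the basic idea.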


That's not 1000 iterations of GoL. That's a thousand iterations of trying to find the best seed.


I stand corrected. The numbers were way too far off to make sense at all!


Probably 99% of the time is spent shuffling data around and 1% calculating.



