with the latest (eg TSMC) processes, someone could build a regular array of 32-bit FP transputers (T800 equivalent):
- 8000 CPUs in the same die area as an Apple M2 (16 TIPS, ie ~36x faster than an M2)
- 40000 CPUs in a single reticle (80 TIPS)
- 4.5M CPUs per 300mm wafer (10 PIPS)
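a back-of-envelope sketch in Go, just to show how those three figures hang together; the ~2 GIPS per core rate is my assumption, implied by the 8000 CPU / 16 TIPS pairing, not a measured number:

    package main

    import "fmt"

    func main() {
        const gipsPerCPU = 2.0 // assumed per-core rate in GIPS (implied by 8000 CPUs = 16 TIPS)

        configs := []struct {
            name string
            cpus float64
        }{
            {"M2-sized die", 8_000},
            {"single reticle", 40_000},
            {"300mm wafer", 4_500_000},
        }
        for _, c := range configs {
            tips := c.cpus * gipsPerCPU / 1_000 // GIPS -> TIPS
            fmt.Printf("%-15s %9.0f CPUs  ~%.0f TIPS\n", c.name, c.cpus, tips)
        }
        // prints 16, 80 and 9000 TIPS (ie ~9-10 PIPS for the full wafer)
    }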
the transputer async link (and C004 switch) allows for decoupled clocking, CPU-level redundancy and agricultural interconnect
heat would be the biggest issue ... but >>50% of each CPU is low power (local) memory
Before you know it you'll be going down the compute fabric and 'fleet' rabbit hole. For a long time I thought that was the future (I even worked with Transputers back in the day) but now I'm not so sure. GPUs have gotten awfully powerful and are relatively easy to work with compared to trying to harness a large number of independently operating CPUs. Debugging such a setup is really hard. That said, I still have this hope that maybe one day such an architecture will pay off in a bigger way than what has happened so far. If someone cracks the software nut in a decisive manner then it may well happen.
well - yes ... that's the point of occam[1] ... if it can hang, it will hang deterministically
we have to zoom out from the 1980s, when 4 CPUs were a lot ... now that you can build 40,000 CPUs (ie a 200 x 200 array) within the single-reticle limit (ie the same area as a big NVIDIA die), a big MIMD must be coded with algorithmic patterns like map-reduce, pipelining, etc.
but the general-purpose CPU nature and HLL coding mean it is far easier than with CUDA to get close to theoretical max performance
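to make that concrete, here's a minimal sketch of the farm/pipeline pattern in Go (standing in for occam, since both give you CSP-style processes over rendezvous channels); the worker/farm names and the trivial map-reduce are illustrative only:

    package main

    import "fmt"

    // worker is one "CPU" in the farm: it maps a function over whatever
    // arrives on in and sends results on out.
    func worker(in <-chan int, out chan<- int) {
        for x := range in {
            out <- x * x // the "map" step
        }
    }

    func main() {
        const nJobs = 100
        const nWorkers = 4 // stand-in for the 200 x 200 array

        in := make(chan int)  // unbuffered: a send blocks until some worker takes it
        out := make(chan int) // ditto for results

        for i := 0; i < nWorkers; i++ {
            go worker(in, out)
        }

        go func() { // feed the farm
            for x := 1; x <= nJobs; x++ {
                in <- x
            }
            close(in)
        }()

        sum := 0 // the "reduce" step: collect exactly nJobs results
        for i := 0; i < nJobs; i++ {
            sum += <-out
        }
        fmt.Println("sum of squares 1..100 =", sum) // 338350
    }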
[1] or any CSP with both input and output descheduling - ie no queueing
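a minimal Go sketch of what "no queueing" buys you: with unbuffered channels both sides deschedule until the rendezvous, so a cyclic wait can't "sometimes" work depending on buffer state, it hangs the same way every run (Go channels standing in for occam channels here):

    package main

    func main() {
        a := make(chan int) // unbuffered: send and receive must rendezvous
        b := make(chan int)

        go func() {
            a <- 1 // deschedules until main receives on a ...
            <-b    // ... which never happens
        }()

        b <- 2 // main deschedules here: the goroutine is stuck on a <- 1
        <-a    // never reached
        // every run ends the same way, with the runtime reporting
        // "fatal error: all goroutines are asleep - deadlock!"
    }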