I just want to leave this breadcrumb showing possible markets and applications for high-performance computing (HPC), specifically regarding SpiNNaker, which simulates neural nets (NNs) as processes communicating via spike trains rather than as matrices performing gradient descent (sketched in code after the links):
https://news.ycombinator.com/item?id=44201812 (Sandia turns on brain-like storage-free supercomputer)
https://blocksandfiles.com/2025/06/06/sandia-turns-on-brain-... (working implementation of 175,000 cores)
https://www.theregister.com/2017/10/19/steve_furber_arm_brai... (towards 1 million+ cores)
https://www.youtube.com/watch?v=z1_gE_ugEgE (518,400 cores as of 2016)
https://arxiv.org/pdf/1911.02385 (towards 10 million+ cores)
https://docs.hpc.gwdg.de/services/neuromorphic-computing/spi... (HPC programming model)
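To make the contrast concrete, here's a minimal event-driven sketch of the process-and-spike-train model in Python. It's my own toy illustration, not SpiNNaker's actual API; the Neuron class, threshold, decay, and delays are made up. The point is that only neurons that actually receive a spike do any work, whereas the matrix/gradient-descent world touches every weight on every step.

    # Toy sketch: neurons as independent processes exchanging spike events.
    # Names (Neuron, THRESHOLD, DECAY) are illustrative, not SpiNNaker's API.
    import heapq

    THRESHOLD = 1.0
    DECAY = 0.9

    class Neuron:
        def __init__(self, nid, fanout):
            self.nid = nid
            self.fanout = fanout      # (target_id, weight, delay) tuples
            self.potential = 0.0

        def receive(self, weight, t, queue):
            # Integrate the incoming spike; fire if we cross threshold.
            self.potential = self.potential * DECAY + weight
            if self.potential >= THRESHOLD:
                self.potential = 0.0
                for target, w, delay in self.fanout:
                    heapq.heappush(queue, (t + delay, target, w))

    def run(neurons, initial_spikes, t_max=100):
        # Event-driven loop: only neurons that receive spikes do any work.
        queue = list(initial_spikes)     # (time, target_id, weight)
        heapq.heapify(queue)
        while queue:
            t, target, w = heapq.heappop(queue)
            if t > t_max:
                break
            neurons[target].receive(w, t, queue)

    # Two neurons in a loop: 0 excites 1, 1 excites 0.
    net = {
        0: Neuron(0, [(1, 1.0, 2)]),
        1: Neuron(1, [(0, 1.0, 3)]),
    }
    run(net, [(0, 0, 1.0)])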
I'd use a similar approach, but would probably add custom memory controllers that calculate hashes for a unified content-addressable memory, so that arbitrary network topologies can be used. That way the computer could be expanded as needed and run over the internet without modification. I'd also write something like a microkernel to expose the cores and memory as a unified desktop computing environment, then write the Python HPC programming model on top of that and make it optional. Users could then orchestrate the bare metal however they wish with containers, forked processes, etc.
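As a rough sketch of what I mean by content-addressable here, assuming a consistent-hash ring rather than any real memory-controller design (CAMRing, put, get are names I made up for illustration): the "address" of a value is just a hash of its key, and the ring maps that hash onto whichever nodes currently exist.

    # Hypothetical sketch: hash the key, walk a consistent-hash ring to a node.
    import bisect
    import hashlib

    def h(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    class CAMRing:
        def __init__(self, nodes, vnodes=64):
            # Place each node at several pseudo-random points on the ring.
            self.ring = sorted(
                (h(f"{n}:{i}".encode()), n) for n in nodes for i in range(vnodes)
            )
            self.store = {n: {} for n in nodes}

        def _node_for(self, key: bytes):
            i = bisect.bisect(self.ring, (h(key), "")) % len(self.ring)
            return self.ring[i][1]

        def put(self, key: bytes, value):
            self.store[self._node_for(key)][key] = value

        def get(self, key: bytes):
            return self.store[self._node_for(key)].get(key)

    ring = CAMRing(["node-a", "node-b", "node-c"])
    ring.put(b"neuron:42:state", {"potential": 0.3})
    print(ring.get(b"neuron:42:state"))

Adding a node only adds points to the ring and moves a fraction of the keys, which is the property that would let the machine grow, or stretch over the internet, without renumbering anything.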
-
A possible threat to the HPC market would be to emulate MIMD under SIMD by breaking ordinary imperative machine code up into parallelizable immutable (functional) sections, bordered by IO handled by some kind of monadic or one-shot logic that prepares inputs and collects outputs between the functional portions. That way individual neurons, agents for genetic algorithms, etc. could be written in C-style or Lisp-style code that's transpiled to run on SIMD GPUs. This is an open problem that I'm having trouble finding published papers for (a rough sketch of the idea follows the link):
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4611137 (has PDF preview and download)
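For what it's worth, here's roughly the shape I have in mind, hand-written in Python/NumPy rather than derived from machine code: the per-agent rule (move toward a target, drain energy) is invented for illustration, and np.where stands in for how a SIMD backend would flatten branches.

    # Sketch of MIMD-under-SIMD: a pure, branchless step over whole arrays
    # (the "functional section"), with all reads/writes confined to a one-shot
    # IO phase between steps. The agent rule here is made up.
    import numpy as np

    def pure_step(pos, energy, target):
        # Functional section: no mutation, no IO; branches become np.where
        # masks, the same trick a SIMD/GPU backend uses for divergence.
        delta = np.sign(target - pos)
        tired = energy <= 0.0
        new_pos = np.where(tired, pos, pos + delta)
        new_energy = np.where(tired, energy + 1.0, energy - 0.1)
        return new_pos, new_energy

    def io_phase(step, pos):
        # One-shot "monadic" border: emit observations, gather the next inputs.
        print(f"step {step}: mean position {pos.mean():.2f}")
        return np.full_like(pos, 10.0)   # next target for every agent

    pos = np.zeros(1024)
    energy = np.ones(1024)
    target = np.full(1024, 10.0)
    for step in range(5):
        pos, energy = pure_step(pos, energy, target)  # lock-step, SIMD-friendly
        target = io_phase(step, pos)                  # serialized IO between sections

The open problem is doing that split automatically on arbitrary compiled code, rather than hand-writing it like this.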
Without code examples, I'd estimate MIMD->SIMD performance to be 1-2 orders of magnitude faster than a single-threaded CPU and 1-2 orders of magnitude slower than a native GPU implementation, similar to scripting languages vs native code. My spidey sense is picking up so many code smells around this approach, though, that I suspect it may never be viable.
-
I'd compare the current complexities around LLMs running on SIMD GPUs to trying to implement business logic as a spaghetti of state machines instead of as coroutines running conditional logic and higher-order methods via message passing. Loosely, that means LLMs will have trouble evolving and programming their own learning models, whereas HPC doesn't have that limitation, because potentially every neuron can learn and evolve on its own, as in the real world.
So a possible bridge between MIMD and SIMD would be to transpile CPU machine code coroutines to GPU shader state machines:
https://news.ycombinator.com/item?id=18704547
https://eli.thegreenplace.net/2009/08/29/co-routines-as-an-a...
In the end the two are equivalent, but a multi-page LLM specification could be reduced to a bunch of one-liners, because we can reason about coroutines at a higher level of abstraction than state machines.
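A toy side-by-side of that equivalence, in Python rather than machine code or shaders; the accumulator protocol is made up, but the translation step (the implicit program counter becoming an explicit state field) is the part a transpiler would have to mechanize.

    # Same logic twice: once as a coroutine, once as the explicit state machine
    # a shader-side transpilation would have to emit. Protocol is illustrative.

    def accumulator_coroutine(limit):
        # Coroutine version: the "state" is just where we are in the code.
        total = 0
        while total < limit:
            total += (yield total)
        yield total  # final value once the limit is reached

    class AccumulatorStateMachine:
        # Mechanical translation: implicit program counter -> explicit state field.
        def __init__(self, limit):
            self.limit, self.total, self.state = limit, 0, "ACCUMULATING"

        def send(self, value):
            if self.state == "ACCUMULATING":
                self.total += value
                if self.total >= self.limit:
                    self.state = "DONE"
            return self.total

    co = accumulator_coroutine(10)
    next(co)                      # prime the coroutine
    sm = AccumulatorStateMachine(10)
    for v in (3, 4, 5):
        print(co.send(v), sm.send(v))   # identical outputs from both versions

Both halves accept the same inputs and produce the same outputs; the coroutine just reads like the spec.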