
I was surprised by how poorly positioned Intel was to act on the "Cambrian explosion" of AI late last year. After the release of their Arc GPUs, it took almost two quarters for the Intel Extension for PyTorch/TensorFlow to ship, to middling support and interest, and that hasn't changed much today.

How many of us learned ML using Compute Sticks, OpenVINO, oneAPI, or another of their libraries or frameworks, or their great documentation? It's like they didn't really believe in it outside of research.

What irony is it when a bedrock of "AI" fails to dream?



Maybe I'm thinking about it too simply but yeah I agree.

Language models in particular share very similar architectures and are effectively a lot of dot products, and running them on GPUs is arguably overkill. Look at llama.cpp for where the industry is going. I want a fast parallel quantized dot product instruction on a CPU, and I want the memory bandwidth to keep it loaded up. Intel should be able to deliver that, with none of the horrible baggage that comes with CUDA and the nvidia drivers.
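As a rough illustration (not llama.cpp's actual kernel, and assuming a CPU with AVX-512 VNNI), a quantized dot product of the kind described above can look something like this in C++:

    // Sketch of an int8 dot product using AVX-512 VNNI.
    // _mm512_dpbusd_epi32 multiplies 64 unsigned 8-bit values by 64 signed
    // 8-bit values and accumulates the products into 16 int32 lanes in a
    // single instruction. Build with e.g. g++ -O2 -mavx512f -mavx512vnni.
    #include <immintrin.h>
    #include <cstddef>
    #include <cstdint>

    // n must be a multiple of 64 for this sketch.
    int32_t dot_q8(const uint8_t *a, const int8_t *b, std::size_t n) {
        __m512i acc = _mm512_setzero_si512();
        for (std::size_t i = 0; i < n; i += 64) {
            __m512i va = _mm512_loadu_si512(a + i);  // 64 unsigned 8-bit activations
            __m512i vb = _mm512_loadu_si512(b + i);  // 64 signed 8-bit weights
            acc = _mm512_dpbusd_epi32(acc, va, vb);  // acc[k] += sums of 4 u8*s8 products
        }
        return _mm512_reduce_add_epi32(acc);         // horizontal sum of the 16 lanes
    }

The instruction itself is the easy part; as the parent says, the limiting factor is the memory bandwidth to keep it fed, since the full set of quantized weights has to be streamed from RAM for every generated token.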


Does Intel have a credibility problem w.r.t. ISA extensions to support deep learning?

I'm thinking about the widespread confusion they caused by having different CPUs support different subsets of the AVX-512 ISA.


This reads like parody: from llama.cpp, to it supposedly being a beacon of where the industry is going (!?), to the claim that GPUs are overkill for what is effectively a lot of dot products.


Yeah, using CPUs for inference or training is ridiculous. We're talking 1/20th the performance for 1/4th the energy.


The reason CUDA has won is precisely because it isn't horribly stuck in a C dialect: it has embraced polyglot workloads since 2010, it offers a great developer experience where GPUs can be debugged like regular CPUs, and it has the library ecosystem.

Now, while NVidia is making standard C++ run on CUDA, Intel is still pushing SYCL and oneAPI extensions.

Similarly with Python and the RAPIDS framework.

Intel and AMD have to up their game to deliver the same kind of developer experience.
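To make "standard C++ run on CUDA" concrete, this is the sort of thing meant: a plain parallel STL call with no kernel syntax, which NVIDIA's nvc++ can offload to the GPU via -stdpar. A minimal sketch, not vendor sample code:

    // Element-wise 2*x + y over a large vector using only ISO C++17.
    // Built with nvc++ -stdpar=gpu this runs on the GPU; with any other
    // conforming compiler it runs in parallel on the CPU.
    #include <algorithm>
    #include <execution>
    #include <vector>
    #include <cstdio>

    int main() {
        std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 2.0f), out(1 << 20);

        std::transform(std::execution::par_unseq,
                       x.begin(), x.end(), y.begin(), out.begin(),
                       [](float a, float b) { return 2.0f * a + b; });

        std::printf("out[0] = %f\n", out[0]);  // prints 4.000000
        return 0;
    }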


Err... last time I checked, CUDA was the one with the partially compliant C++ implementation, while SYCL, on the contrary, is based on pure C++17.


Time to check again: CUDA has supported C++20 for a while now (minus modules), and NVidia is the one driving the senders/receivers work for C++26, based on their CUDA libraries.

SYCL isn't pure C++ in the sense of letting you write STL code that runs on the GPU, the way CUDA does, nor does it require the hardware to follow the C++ memory model.



