
Looks like there are several projects that implement the CUDA interface for various other compute systems, e.g.:

https://github.com/ROCm-Developer-Tools/HIPIFY/blob/master/R...

https://github.com/hughperkins/coriander

I have zero experience with these, though.
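The core porting idea behind these tools is largely mechanical renaming of CUDA runtime API calls to their HIP equivalents (the real HIPIFY tools are clang-based and handle far more than this). A toy sketch of that idea, with a small, incomplete mapping table chosen for illustration:

```python
# Toy sketch of hipify-style source translation: mechanically rename
# CUDA runtime API calls to their HIP equivalents.  The real tools are
# compiler-based; this mapping is a tiny illustrative excerpt.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify(source: str) -> str:
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        source = source.replace(cuda_name, hip_name)
    return source

print(hipify("cudaMalloc(&ptr, n); cudaFree(ptr);"))
# hipMalloc(&ptr, n); hipFree(ptr);
```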



"Democratizing AI with PyTorch Foundation and ROCm™ support for PyTorch" (2023) https://pytorch.org/blog/democratizing-ai-with-pytorch/ :

> AMD, along with key PyTorch codebase developers (including those at Meta AI), delivered a set of updates to the ROCm™ open software ecosystem that brings stable support for AMD Instinct™ accelerators as well as many Radeon™ GPUs. This now gives PyTorch developers the ability to build their next great AI solutions leveraging AMD GPU accelerators & ROCm. The support from PyTorch community in identifying gaps, prioritizing key updates, providing feedback for performance optimizing and supporting our journey from “Beta” to “Stable” was immensely helpful and we deeply appreciate the strong collaboration between the two teams at AMD and PyTorch. The move for ROCm support from “Beta” to “Stable” came in the PyTorch 1.12 release (June 2022)

> [...] PyTorch ecosystem libraries like TorchText (Text classification), TorchRec (libraries for recommender systems - RecSys), TorchVision (Computer Vision), TorchAudio (audio and signal processing) are fully supported since ROCm 5.1 and upstreamed with PyTorch 1.12.

> Key libraries provided with the ROCm software stack including MIOpen (Convolution models), RCCL (ROCm Collective Communications) and rocBLAS (BLAS for transformers) were further optimized to offer new potential efficiencies and higher performance.

https://news.ycombinator.com/item?id=34399633 :

>> AMD ROCm supports PyTorch, TensorFlow, MIOpen, rocBLAS on NVIDIA and AMD GPUs: https://rocmdocs.amd.com/en/latest/Deep_learning/Deep-learni...
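One practical consequence of the ROCm backend: PyTorch's ROCm builds expose AMD GPUs through the same `torch.cuda` API used for NVIDIA hardware, so existing device-selection code runs unmodified. A minimal sketch (falls back to CPU when no GPU is present):

```python
import torch

# ROCm builds of PyTorch report a HIP version here; it is None on
# CUDA and CPU-only builds.
print("HIP build:", torch.version.hip is not None)

# The same torch.cuda API covers both NVIDIA (CUDA) and AMD (ROCm) GPUs.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4, 4, device=device)
y = x @ x.T  # runs on the GPU (NVIDIA or AMD) when one is present
print(y.shape)  # torch.Size([4, 4])
```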


https://github.com/intel/intel-extension-for-pytorch :

> Intel® Extension for PyTorch extends PyTorch with up-to-date features optimizations for an extra performance boost on Intel hardware. Optimizations take advantage of AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs as well as Intel Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, through PyTorch xpu device, Intel® Extension for PyTorch provides easy GPU acceleration for Intel discrete GPUs with PyTorch
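The usage pattern described above looks roughly like this (a sketch: it assumes the `intel_extension_for_pytorch` package and falls back to plain CPU execution when the package or an XPU device is absent):

```python
import torch

try:
    import intel_extension_for_pytorch as ipex  # registers the "xpu" device
    have_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
except ImportError:
    ipex = None
    have_xpu = False

device = "xpu" if have_xpu else "cpu"
model = torch.nn.Linear(8, 2).eval().to(device)
if ipex is not None:
    # ipex.optimize applies Intel-specific optimizations
    # (e.g. weight prepacking) to the module.
    model = ipex.optimize(model)

with torch.no_grad():
    out = model(torch.randn(1, 8, device=device))
print(out.shape)  # torch.Size([1, 2])
```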

https://pytorch.org/blog/celebrate-pytorch-2.0/ (2023) :

> As part of the PyTorch 2.0 compilation stack, TorchInductor CPU backend optimization brings notable performance improvements via graph compilation over the PyTorch eager mode.

> The TorchInductor CPU backend is sped up by leveraging the technologies from the Intel® Extension for PyTorch for Conv/GEMM ops with post-op fusion and weight prepacking, and PyTorch ATen CPU kernels for memory-bound ops with explicit vectorization on top of OpenMP-based thread parallelization

DLRS Deep Learning Reference Stack: https://intel.github.io/stacks/dlrs/index.html


Exciting! Maybe we'll see that land in llama.cpp eventually, who knows!


llama.cpp has CLBlast support now, though I haven't used it yet.



