stonogo | 55 days ago | on: CUDA-l2: Surpassing cuBLAS performance for matrix ...
Am I reading this wrong, or does this only support FP16 inputs and compare its performance against an FP32 solver?
Bulat_Ziganshin | 54 days ago
They compare HGEMM implementations; at least cuBLAS provides HGEMM functions. HGEMM means half-precision (i.e. FP16) general matrix multiplication.
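As a hedged illustration (not from the thread), HGEMM performs the standard GEMM operation C = alpha * A @ B + beta * C with FP16 operands; a minimal NumPy sketch, assuming the common convention of accumulating in FP32 before casting back to FP16:

```python
import numpy as np

def hgemm(alpha, A, B, beta, C):
    """Half-precision GEMM sketch: C = alpha*A@B + beta*C, FP16 operands.

    Accumulation is done in FP32 and then cast back to FP16, mirroring
    the usual tensor-core HGEMM convention (an assumption of this sketch,
    not something stated in the thread).
    """
    acc = alpha * (A.astype(np.float32) @ B.astype(np.float32))
    acc += beta * C.astype(np.float32)
    return acc.astype(np.float16)

# Tiny example: all-ones inputs, so each output entry is the inner
# dimension (8) times alpha.
A = np.ones((4, 8), dtype=np.float16)
B = np.ones((8, 4), dtype=np.float16)
C = np.zeros((4, 4), dtype=np.float16)
out = hgemm(1.0, A, B, 0.0, C)
print(out.dtype, out[0, 0])  # float16 8.0
```

FP32 accumulation matters here: summing long dot products directly in FP16 loses precision quickly, which is why HGEMM kernels (including cuBLAS's) typically accumulate in a wider type.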