stonogo | 55 days ago | on: CUDA-l2: Surpassing cuBLAS performance for matrix ...
Am I reading this wrong, or does this only support FP16 inputs and compare its performance against an FP32 solver?
Bulat_Ziganshin | 54 days ago
They compare HGEMM implementations; at least cuBLAS provides HGEMM functions. HGEMM means half-precision (i.e. FP16) general matrix multiplication.
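As a hedged illustration (not from the thread), HGEMM performs the standard GEMM operation C = alpha * A @ B + beta * C with FP16 operands; a minimal NumPy sketch, assuming the common convention of accumulating in FP32 before casting back to FP16:

```python
import numpy as np

def hgemm(alpha, A, B, beta, C):
    """Half-precision GEMM sketch: C = alpha*A@B + beta*C, FP16 operands.

    Accumulation is done in FP32 and then cast back to FP16, mirroring
    the usual tensor-core HGEMM convention (an assumption of this sketch,
    not something stated in the thread).
    """
    acc = alpha * (A.astype(np.float32) @ B.astype(np.float32))
    acc += beta * C.astype(np.float32)
    return acc.astype(np.float16)

# Tiny example: all-ones inputs, so each output entry is the inner
# dimension (8) times alpha.
A = np.ones((4, 8), dtype=np.float16)
B = np.ones((8, 4), dtype=np.float16)
C = np.zeros((4, 4), dtype=np.float16)
out = hgemm(1.0, A, B, 0.0, C)
print(out.dtype, out[0, 0])  # float16 8.0
```

FP32 accumulation matters here: summing long dot products directly in FP16 loses precision quickly, which is why HGEMM kernels (including cuBLAS's) typically accumulate in a wider type.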