
Sorry, I have not benchmarked against cuBLAS, Eigen, or similar libraries; I did that thing for ML inference.

I have implemented a profiler on top of D3D11_QUERY_TIMESTAMP and D3D11_QUERY_TIMESTAMP_DISJOINT queries, and tweaked the compute shader to minimize the time reported by these queries for my specific use case.
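
For anyone unfamiliar with these queries, here is a minimal sketch of the arithmetic a profiler like that has to do. It is not the actual profiler code. The struct `DisjointData` and the function `elapsedMilliseconds` are hypothetical names; the struct only mirrors the two fields of `D3D11_QUERY_DATA_TIMESTAMP_DISJOINT` so the snippet compiles without `d3d11.h`. In a real program the tick values would come from `ID3D11DeviceContext::GetData` on two timestamp queries issued inside a `Begin`/`End` pair of the disjoint query.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for D3D11_QUERY_DATA_TIMESTAMP_DISJOINT
// (UINT64 Frequency; BOOL Disjoint), so this builds on any platform.
struct DisjointData {
    std::uint64_t frequency; // GPU timestamp ticks per second
    int disjoint;            // non-zero: timestamps in this interval are unreliable
};

// Converts two GPU timestamps into milliseconds.
// Returns std::nullopt when the measurement should be discarded:
// the disjoint flag is set, the frequency is zero, or the ticks go backwards.
std::optional<double> elapsedMilliseconds(std::uint64_t beginTicks,
                                          std::uint64_t endTicks,
                                          const DisjointData& dj) {
    if (dj.disjoint != 0 || dj.frequency == 0 || endTicks < beginTicks) {
        return std::nullopt;
    }
    const double ticks = static_cast<double>(endTicks - beginTicks);
    return ticks * 1000.0 / static_cast<double>(dj.frequency);
}
```

The disjoint check matters because the GPU clock frequency can change mid-frame (power state changes, for example), and then the tick difference is meaningless. Discarding those samples and measuring again is the usual approach.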


