
For small values of N, the linear terms of the transformer dominate. At the end of the day, a double layer of 764*2048 is still north of 3.1 MM flops/token/layer.
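To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the 764 and 2048 above are the model width and the hidden width of the two linear layers, counts one multiply-accumulate as one op (which is what gives the ~3.1 MM figure), and ignores biases. The attention estimate (scores plus weighted sum, about 2*N*d per token) is my own simplification, and the function names are made up for illustration.

```python
def mlp_macs_per_token(d_model: int, d_ff: int) -> int:
    """Multiply-accumulates for two linear layers (d_model -> d_ff -> d_model), no biases."""
    return 2 * d_model * d_ff


def attention_macs_per_token(n_ctx: int, d_model: int) -> int:
    """Rough N-dependent term per token: QK^T scores plus the weighted sum over V."""
    return 2 * n_ctx * d_model


d_model, d_ff = 764, 2048  # figures taken from the comment as written
mlp = mlp_macs_per_token(d_model, d_ff)
print(f"linear layers: {mlp:,} MACs/token/layer")

# Ratio of the N-dependent attention term to the fixed linear term.
for n_ctx in (128, 512, 2048, 8192):
    ratio = attention_macs_per_token(n_ctx, d_model) / mlp
    print(f"N={n_ctx:5d}  attention/linear = {ratio:.4f}")
```

Under these assumptions the two terms only break even at N = 2048 (the ratio is just N / d_ff), so for contexts of a few hundred tokens the linear layers are most of the cost. Counting the Q/K/V/output projections as linear terms too would push the crossover higher still.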

