
I don't find llama3.1 noticeably worse quantised to 8-bit integers than the original fp16, to be honest. It's also a lot faster.

Of course, even then you're not going to fit the whole 128k context window in 16GB, but if you don't need that it works great (see the rough sketch below).
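For a sense of why the full 128k window doesn't fit, here's a back-of-the-envelope estimate. The numbers are my own assumptions (Llama 3.1 8B's published config: ~8B params, 32 layers, 8 KV heads, head dim 128, Q8 weights, fp16 KV cache), not something from the comment above:

    # Rough VRAM estimate: Q8_0 weights plus an fp16 KV cache.
    def estimate_gib(context_len,
                     n_params=8e9,    # ~8B parameters (assumption)
                     n_layers=32,     # Llama 3.1 8B config
                     n_kv_heads=8,    # grouped-query attention
                     head_dim=128,
                     weight_bytes=1,  # 8-bit quantised weights
                     kv_bytes=2):     # fp16 keys/values
        weights = n_params * weight_bytes
        # K and V per token per layer: 2 * n_kv_heads * head_dim * kv_bytes
        kv_cache = context_len * n_layers * 2 * n_kv_heads * head_dim * kv_bytes
        return (weights + kv_cache) / 2**30

    print(estimate_gib(8_192))    # ~8.5 GiB: fits in 16 GB with headroom
    print(estimate_gib(131_072))  # ~23.5 GiB: well past 16 GB

So at 8-bit the weights alone are around 7.5 GiB, and the KV cache for the full 131,072-token window adds roughly another 16 GiB on its own.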


