
Thank you.

I used 6 and that dropped the token time to 220ms.

For -ngl, I tried 24, then 30, then 40. I never hit an out-of-memory error, and the token timing stayed exactly the same, stuck at 220ms.

But, this is very helpful, thank you!



I'm curious whether there's any difference if you try with a longer prompt or ask for a longer completion: https://news.ycombinator.com/item?id=35940365

Also curious to know whether the wall clock time (just prepend your command with 'time ') is any different.
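
If the shell's 'time' isn't convenient, a rough equivalent is a small wrapper that measures elapsed wall-clock time around the whole command. This is only a sketch: wall_clock_seconds is a made-up helper name, and the command shown is a placeholder that just starts and exits a Python interpreter, so you would swap in your actual invocation and flags.

```python
import subprocess
import sys
import time


def wall_clock_seconds(command):
    """Run `command` (a list of arguments) and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(command, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command (assumption): replace with the real inference command.
    placeholder = [sys.executable, "-c", "pass"]
    print(f"{wall_clock_seconds(placeholder):.3f} s")
```

Running it once per -ngl value and comparing the totals would show whether the end-to-end time moves even when the reported per-token time does not.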



