I got better performance of 20.18 tokens per second using tinyllama-1.1b-chat-v1...

senkora on Sept 23, 2024 \| on: Forget ChatGPT: why researchers now run small AIs ...

I got better performance, 20.18 tokens per second, using tinyllama-1.1b-chat-v1.0.Q8_0.llamafile from https://huggingface.co/Bojun-Feng/TinyLlama-1.1B-Chat-v1.0-l... If anyone reading this had trouble with a larger model, that might be the one to try next.