
I don't find llama3.1 noticeably worse quantised to 8-bit integers than the original fp16, to be honest. It's also a lot faster.

Of course, even then you're not going to fit the whole 128k context window in 16GB, but if you don't need that it works great (see the rough sketch below).
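For a sense of why the full 128k window doesn't fit, here's a back-of-the-envelope estimate. The numbers are my own assumptions (Llama 3.1 8B's published config: ~8B params, 32 layers, 8 KV heads, head dim 128, Q8 weights, fp16 KV cache), not something from the comment above:

    # Rough VRAM estimate: Q8_0 weights plus an fp16 KV cache.
    def estimate_gib(context_len,
                     n_params=8e9,    # ~8B parameters (assumption)
                     n_layers=32,     # Llama 3.1 8B config
                     n_kv_heads=8,    # grouped-query attention
                     head_dim=128,
                     weight_bytes=1,  # 8-bit quantised weights
                     kv_bytes=2):     # fp16 keys/values
        weights = n_params * weight_bytes
        # K and V per token per layer: 2 * n_kv_heads * head_dim * kv_bytes
        kv_cache = context_len * n_layers * 2 * n_kv_heads * head_dim * kv_bytes
        return (weights + kv_cache) / 2**30

    print(estimate_gib(8_192))    # ~8.5 GiB: fits in 16 GB with headroom
    print(estimate_gib(131_072))  # ~23.5 GiB: well past 16 GB

So at 8-bit the weights alone are around 7.5 GiB, and the KV cache for the full 131,072-token window adds roughly another 16 GiB on its own.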


