If you replace a tokenizer that averages 5 bytes per token with a byte-level representation, you now need 5 times as much memory and (depending on the specifics of the attention mechanism) 11 to 25 times as much compute.
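
For intuition on where a range like that comes from: per-layer transformer cost splits into a part that is linear in sequence length (the QKV/output projections and the MLP) and a part that is quadratic in it (the attention matmuls). A back-of-the-envelope sketch in Python, counting multiply-accumulates with hypothetical model dimensions:

    def layer_flops(seq_len, d_model=4096, d_ff=4 * 4096):
        # linear-in-length work: QKV + output projections, plus the MLP
        linear = seq_len * (4 * d_model**2 + 2 * d_model * d_ff)
        # quadratic-in-length work: QK^T scores and the weighted sum over V
        quadratic = 2 * seq_len**2 * d_model
        return linear + quadratic

    tok_len = 8192            # a document, in tokens
    byte_len = 5 * tok_len    # the same document as raw bytes

    print(layer_flops(byte_len) / layer_flops(tok_len))
    # ~10x with these dimensions: the linear terms scale 5x, the
    # quadratic attention term 25x, and the blend creeps toward 25x
    # as sequences (or the share of attention compute) grow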

At the scales we're talking about, that's quite a hefty price to pay, and it doesn't even take into account that you might need more layers to replace the processing that was implicitly done by the tokenizer.


