
Hi! I'm one of the contributors to the paper. We have kernels, not yet released, that can shave decoding latency by >20%.

Also, when we ran streaming experiments with the current kernels, we were a median of ~1.3x slower at inference.


Thanks for chiming in! How do you explain the top-most graph in Figure 5? Am I misreading it?

