Also, when we ran streaming experiments with the current kernels, we saw a median ~1.3x slowdown at inference.