I'm the founder of Willow[0] (we use ctranslate2 as well) and I will be looking at this as soon as the models are released tomorrow. HF claims they're drop-in compatible, but we won't know for sure until someone looks at it.
I have to say I love Willow, well done. It's a bit slow for me now because I'm not running recognition locally (as I'm sure many people aren't), but it will be fantastic news if this helps me offload recognition onto my NUC (i.e. CPU-only) and shave lots of ms off that way.
I'll be looking at this as soon as it is released tomorrow.
Separately, we have some Willow Inference Server improvements in the works that increase the speed of speech recognition on CPU by as much as 50% (depending on supported CPU instruction sets, etc.).
Between that, the performance we already have, and this work, it will be a dramatic improvement - even on CPU. I'm really looking forward to posting the benchmarks when all of this comes together!
That's the implication. If the distil models are in the same format as the original OpenAI models, then they can be converted for faster-whisper use as per the conversion instructions on https://github.com/guillaumekln/faster-whisper/
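If that holds, the conversion would presumably look like the existing faster-whisper recipe (shell sketch below; the repo name `distil-whisper/distil-large-v2` is an assumption on my part, since the checkpoints aren't published yet):

```shell
# Converter and runtime (ct2-transformers-converter ships with CTranslate2,
# which faster-whisper pulls in); transformers is needed to load the HF weights.
pip install faster-whisper "transformers[torch]"

# Convert the Hugging Face checkpoint to CTranslate2 format.
# --quantization float16 halves the weights; use int8 for CPU-only boxes.
ct2-transformers-converter \
    --model distil-whisper/distil-large-v2 \
    --output_dir distil-large-v2-ct2 \
    --copy_files tokenizer.json \
    --quantization float16
```

The resulting directory can then be passed to faster-whisper's `WhisperModel` in place of a stock model name - assuming the distil architecture really is drop-in compatible.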
So then we'll see whether we get the 6x model speedup on top of the stated 4x faster-whisper code speedup, at the same or nearly the same accuracy.
I would generally start with the assumption that if something is significantly faster, accuracy has to suffer a bit, but increasing model size and/or settings such as beam size to compensate should allow the same accuracy at higher performance (just not all of the stated performance gain).
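To make the beam-size trade-off concrete, here's a toy pure-Python sketch (not faster-whisper code; the step probabilities are invented for illustration). A wider beam keeps more hypotheses alive and can recover a globally better transcription that greedy decoding throws away at step one - at the cost of scoring more candidates per step:

```python
import math

# Toy 3-step "model": log-probs for each next token given the prefix.
# Deliberately constructed so the globally best path starts with the
# locally *worse* first token ("b").
STEP_LOGPROBS = {
    (): {"a": math.log(0.6), "b": math.log(0.4)},
    ("a",): {"x": math.log(0.5), "y": math.log(0.5)},
    ("b",): {"x": math.log(0.9), "y": math.log(0.1)},
    ("a", "x"): {"end": math.log(0.5)},
    ("a", "y"): {"end": math.log(0.5)},
    ("b", "x"): {"end": math.log(1.0)},
    ("b", "y"): {"end": math.log(1.0)},
}

def beam_search(beam_size, steps=3):
    """Return (best_path, best_logprob) after `steps` expansions."""
    beams = [((), 0.0)]  # (prefix, cumulative log-prob)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for tok, lp in STEP_LOGPROBS[prefix].items():
                candidates.append((prefix + (tok,), score + lp))
        # Prune to the top `beam_size` hypotheses - the cost/accuracy knob.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0]

# beam_size=1 (greedy) commits to "a" (p=0.6) and ends at p=0.6*0.5*0.5=0.15;
# beam_size=2 keeps "b" alive and finds b->x->end at p=0.4*0.9*1.0=0.36.
greedy_path, greedy_score = beam_search(1)
wide_path, wide_score = beam_search(2)
```

The same knob exists in faster-whisper's `transcribe(..., beam_size=N)`; the point is that if a distilled model loses a little accuracy, part of it can often be bought back by widening the beam, spending back some of the speed gain.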
Because OpenAI focuses on putting out quality models. Efficient execution of ML models is another skill set entirely. Projects like CTranslate2 (which is what faster-whisper uses) are focused on fast model execution and work across all kinds of models from speech recognition to image and speech generation and everything in between.
Also because OpenAI benefits from a certain measure of inefficiency: it keeps the models from being easy for the masses to run without OpenAI in the loop, extracting money and compiling new training data from every inference users feed them.