Given that OpenAI Whisper is open source now, and pretty near SOTA, I think creating an audio-only open-source version of this shouldn't be difficult. However, I don't know how to easily contextualise the audio - how would I search 'name of the movie I was discussing with Zeynep last week'?
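One rough way to get that kind of contextual search is to attach metadata (participants, timestamp) to each transcribed chunk, then filter on the metadata before matching the query words against the text. A minimal sketch in pure Python - all names and data here are invented for illustration, and a real system would want embeddings rather than keyword matching:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Chunk:
    text: str           # transcribed speech from Whisper
    participants: list  # who was in the conversation
    when: date          # when it was recorded

def search(chunks, query_words, participant=None, since=None):
    """Return chunks containing all query words, optionally filtered
    by participant and by a 'no older than' date."""
    results = []
    for c in chunks:
        if participant and participant not in c.participants:
            continue
        if since and c.when < since:
            continue
        lowered = c.text.lower()
        if all(w.lower() in lowered for w in query_words):
            results.append(c)
    return results

# Invented example data standing in for transcribed conversations.
chunks = [
    Chunk("we should watch that movie, Stalker, sometime",
          ["me", "Zeynep"], date.today() - timedelta(days=5)),
    Chunk("the movie was fine I guess",
          ["me", "Alex"], date.today() - timedelta(days=2)),
]

# "movie I was discussing with Zeynep last week"
hits = search(chunks, ["movie"], participant="Zeynep",
              since=date.today() - timedelta(days=7))
print(hits[0].text)
```

The hard part in practice is getting the participant metadata in the first place (speaker diarisation, calendar integration, etc.); the search itself is the easy half.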
Yes, it does seem within reach. Even the base.en Whisper model with just 74M params performs remarkably well on transcription (the large model has 1550M params!).
Compare a base.en Whisper transcription to a human transcription. This is the latest episode of the Ezra Klein Show, "A Powerful Theory of Why the Far Right Is Thriving Across the Globe," transcribed just now:
I got OpenAI Whisper running locally on my Mac but the plumbing to make it NOT tax system resources (like CPU) and to get it to work with search isn't trivial. It's on our roadmap.
You might find my inference implementation of Whisper useful [0]. It has a C-style API that allows for easy integration into other projects, and you can control how many CPU threads are used during processing.