I use Tortoise TTS. It's slow, a little clunky, and sometimes the output gets downright weird. But it's the best quality-oriented TTS I've found that I can run locally.
It's allegedly the basis of the tech used by ElevenLabs.
There are faster implementations of Tortoise that also allow fine-tuning. With a perfect dataset, you can get close to ElevenLabs quality. https://git.ecker.tech/mrq/ai-voice-cloning
https://github.com/neonbjb/tortoise-tts