Hmm, I wonder if anyone has a simple pipeline for extracting training data for voice-cloning models from the original audio plus the transcribed text. It should be possible to chain that with some post-processing to replace Lex's voice with something more pleasing, and maybe throw in some automated rewriting of the transcript to remove the fluff.
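
For what it's worth, the extraction half could be roughly sketched like this. It assumes the transcript already comes with speaker labels and start/end timestamps in seconds, and that the audio is an uncompressed WAV file; neither is a given. All the names here (`Segment`, `select_segments`, `cut_clip`, `build_manifest`) are made up for illustration and do not come from any existing tool. It only produces (clip, text) pairs; the actual voice-cloning training and the transcript rewriting are not covered.

```python
import wave
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript line (hypothetical format: speaker plus timestamps in seconds)."""
    speaker: str
    start: float
    end: float
    text: str


def select_segments(segments, speaker, min_len=1.0, max_len=15.0):
    """Keep one speaker's non-empty segments whose duration is within [min_len, max_len]."""
    return [
        s for s in segments
        if s.speaker == speaker
        and min_len <= s.end - s.start <= max_len
        and s.text.strip()
    ]


def cut_clip(src_path, dst_path, start, end):
    """Copy the [start, end) span of a WAV file into a new WAV file."""
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        params = src.getparams()
        src.setpos(int(start * rate))
        frames = src.readframes(int((end - start) * rate))
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(frames)  # frame count in the header is fixed up on close


def build_manifest(segments, speaker, src_path, out_prefix):
    """Cut one clip per selected segment and return (clip_path, text) pairs."""
    rows = []
    for i, seg in enumerate(select_segments(segments, speaker)):
        clip_path = f"{out_prefix}_{i:04d}.wav"
        cut_clip(src_path, clip_path, seg.start, seg.end)
        rows.append((clip_path, seg.text.strip()))
    return rows
```

The duration limits are arbitrary defaults; the right clip lengths depend on whatever model ends up consuming the manifest.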