Lol what. I have a bot that processes ~50 videos a day, burning in translated, Whisper-generated subtitles. It also translates images by running Tesseract on them and then overlaying the translated text in place. I once thought of exporting frames as images to maybe do the same for video, but it never even occurred to me that FFmpeg would have Tesseract support on top of everything else.
Later on, though, I realized that the quality of Tesseract's OCR on arbitrary media is often quite bad. Google Translate's detection and replacement is so far ahead of my current image system that I think I'd just reuse it for my app somehow, either through a public API or browser emulation ...
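
For anyone curious what the subtitle burn-in step can look like, here is a minimal sketch, assuming an FFmpeg build with libass so that the `subtitles` video filter is available. It is not necessarily how the bot does it; the function name `burn_in_command` and the file names are made up for illustration.

```python
import subprocess


def burn_in_command(video: str, subs: str, out: str) -> list[str]:
    """Build an ffmpeg command that hard-codes a subtitle file into a video."""
    # Assumption: the subtitle path contains no characters that need
    # escaping inside a filtergraph (colons, quotes, backslashes).
    return [
        "ffmpeg",
        "-y",                          # overwrite the output file if it exists
        "-i", video,                   # input video
        "-vf", f"subtitles={subs}",    # render the subtitles onto each frame
        "-c:a", "copy",                # leave the audio stream untouched
        out,
    ]


if __name__ == "__main__":
    cmd = burn_in_command("input.mp4", "translated.srt", "output.mp4")
    print(" ".join(cmd))
    # Uncomment to actually run it (requires ffmpeg on PATH):
    # subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to log or inspect the command before letting a bot run it on ~50 videos a day.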