I'd recommend encoding sub-audible tracking symbols in synthesized speech, carrying GPS coordinates, IP address, timestamp, and country of origin. We already do that with movies to trace bootlegs, so similar methods could be applied to synthesized speech.
Hmm, I don't know about that. The data rate used by civilian GPS (L1 C/A) is only 50 bps. Those symbols are spread over about 2 MHz of bandwidth by a 1.023 Mchip/s code, which makes it possible to recover them at levels below the thermal noise floor. I see no reason why the same thing couldn't be done at baseband, adding an imperceptible bit of extra noise to an audio signal.
Of course, you wouldn't encode real-time navigation data, but a small block of identifying text. Either way, though, someone without a copy of the spreading code isn't going to notice it or decode it. Given enough redundancy in both the time and frequency domains, removing it wouldn't be easy either.
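To make the baseband idea concrete, here is a minimal direct-sequence sketch in plain Python. It is an illustration under stated assumptions, not a real watermarking scheme: the host "audio" is Gaussian noise standing in for speech, the spreading code comes from a seeded `random.Random` rather than a proper Gold code, and `KEY`, `CHIPS_PER_BIT`, `AMPLITUDE`, and the 44.1 kHz / 10 bps audio figures are hypothetical values chosen for the demo. There is no synchronization, psychoacoustic shaping, or error correction.

```python
import math
import random


def processing_gain_db(chip_rate: float, data_rate: float) -> float:
    """Spreading gain in dB: 10 * log10(chips per data bit)."""
    return 10 * math.log10(chip_rate / data_rate)


def pn_sequence(key: int, length: int) -> list[int]:
    """Pseudo-random +/-1 chips derived from a shared secret key (assumption:
    a seeded PRNG stands in for a real spreading code)."""
    rng = random.Random(key)
    return [1 if rng.getrandbits(1) else -1 for _ in range(length)]


def embed(host, bits, key, chips_per_bit, amplitude):
    """Add each payload bit, spread over chips_per_bit samples, to the host."""
    pn = pn_sequence(key, len(bits) * chips_per_bit)
    marked = list(host)
    for i, chip in enumerate(pn):
        symbol = 1 if bits[i // chips_per_bit] else -1
        marked[i] += amplitude * symbol * chip
    return marked


def correlate(signal, n_bits, key, chips_per_bit):
    """Despread: correlate each bit-length slice against the keyed chips."""
    pn = pn_sequence(key, n_bits * chips_per_bit)
    scores = []
    for b in range(n_bits):
        start = b * chips_per_bit
        scores.append(
            sum(signal[start + j] * pn[start + j] for j in range(chips_per_bit))
        )
    return scores


def decode(signal, n_bits, key, chips_per_bit):
    """The sign of each correlation score gives the recovered bit."""
    return [1 if s > 0 else 0 for s in correlate(signal, n_bits, key, chips_per_bit)]


# GPS L1 C/A: 1.023 Mchip/s carrying 50 bps.
gps_gain = processing_gain_db(1.023e6, 50)
# Hypothetical audio analogue: one chip per sample at 44.1 kHz, 10 bps payload.
audio_gain = processing_gain_db(44_100, 10)

KEY = 1234            # hypothetical shared secret
CHIPS_PER_BIT = 4000  # redundancy per payload bit
AMPLITUDE = 0.1       # watermark sits 20 dB below the unit-variance host

payload = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
noise = random.Random(0)
host = [noise.gauss(0.0, 1.0) for _ in range(len(payload) * CHIPS_PER_BIT)]

marked = embed(host, payload, KEY, CHIPS_PER_BIT, AMPLITUDE)
recovered = decode(marked, len(payload), KEY, CHIPS_PER_BIT)
print("recovered with correct key:", recovered == payload)
```

With the correct key, the per-bit correlation grows in proportion to the number of chips while the host's contribution grows only with its square root, which is why the mark can sit well below the host level. Correlating with any other key gives scores that look like noise, which matches the point that someone without the spreading code won't notice or decode it.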
The real problem is that bad actors would simply encode some other person's coordinates/metadata into the recordings they produce, and we'll have been trained by then to blindly accept the presence of these markers as strong evidence of guilt.