Maybe I'm misunderstanding something, but if x -> f(x) is easy to compute while what you actually want to learn is the inverse f(x) -> x, isn't synthetic data exactly what you should be using?
Example: training an image upscaler by feeding it pairs of real images and their downscaled versions. Here you don't even need to train a generative model (the downscaling algorithm is known and cheap), but it illustrates that the generative task can be vastly easier than the target task. You can't just handwave that away with "just divide by P(X)".
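To make the asymmetry concrete, here is a minimal sketch of the setup (all names and the toy random "images" are my own illustration, not anything from the thread): the known forward map f is 2x2 block-average downscaling, the synthetic training pairs come for free by applying f to real high-res data, and the "upscaler" is deliberately simplified to a single linear least-squares map just to show where the learned inverse sits.

```python
import numpy as np

rng = np.random.default_rng(0)

def downscale(img, factor=2):
    """The known, cheap forward process f: average non-overlapping blocks."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# Synthetic training pairs: every high-res image yields a (low, high) pair
# for free, because computing f is trivial. Random arrays stand in for images.
high_res = rng.random((1000, 8, 8))
low_res = np.stack([downscale(im) for im in high_res])

# A deliberately trivial "upscaler": one linear map from the 16 low-res
# pixels to the 64 high-res pixels, fit by least squares. A real upscaler
# would be a neural net, but the training-data story is identical.
X = low_res.reshape(len(low_res), -1)    # shape (1000, 16)
Y = high_res.reshape(len(high_res), -1)  # shape (1000, 64)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The hard direction (learned inverse of f) is evaluated on inputs that the
# easy direction generated for us.
pred = (X @ W).reshape(-1, 8, 8)
```

The point of the sketch is the cost asymmetry: generating the training set is one line of array arithmetic, while the inverse mapping is the thing that actually needs learning.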