At the very least, this demonstrates that the model isn't simply memorizing the datapoints. It transitions smoothly between images of Zootopia characters and other characters, which indicates that it has learned something about the underlying features rather than just storing individual images.
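For readers who want to try this kind of transition themselves, here is a minimal sketch of one common way to do it: linear interpolation between two latent vectors. It is not necessarily how the images here were made. The function name `interpolate_latents`, the latent size of 100, and the use of standard normal latents are all assumptions for illustration, and the step of decoding each vector into an image is left out because it depends on the model.

```python
import numpy as np


def interpolate_latents(z_start, z_end, steps=8):
    """Return `steps` latent vectors evenly spaced on the line from z_start to z_end.

    Hypothetical helper for illustration; not taken from the original code.
    """
    z_start = np.asarray(z_start, dtype=np.float64)
    z_end = np.asarray(z_end, dtype=np.float64)
    # t runs from 0 (pure z_start) to 1 (pure z_end), endpoints included.
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z_start + t * z_end


rng = np.random.default_rng(0)
latent_dim = 100  # assumed latent size; use whatever the model was trained with
z_a = rng.standard_normal(latent_dim)
z_b = rng.standard_normal(latent_dim)

# One row per interpolation step. Each row would be fed to the trained
# generator/decoder to produce one frame of the transition.
path = interpolate_latents(z_a, z_b, steps=8)
```

If the model had memorized the training images, the intermediate rows would tend to decode into abrupt jumps or blended noise rather than plausible in-between characters.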
I believe certain characters (Zootopia, Sonic characters, Pokemon) show up so often because a large portion of the input space maps to those regions of the latent space. If that's right, I'd expect the proportion of random samples that look like Nick Wilde to be roughly equal to the proportion of Nick Wilde images in the training data.