And the bytecode is just calling polymorphic methods. All the real work is done in the object implementation behind type(x). I was very bummed years ago to realize how shallow Python's bytecode representation is. There is no lower-level interpreter underneath, just C.
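To make that concrete, here is a minimal sketch using the standard `dis` module. The function `mul` and the class `Loud` are hypothetical names made up for the illustration. It assumes CPython, where the generic multiply opcode is `BINARY_OP` on 3.11+ and `BINARY_MULTIPLY` on older versions.

```python
import dis


def mul(x, y):
    # Nothing in the bytecode says "integer multiply"; it only says "multiply".
    return x * y


# Opcode names for the compiled function (unspecialized view).
opnames = [ins.opname for ins in dis.get_instructions(mul)]

# The whole multiplication is one generic opcode, whatever the operand types.
generic_mul = [n for n in opnames if n in ("BINARY_OP", "BINARY_MULTIPLY")]


class Loud:
    """Hypothetical type that supplies its own multiplication."""

    def __mul__(self, other):
        return "Loud.__mul__ called"


# Same bytecode, three different implementations picked at run time
# from the type of the left operand.
results = [mul(3, 4), mul("ab", 2), mul(Loud(), 1)]
print(generic_mul, results)
```

The opcode is identical in all three calls; which code actually runs is decided by looking up the operand's type at run time.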
So, this is going to be really slow inside a loop. Would the compiler be able to optimize it into a single multiply instruction if it could prove that the input had to contain integers?
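One wrinkle with the single-instruction idea, as a rough sketch rather than a full answer: even if every operand is provably an `int`, Python integers are arbitrary precision, so the result may not fit in a machine register. The function name `mul` is again just a hypothetical example.

```python
def mul(x, y):
    return x * y


# Both operands are plain ints, but the product needs more than 64 bits,
# so a single hardware multiply could not produce it on its own.
big = mul(2**64, 2**64)
fits_in_64_bits = big.bit_length() <= 64

print(big.bit_length(), fits_in_64_bits)
```

So any such optimization would presumably also need to prove the values stay small, or check for overflow and fall back to the general path.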