Another thing to check is how nginx was compiled. Using generic optimizations vs. x86_64-specific ones can do interesting things on VMs vs. bare metal. nginx and haproxy specifically should be compiled generic for VMs. I don't have any links, just my own performance testing in the past.
A binary running in a VM is still executing native machine code, so compiler optimizations should have the same effect whether running on bare metal or a VM.
Should being the key word. In truth, the implementation of each hypervisor varies. Try it on each hypervisor that you use. I found KVM to have the closest parity with bare-metal performance.
I'm struggling to think of a situation when running virtualized vs. bare metal where compiler optimizations would matter.
Certain hypervisors can disable features on the virtual CPU to allow live migration between different generations of physical CPUs. In that case, a binary that depends on a disabled virtual CPU feature (e.g., AVX-512) will simply crash (or otherwise fail) when it executes an unsupported instruction.
Other than that, I'm drawing a blank.
Hypervisor performance will vary, but I can't envision any scenario where a binary optimized for the processor's architecture would perform worse than one without any optimizations when running on a VM vs bare metal.
That's the point. It shouldn't matter, but in fact it does. You can see this for yourself if you benchmark x86_64-optimized builds on VMs. You will see varying results depending on the hypervisor and the application, and the results will even change over time as each hypervisor is updated. What I am describing is exactly what is not supposed to happen, which is why you are struggling to think of a situation where this should matter. You are being entirely logical.
IIRC, hypervisors have to preserve CPU registers and other processor-related state when switching between "worlds". This is why mitigations for CPU vulnerabilities affect hypervisors too.
Most compilers assume that code emitted for certain modes (SSE/AVX etc.) has a particular cost. That cost may change drastically depending on how a given hypervisor's implementation handles the registers in question.
The point of a hypervisor is that any instruction can potentially be trapped and either emulated or substituted with others. If your application uses a lot of instructions that get trapped and end up using slower emulation it will hurt performance.
Yeah, but most of the instructions that get emulated are not used in applications. They are things that operating systems do, like sending interrupts to physical cores (which of course need the hypervisor).