Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Discrepancy in CPU Time Accounting for KVM Guests

nickohatz55

Hello,

I am an undergraduate student working on a research project in virtualization.

I'm encountering unusual CPU accounting behavior when running KVM guests on a newer Intel Xeon platform, and I'm hoping someone can help me understand what might be causing this difference.

I have two HPE servers running identical kernel configurations (Ubuntu 6.15, same .config). When I run a purely CPU-bound workload (sysbench cpu) in a KVM guest, the older Gen10 system (CPU model: Intel(R) Xeon(R) Gold 6138) shows ~100% user time for the vCPU threads on the host, while the newer Gen11 system (CPU model: Intel(R) Xeon(R) Gold 6554S) shows ~50% user / ~50% system for the same vCPU threads. On both servers, sysbench throughput in the guest matches throughput on the host, so the apparent 50% hypervisor overhead on the Gen11 looks like an accounting artifact rather than real overhead.
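For reference, here is a minimal sketch of how the per-thread user/system split described above can be sampled on the host from /proc/<pid>/task/<tid>/stat. The field positions (14 = utime, 15 = stime, 43 = guest_time) follow the proc(5) man page; the function names are my own and purely illustrative, and the QEMU PID is something you would have to supply yourself.

```python
from pathlib import Path


def parse_thread_stat(stat_line: str) -> dict:
    """Extract utime, stime and guest_time (in clock ticks) from one stat line."""
    # The comm field (2) is wrapped in parentheses and may contain spaces,
    # e.g. "(CPU 0/KVM)", so split only what follows the last ')'.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    return {
        "utime": int(rest[14 - 3]),       # user time, includes guest time
        "stime": int(rest[15 - 3]),       # kernel (system) time
        "guest_time": int(rest[43 - 3]),  # time spent running a vCPU
    }


def user_system_split(before: dict, after: dict) -> dict:
    """Percentage of user vs system time between two samples of one thread."""
    d_user = after["utime"] - before["utime"]
    d_sys = after["stime"] - before["stime"]
    total = d_user + d_sys
    if total == 0:
        return {"user_pct": 0.0, "system_pct": 0.0}
    return {
        "user_pct": 100.0 * d_user / total,
        "system_pct": 100.0 * d_sys / total,
    }


def read_thread_stats(pid: int) -> dict:
    """Sample every thread of a process (hypothetical helper; pid is the QEMU PID)."""
    stats = {}
    for task in Path(f"/proc/{pid}/task").iterdir():
        stats[int(task.name)] = parse_thread_stat((task / "stat").read_text())
    return stats
```

Calling read_thread_stats twice a few seconds apart and passing each thread's two samples to user_system_split gives the split over that interval. Comparing the guest_time delta with the utime delta should also show how much of the user time was actually spent in guest mode.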

I've examined several variables to isolate the issue. Both hosts are running the same kernel version and configuration. The kernel options I have experimented with are CONFIG_TICK_CPU_ACCOUNTING=y, HZ=250, and NO_HZ_FULL disabled. I have also confirmed that both hosts run the same QEMU/libvirt versions, both have SMT disabled, and the VM's CPU mode is set to passthrough. The BIOS power settings (C-states, Turbo) are also identical (both disabled). The only difference I have observed is the vmexit rate measured with perf: around 25k vmexits per second on the Gen10 system versus around 40k per second on the Gen11.
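To put the vmexit numbers next to the tick rate, here is a small sketch of the arithmetic. It only relates the measured rates to HZ=250 and does not by itself explain the accounting difference. The counter path in sample_exit_rate is an assumption (a KVM debugfs counter that needs root and a mounted debugfs); my own numbers came from perf, so any monotonically increasing exit counter could be substituted.

```python
import time
from pathlib import Path


def exits_per_second(count_start: int, count_end: int, interval_s: float) -> float:
    """Average vmexit rate between two readings of a cumulative counter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (count_end - count_start) / interval_s


def exits_per_tick(rate_per_s: float, hz: int) -> float:
    """Average number of vmexits that fall within one scheduler tick."""
    return rate_per_s / hz


def sample_exit_rate(counter_path: str = "/sys/kernel/debug/kvm/exits",
                     interval_s: float = 1.0) -> float:
    """Read a cumulative exit counter twice (path is an assumption, see above)."""
    path = Path(counter_path)
    start = int(path.read_text().split()[0])
    t0 = time.monotonic()
    time.sleep(interval_s)
    end = int(path.read_text().split()[0])
    return exits_per_second(start, end, time.monotonic() - t0)


if __name__ == "__main__":
    for name, rate in (("Gen10", 25_000), ("Gen11", 40_000)):
        print(name, exits_per_tick(rate, hz=250), "vmexits per tick")
```

With HZ=250, the rates above work out to roughly 100 vmexits per tick on the Gen10 and 160 per tick on the Gen11, i.e. the Gen11 exits about 1.6 times as often.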

I am wondering what exactly is causing this difference in time accounting and, if possible, how to fix it so that the host reports the expected ~100% user time for the vCPU threads.

I can provide full system specs, kernel config details, and KVM parameters if that would be helpful. Any insight into what might be causing this shift in CPU time attribution would be greatly appreciated.

Thanks,
Nicko Hatz

 
