Extremely slow guest/host after VMLaunch

roee_l_
Beginner

Hey, I've been trying to run a hypervisor on OS X 10.9 for a while and finally managed to set it up. The problem is that the CPU or CPUs I run the hypervisor on become extremely slow after VMLaunch.

Everything seems right: I tested it using a simple user-mode application that calls CPUID, and the VM-exit handler I wrote handles that successfully. But overall computer performance decreases significantly. The more cores I run VMLaunch on, the slower it becomes. When it runs on all 4 cores I can't even move my mouse. What could it be?
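
For readers following along, the CPUID path described above can be modelled in a few lines. This is a sketch in Python rather than the kernel-level code a hypervisor would actually use, and it is not the handler from this post: `handle_vm_exit`, the `guest` register dictionary, and the `cpuid` callback are hypothetical names. It relies on two facts from the Intel SDM: the basic exit reason for CPUID is 10, and the handler has to advance the guest RIP by the VM-exit instruction length before resuming the guest.

```python
EXIT_REASON_CPUID = 10  # basic VM-exit reason for CPUID (Intel SDM)


def handle_vm_exit(exit_reason, guest, instr_len, cpuid):
    """Model of a VM-exit dispatcher; returns True if the exit was handled.

    guest     -- dict of guest registers (hypothetical layout)
    instr_len -- value of the VM-exit instruction length field
    cpuid     -- callable (leaf, subleaf) -> (eax, ebx, ecx, edx)
    """
    if exit_reason != EXIT_REASON_CPUID:
        return False
    # Execute CPUID on the guest's behalf with the guest's leaf and subleaf.
    leaf = guest["rax"] & 0xFFFFFFFF
    subleaf = guest["rcx"] & 0xFFFFFFFF
    eax, ebx, ecx, edx = cpuid(leaf, subleaf)
    # CPUID writes 32-bit results, so the upper halves of the registers are zero.
    guest.update(rax=eax, rbx=ebx, rcx=ecx, rdx=edx)
    # Skip the CPUID instruction, otherwise the guest exits on it again.
    guest["rip"] += instr_len
    return True


# Demo with a stand-in CPUID that returns fixed values (assumption).
def fake_cpuid(leaf, subleaf):
    return (leaf + 1, 2, 3, 4)


guest = {"rax": 0, "rbx": 0, "rcx": 0, "rdx": 0, "rip": 0x1000}
handled = handle_vm_exit(EXIT_REASON_CPUID, guest, 2, fake_cpuid)
print(handled, hex(guest["rip"]))
```

A handler that forgets the RIP update re-enters the guest at the same CPUID instruction and exits again immediately, which is one way an otherwise correct setup can burn cycles.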

The only VM exits I get are for CPUID, which is probably executed by the OS or by some programs I use, but nothing else.

I hope that maybe one of you has experimented with it.

 

thanks!

Bernard
Valued Contributor I

I do not have any experience with Mac OS X, but I have some experience with CPU profiling and monitoring. Can you launch a CPU monitoring tool that reports per-core load and maps that load to a specific process/thread?

roee_l_
Beginner

Okay, it seems that the CPU that is under the hypervisor NEVER 'hits' the L2 and L3 caches. That is most likely why the computer becomes so slow, and it also explains why the more cores I put under the hypervisor, the slower the computer becomes.

Now for the interesting part: why is the cache disabled? The CR0 register has bits 29 and 30 (NW and CD, the cache-control bits) turned off, so I guess it can't be that. What else might cause the cache to be off after VMLaunch? Which fields should I re-examine?
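
To make the CR0 check concrete, here is a small Python sketch that decodes the two cache-control bits from a raw CR0 value. The bit positions (NW = bit 29, CD = bit 30) and the meaning of each combination come from the Intel SDM; the function name `cr0_cache_state` and the sample CR0 values are assumptions for illustration, not values read from this machine. In a VMX setup the value to inspect would be the guest CR0 field of the VMCS as well as the host one.

```python
CR0_NW = 1 << 29  # Not Write-through
CR0_CD = 1 << 30  # Cache Disable


def cr0_cache_state(cr0):
    """Describe the caching mode selected by CR0.CD and CR0.NW."""
    cd = bool(cr0 & CR0_CD)
    nw = bool(cr0 & CR0_NW)
    if not cd and not nw:
        return "normal caching (CD=0, NW=0)"
    if cd and not nw:
        return "no-fill mode (CD=1, NW=0)"
    if cd and nw:
        return "no-fill, no write-through (CD=1, NW=1)"
    # CD=0 with NW=1 is an invalid combination per the SDM.
    return "invalid (CD=0, NW=1)"


# Sample values (assumed): a typical long-mode CR0, and the same with CD set.
print(cr0_cache_state(0x80050033))
print(cr0_cache_state(0x80050033 | CR0_CD))
```

Note that these bits are only one input to the effective memory type. The MTRRs, the PAT, and, if EPT is in use (the thread does not say), the memory-type field of the EPT entries also contribute, so a clean CR0 does not by itself prove that accesses are cacheable.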

 

thanks

Bernard
Valued Contributor I

Disabling the LLC will surely affect all cores. Can you post the output of some kind of profiler? It is really hard to know what is consuming the CPU cores' cycles. One possible reason for the perceived slowness could be an interrupt storm.

roee_l_
Beginner

I already tested the interrupt-storm case. There's no kernel extension or application that's 'shooting' interrupts in an abnormal way. Everything looks OK.

Here's the IntelPerformanceCounter log (look at the L2 and L3 cache columns):

PRE VMLAUNCH:

 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state'  (includes Intel Turbo Boost)
 L3MISS: L3 cache misses 
 L2MISS: L2 cache misses (including other core's L2 cache *hits*) 
 L3HIT : L3 cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
 L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
 READ  : bytes read from memory controller (in GBytes)
 WRITE : bytes written to memory controller (in GBytes)
 IO    : bytes read/written due to IO requests to memory controller (in GBytes); this may be an over estimate due to same-cache-line partial requests
 TEMP  : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature


 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK  | READ  | WRITE |  IO   | TEMP

   0    0     0.03   0.60   0.06    0.71    1116 K   1262 K    0.12    0.15    1.48    0.04     N/A     N/A     N/A     52
   1    0     0.02   1.20   0.02    0.98     114 K    183 K    0.38    0.40    0.48    0.06     N/A     N/A     N/A     52
   2    0     0.04   0.77   0.05    0.90     682 K    896 K    0.24    0.21    0.98    0.06     N/A     N/A     N/A     54
   3    0     0.01   0.58   0.01    1.02      95 K    175 K    0.46    0.34    0.54    0.09     N/A     N/A     N/A     54
-----------------------------------------------------------------------------------------------------------------------------
 SKT    0     0.03   0.74   0.03    0.83    2008 K   2517 K    0.20    0.21    1.08    0.06    0.84    0.41    0.18     51
-----------------------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.03   0.74   0.03    0.83    2008 K   2517 K    0.20    0.21    1.08    0.06    0.84    0.41    0.18     N/A

 Instructions retired:  248 M ; Active cycles:  335 M ; Time (TSC): 2404 Mticks ; C0 (active,non-halted) core residency: 4.21 %

 C1 core residency: 4.06 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 91.73 %;
 C2 package residency: 10.53 %; C3 package residency: 1.72 %; C6 package residency: 69.96 %; C7 package residency: 0.00 %; C8 package residency: 0.00 %; C9 package residency: 0.00 %; C10 package residency: 0.00 %;

 PHYSICAL CORE IPC                 : 1.48 => corresponds to 37.01 % utilization for cores in active state
 Instructions per nominal CPU cycle: 0.05 => corresponds to 1.29 % core utilization over time interval
----------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------
 SKT    0 package consumed 2.59 Joules
----------------------------------------------------------------------------------------------
 TOTAL:                    2.59 Joules


 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK  | READ  | WRITE |  IO   | TEMP

   0    0     0.02   0.53   0.03    0.54     916 K   1000 K    0.08    0.12    2.27    0.05     N/A     N/A     N/A     53
   1    0     0.00   0.37   0.00    0.66      33 K     47 K    0.30    0.24    0.72    0.07     N/A     N/A     N/A     53
   2    0     0.01   0.60   0.02    0.64     402 K    456 K    0.12    0.14    1.37    0.04     N/A     N/A     N/A     55
   3    0     0.01   0.89   0.01    0.86      56 K    113 K    0.50    0.35    0.53    0.11     N/A     N/A     N/A     55
-----------------------------------------------------------------------------------------------------------------------------
 SKT    0     0.01   0.59   0.02    0.61    1408 K   1617 K    0.13    0.15    1.66    0.06    0.25    0.06    0.10     51
-----------------------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.01   0.59   0.02    0.61    1408 K   1617 K    0.13    0.15    1.66    0.06    0.25    0.06    0.10     N/A

 Instructions retired:   90 M ; Active cycles:  152 M ; Time (TSC): 2408 Mticks ; C0 (active,non-halted) core residency: 2.62 %

 C1 core residency: 3.10 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 94.28 %;
 C2 package residency: 10.61 %; C3 package residency: 1.39 %; C6 package residency: 37.66 %; C7 package residency: 39.44 %; C8 package residency: 0.00 %; C9 package residency: 0.00 %; C10 package residency: 0.00 %;

 PHYSICAL CORE IPC                 : 1.19 => corresponds to 29.63 % utilization for cores in active state
 Instructions per nominal CPU cycle: 0.02 => corresponds to 0.47 % core utilization over time interval
----------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------
 SKT    0 package consumed 1.10 Joules
----------------------------------------------------------------------------------------------
 TOTAL:                    1.10 Joules

 

POST VMLAUNCH on CORE 0:

 


 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK  | READ  | WRITE |  IO   | TEMP

   0    0     0.00   0.00   1.21    1.21    5490 K   5492 K    0.00    0.00    0.33    0.00     N/A     N/A     N/A     19
   1    0     0.04   0.03   1.21    1.21      73 K     84 K    0.13    0.10    0.00    0.00     N/A     N/A     N/A     20
   2    0     0.94   0.78   1.21    1.21     110 K    227 K    0.51    0.52    0.01    0.00     N/A     N/A     N/A     18
   3    0     0.83   0.68   1.23    1.21      45 K     85 K    0.47    0.54    0.00    0.00     N/A     N/A     N/A     19
-----------------------------------------------------------------------------------------------------------------------------
 SKT    0     0.45   0.37   1.21    1.21    5719 K   5890 K    0.03    0.06    0.09    0.00    0.60    0.06    0.12     19
-----------------------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.45   0.37   1.21    1.21    5719 K   5890 K    0.03    0.06    0.09    0.00    0.60    0.06    0.12     N/A

 Instructions retired: 4498 M ; Active cycles:   12 G ; Time (TSC): 2478 Mticks ; C0 (active,non-halted) core residency: 100.34 %

 C1 core residency: 0.00 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 0.00 %;
 C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %; C8 package residency: 0.00 %; C9 package residency: 0.00 %; C10 package residency: 0.00 %;

 PHYSICAL CORE IPC                 : 0.75 => corresponds to 18.63 % utilization for cores in active state
 Instructions per nominal CPU cycle: 0.90 => corresponds to 22.59 % core utilization over time interval
----------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------
 SKT    0 package consumed 13.82 Joules
----------------------------------------------------------------------------------------------
 TOTAL:                    13.82 Joules


 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK  | READ  | WRITE |  IO   | TEMP

   0    0     0.00   0.00   1.21    1.21    5497 K   5499 K    0.00    0.00    0.33    0.00     N/A     N/A     N/A     19
   1    0     0.03   0.02   1.21    1.21      39 K     45 K    0.15    0.10    0.00    0.00     N/A     N/A     N/A     19
   2    0     1.53   1.27   1.20    1.21      74 K     91 K    0.18    0.67    0.00    0.00     N/A     N/A     N/A     18
   3    0     0.82   0.68   1.21    1.21      59 K     95 K    0.37    0.47    0.00    0.00     N/A     N/A     N/A     18
-----------------------------------------------------------------------------------------------------------------------------
 SKT    0     0.60   0.49   1.21    1.21    5670 K   5731 K    0.01    0.05    0.09    0.00    0.59    0.06    0.12     18
-----------------------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.60   0.49   1.21    1.21    5670 K   5731 K    0.01    0.05    0.09    0.00    0.59    0.06    0.12     N/A

 Instructions retired: 5871 M ; Active cycles:   11 G ; Time (TSC): 2473 Mticks ; C0 (active,non-halted) core residency: 99.87 %

 C1 core residency: 0.13 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 0.00 %;
 C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %; C8 package residency: 0.00 %; C9 package residency: 0.00 %; C10 package residency: 0.00 %;

 PHYSICAL CORE IPC                 : 0.99 => corresponds to 24.70 % utilization for cores in active state
 Instructions per nominal CPU cycle: 1.19 => corresponds to 29.81 % core utilization over time interval
----------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------
 SKT    0 package consumed 14.71 Joules
----------------------------------------------------------------------------------------------
 TOTAL:                    14.71 Joules

 

 

As you can see, it seems that core 0 NEVER hits the L2 and L3 caches post-VMLaunch.
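
That reading can be checked mechanically. Below is a minimal Python sketch that parses the per-core rows of the first post-VMLaunch sample above and flags any core whose L3HIT and L2HIT ratios are both 0.00. The rows are copied from the log; the names `COLUMNS`, `parse_core_row`, and `never_hits` are made up for this illustration and are not part of the counter tool.

```python
# Column order as printed in the log's table header.
COLUMNS = ["core", "skt", "exec", "ipc", "freq", "afreq", "l3miss", "l2miss",
           "l3hit", "l2hit", "l3clk", "l2clk", "read", "write", "io", "temp"]

# Per-core rows of the first post-VMLaunch sample, copied from the log.
POST_ROWS = [
    "0 0 0.00 0.00 1.21 1.21 5490 K 5492 K 0.00 0.00 0.33 0.00 N/A N/A N/A 19",
    "1 0 0.04 0.03 1.21 1.21   73 K   84 K 0.13 0.10 0.00 0.00 N/A N/A N/A 20",
    "2 0 0.94 0.78 1.21 1.21  110 K  227 K 0.51 0.52 0.01 0.00 N/A N/A N/A 18",
    "3 0 0.83 0.68 1.23 1.21   45 K   85 K 0.47 0.54 0.00 0.00 N/A N/A N/A 19",
]


def parse_core_row(line):
    """Split one per-core row into a dict keyed by column name."""
    tokens = line.split()
    values = []
    i = 0
    while i < len(tokens):
        # Miss counts are printed as "<number> K"; fold the unit into the value.
        if i + 1 < len(tokens) and tokens[i + 1] == "K":
            values.append(int(tokens[i]) * 1000)
            i += 2
        else:
            values.append(tokens[i])
            i += 1
    return dict(zip(COLUMNS, values))


def never_hits(row):
    """True if both the L3 and L2 hit ratios are reported as zero."""
    return float(row["l3hit"]) == 0.0 and float(row["l2hit"]) == 0.0


rows = [parse_core_row(line) for line in POST_ROWS]
flagged = [row["core"] for row in rows if never_hits(row)]
print(flagged)  # ['0']
```

Only core 0, the one under the hypervisor, is flagged. Its L2 and L3 miss counts (about 5.49 M each) are also several times higher than core 0's counts in the pre-VMLaunch samples.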

If you can see anything else that is wrong, I would love to hear it.

 

thanks
