We've been evaluating VTune 8.0.3 on a Red Hat Enterprise Linux 4 (Update 4) system running kernel 2.6.9-42.0.2. The machine in question is an HP DL380 G5 with dual Xeon 5160 (dual-core, 3.00 GHz) processors and 4 GB of RAM. We are using system sampling to sample all running processes on the machine. Everything appears to be working more or less correctly.
The problem is that once we run a sampling activity on the box, all processing afterwards is degraded until we reboot the machine. For example, we see the following:
- boot machine
- run process A, it runs for 16 seconds (this is normal)
- run process A, 16 seconds
- run process A, 16 seconds
- launch VTune, start sampling for 30 second window
- run process A, 16 seconds
- stop sampling
- run process A, 23 seconds
- run process A, 23 seconds
- start sampling
- run process A, 16 seconds
- stop sampling
- run process A, 23 seconds
- stop VTune application
- run process A, 23 seconds
- remove VTune kernel module
- run process A, 23 seconds
- reboot
- run process A, 16 seconds
- run process A, 16 seconds
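For anyone who wants to reproduce the timings, here is a rough sketch of the kind of loop that could be used to collect them. The command shown is only a placeholder for a CPU-bound job (it is not our actual process A), and `time_runs` is a name made up for this example.

```python
import subprocess
import sys
import time


def time_runs(cmd, runs=3):
    """Run cmd `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the command fails
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder workload; substitute the real CPU-bound command here.
    placeholder_cmd = [sys.executable, "-c", "sum(range(10**6))"]
    for i, secs in enumerate(time_runs(placeholder_cmd), 1):
        print(f"run {i}: {secs:.2f} s")
```

Running the same loop at each step of the sequence (after boot, during sampling, after sampling stops, after the module is removed) gives directly comparable numbers.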
In all cases we can see using top, vmstat, or iostat that this process is completely CPU bound, runs mostly on one CPU, and does very little disk I/O. In other words, the fast and slow runs look identical in profile, except that the slow (23-second) runs simply take a lot longer to complete. The output values are correct and identical in both cases (and we have multiple complex metrics, all of which come out identical).
As you can see, it's almost as if VTune makes some changes to the CPU or to the CPU/kernel interface when it starts sampling and fails to restore them to their initial state on exit. This seems far-fetched, but it's the best theory we have right now that accounts for this behavior, and considering that this software interacts with the CPU at a pretty low level, I guess it's possible.
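One cheap way to look for leftover state would be to snapshot whatever the kernel exposes before and after a sampling run and compare. The sketch below does this for the `cpu MHz` field of `/proc/cpuinfo`. That field is just one candidate we picked for illustration. We have no evidence that clock frequency is what changes, and the helper names are made up for this example.

```python
import os


def cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value for each logical CPU, in file order."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))
    return values


def changed_cpus(before, after):
    """Return (cpu_index, before, after) for every CPU whose value differs."""
    return [(i, b, a) for i, (b, a) in enumerate(zip(before, after)) if b != a]


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    # Take one snapshot now; take another after sampling and compare the two.
    with open("/proc/cpuinfo") as f:
        print(cpu_mhz(f.read()))
```

An empty result from `changed_cpus` would only rule out this one field. The same before/after comparison could be applied to other kernel-visible state.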
Can anyone help us figure out what's going on here and how to fix it?