I wrote up this problem report and then, after some more tinkering, discovered I could avoid the problem by turning off calibration: adding "-cal no -si 1" to my vtl command line.
I'm posting the report anyway, in case anyone knows more about the cause, or in case anyone else is seeing something similar and wants a workaround.
VTUNE 1.1 reports different CPU utilization than sar(8)
Environment:
  HW: 4-way Xeon (hyper-threaded 8-way) IBM440
  OS: SLES 8, kernel 2.4.19-64GB-SMP
  SW: VTune 1.1 for Linux
Activity: I've been comparing the scaling of a fabric I/O-bound application when multiple processes are executed.
When I run one copy of the process, I run it directly under vtl:
    vtl -d 60 -c sampling -app ,""
When I run multiple copies of the process, I start all but one instance from a tight shell loop and put them in the background, and then run the last one under vtl, as above.
In either case, I would also run sar 10 7 in the background while the vtl was executing.
Problem:
With 1 instance of :
  - reports throughput of io/sec
  - sar reports: 4% usr, 4% system, 92% idle
  - vtune reports 70% of event samples in the function default_idle in the vmlinux module
With 20 instances of :
  - reports throughput of <3X> io/sec
  - sar reports: 23% usr, 77% system, 0% idle
  - vtune reports 70% of event samples in the function default_idle in the vmlinux module
The vtl command I'm using to view the results is
    vtl view aXX::r1 -ha -mn vmlinux -sd /usr/src/linux
where XX are reported by view show as aXX_
Why would vtune report such a different cpu usage than sar?
I never did figure out whether calibration itself was the problem, or whether using calibration caused a mismatch between the time periods covered by vtl and sar. At any rate, "-cal no" now produces vtl idle reports similar to sar's.
Very thorough: thanks for posting such an interesting experiment and results.
In a nutshell: VTune doesn't keep an exact count of anything, the way a Geiger counter, for example, keeps a precise count of hits: tick tick tick tick tick.
VTune tracks statistically significant information about processor events which actually occurred, and it uses a sampling technology to do so.
Aside from the fact VTune doesn't keep an exact count, it does a pretty darn good job of pointing you to where the processor was spending most of its time during your sampling session.
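To make the counting-versus-sampling distinction concrete, here is a minimal sketch in Python. It is not how VTune is implemented. The timeline, the names sys_write and app_main, and the sampling interval are all made up for illustration; only default_idle comes from the report above.

```python
from collections import Counter

def exact_share(timeline):
    """Exact accounting: tally every single tick, like a counter would."""
    counts = Counter(timeline)
    total = len(timeline)
    return {name: n / total for name, n in counts.items()}

def sampled_share(timeline, interval):
    """Sampling: look only at every `interval`-th tick and tally what ran."""
    samples = timeline[::interval]
    counts = Counter(samples)
    return {name: n / len(samples) for name, n in counts.items()}

# Hypothetical timeline of 1000 ticks: 70% idle, 20% kernel, 10% application.
timeline = ["default_idle"] * 700 + ["sys_write"] * 200 + ["app_main"] * 100

exact = exact_share(timeline)
estimate = sampled_share(timeline, 7)  # inspect one tick in seven

for name in exact:
    print(f"{name:14s} exact={exact[name]:.3f} sampled={estimate[name]:.3f}")
```

The sampled shares land close to the exact ones without matching them, which is the behaviour described above: a good pointer to where the time went, not a precise count.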
And you'd expect differences from more than just sar. Try this experiment: run a 20-second vtl session launching no application (leave -app out, keep it simple), and at the same time run a "ps -ef" and then "ps -ef| wc -l".
The ps command is showing you an exact count, everything that is listed in the process run table whether the cpu is running that process at the time or not. VTune is only going to show you statistically relevant information about events that the processor actually ran during that same time period.
The lists are showing two different things, on purpose and by design. And this model applies directly to other commands such as sar, which count precisely (as opposed to "sample").
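For contrast, here is a rough sketch of the "precise count" side. It assumes sar-style percentages are derived from the difference between two snapshots of the kernel's cumulative tick counters (the "cpu" line of /proc/stat, taken here as user, nice, system, idle in that order, as on 2.4 kernels). The snapshot values are hypothetical, chosen so the result matches the 20-instance sar figures quoted above. The helper names are mine, not part of sar.

```python
def parse_cpu_line(line):
    """Extract (user, nice, system, idle) ticks from a /proc/stat-style cpu line."""
    fields = line.split()
    return tuple(int(x) for x in fields[1:5])

def cpu_percentages(before, after):
    """Convert two cumulative tick snapshots into percentages for the interval."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        raise ValueError("no ticks elapsed between snapshots")
    names = ("user", "nice", "system", "idle")
    return {name: 100.0 * d / total for name, d in zip(names, deltas)}

# Hypothetical snapshots taken at the start and end of one interval.
before = parse_cpu_line("cpu 1000 0 2000 7000")
after = parse_cpu_line("cpu 1230 0 2770 7000")

print(cpu_percentages(before, after))
```

Every tick in the interval is accounted for exactly once, so the percentages always cover the whole measurement window, whereas a sampled profile only reflects the instants at which a sample happened to be taken.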
That said, we would never expect VTune to count events in a precise way; it's just not designed to do that. Does it still work like a champ? You bet.
Also, please note that the engineering team is very seriously considering turning the calibration default to OFF instead of ON in the next release of vtl (2.0, currently in beta), since it does seem to occasionally cause confusion, and can always be turned back on when needed.