jim_fry
Beginner
225 Views

vmlinux reported as using 98% of CPU_CYCLES

I am using VTune 3.0 on Red Hat Enterprise Linux 3, Update 5. The two processors are Itanium 2s.
I do:
vtl activity run1 -c sampling -o "-ec en=CPU_CYCLES" -d 20 -app ./a.out,"args" run
vtl view | more
The view says that Module vmlinux, Process pid0x0 is taking roughly 98% of the CPU cycles, even though my app is CPU-intensive and runs for 20 seconds.
When I took the "Tuning for the Intel Itanium 2 Microarchitecture", I didn't have this problem.
I do not have a Windows workstation connected to the Linux box, so I must use the vtl interface.
Any ideas?
Jim Fry
Hewlett-Packard

Nice posting, JF.
Might be interesting to turn calibration on and see if the numbers change any. If memory serves that's adding -cal on or -calibration on in the options setting of the vtl command. Full info at
$ man sampling
Also, don't forget that you can use the plug in viewers on Itanium Linux even though you don't have a full GUI with wizards. To invoke the plug in viewers, just add the -gui option:
$ vtl view # show me my data as ASCII text
$ vtl view -gui # show me the same data in the graphical viewer
cheers
jdg
PS: meant to ask: is the OS you're using listed in the release notes? Their default location is:
/opt/intel/vtune/RELEASENOTES.htm
If you're on an unsupported OS or kernel, you might need to open a premier case and see what engineering has to say about your setup.

Message Edited by jdgallag on 10-27-2005 12:36 PM

Message Edited by jdgallag on 10-27-2005 12:37 PM

jim_fry
Beginner

JD,
I tried adding -o "-cal yes", and the problem remained.
The calibration is a good idea (although I probably should set it myself, rather than auto). But the problem remains that vtl is still reporting that vmlinux is taking much more time than it could really be taking, and so the rest of the output is suspect.
Jim Fry

Supported OS? Supported kernel?

jdg

jim_fry
Beginner

The release notes say Red Hat Enterprise Linux 3.0 is supported, which is what is on this Itanium 2 system. But the kernel is 2.4.21-32.EL, not 2.4.21-4.EL as is listed under the Itanium supported kernels.
Jim Fry
David_Levinthal
Beginner

a couple of things...
are you sure the application is actually being run by vtune..you should see its output..(the reason I raise this is that the only times I have seen the kernel taking all the cycles is when I "fatfingered" a path, the name of the app..or some such..the path doesn't include the working directory with the required data files....and given my typing..this happens to me a lot..:-)
You can use the remote data collector and the vtune gui on an ia32 box..you don't need a windows box to display the data in the gui..
jim_fry
Beginner

levinth,
Yes, I am sure the app is running. I get output, including reports of CPU time consumed.
It would not be easy for me to get access to an IA32 box on the network the Linux box is on.
I think that I'll report this problem through Premier support.
Jim Fry
David_A_Intel1
Employee

It sounds to me like what you are seeing is that one of the processors is not being used during the run. Pid 0x0 is the idle process and, unless your app is multi-threaded, your app is probably consuming all of one processor, while the other is sitting in the idle process of the kernel. The missing 2% is the 2% that your app isn't using on the one processor. Using the graphical viewer, you should be able to separate the samples by processor (see CPU button) and make this determination.

Message Edited by DaveA on 11-02-2005 11:33 AM

jim_fry
Beginner

DaveA,

I don't think what you are saying can be, because in that case, the sum of all the CPU percentages would approach 200%, not 100%, as I am seeing.

I can't do vtl view -gui. I think it is only supported on IA32 machines.

Jim Fry
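
The disagreement above comes down to how the percentages are normalized. As a minimal sketch (the sample counts below are hypothetical and were not taken from the thread, and it assumes both CPUs contribute the same number of samples), the following Python compares shares computed over all samples with shares computed per CPU:

```python
def shares_over_all_samples(samples_by_cpu):
    """Percent of all samples, across every CPU, attributed to each module."""
    total = sum(sum(mods.values()) for mods in samples_by_cpu.values())
    shares = {}
    for mods in samples_by_cpu.values():
        for module, count in mods.items():
            shares[module] = shares.get(module, 0.0) + 100.0 * count / total
    return shares


def shares_per_cpu(samples_by_cpu):
    """Percent of each CPU's own samples attributed to each module."""
    result = {}
    for cpu, mods in samples_by_cpu.items():
        cpu_total = sum(mods.values())
        result[cpu] = {m: 100.0 * c / cpu_total for m, c in mods.items()}
    return result


# Hypothetical counts: a single-threaded app keeps cpu0 busy, cpu1 stays idle.
hypothetical_samples = {
    "cpu0": {"a.out": 980, "vmlinux": 20},
    "cpu1": {"vmlinux": 1000},
}

overall = shares_over_all_samples(hypothetical_samples)
per_cpu = shares_per_cpu(hypothetical_samples)
per_cpu_sum = sum(sum(mods.values()) for mods in per_cpu.values())

print("share of all samples:", overall)
print("share per CPU:", per_cpu)
print("sum of per-CPU percentages:", per_cpu_sum)
```

Under these assumptions, a view normalized over all samples would put vmlinux near 51% and the app near 49%, while a per-CPU view would sum to 200%. Neither reading gives 98% for vmlinux, which is consistent with Jim's doubt that an idle second CPU alone explains the report.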


Absolutely not true.

vtlec = IA32 Linux and EM64T Linux only

vtl view -gui = IA32 Linux, EM64T Linux, and Itanium Linux

1) does vtl view (without the gui option) give results?

2) if so, something is wrong with the X setup on the server, because those plug in viewers are not Eclipse-based, but they sure are X

cheers

jdg

jim_fry
Beginner

jdg,
My mistake. I must have fat fingered something the last time I tried.
Now when I use the CPU button, I see there are a total of 26 billion events (CPU cycles) on CPU1 (all processes), but less than 2 billion on CPU 2. Why the disparity? IA64_INST_RETIRED-THIS is 65G vs 4G. It appears that the second CPU is undercounting.
Jim Fry
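
The size of the disparity can be made concrete with a little arithmetic. This Python sketch uses only the rounded figures quoted above (the CPU 2 cycle count is given as "less than 2 billion", so 2 billion is used here as an upper bound) and the 20-second run time from the original post; the variable names are illustrative.

```python
RUN_SECONDS = 20  # run time stated in the original post

# Rounded event totals quoted in the post, as (CPU 1, CPU 2).
cpu_cycles = (26_000_000_000, 2_000_000_000)  # CPU 2 value is an upper bound
inst_retired = (65_000_000_000, 4_000_000_000)

# Because the CPU 2 cycle count is an upper bound, this ratio is a lower bound.
cycles_ratio = cpu_cycles[0] / cpu_cycles[1]
inst_ratio = inst_retired[0] / inst_retired[1]

# Instructions per cycle implied by the quoted totals on each CPU.
ipc_cpu1 = inst_retired[0] / cpu_cycles[0]
ipc_cpu2 = inst_retired[1] / cpu_cycles[1]  # lower bound, same reason as above

# Average cycle rate implied on CPU 1 over the whole run.
cycles_per_second_cpu1 = cpu_cycles[0] / RUN_SECONDS

print(f"CPU_CYCLES ratio, CPU 1 / CPU 2: at least {cycles_ratio:.2f}")
print(f"instructions retired ratio, CPU 1 / CPU 2: {inst_ratio:.2f}")
print(f"implied IPC: CPU 1 = {ipc_cpu1:.2f}, CPU 2 >= {ipc_cpu2:.2f}")
print(f"implied CPU 1 cycle rate: {cycles_per_second_cpu1:.3e} per second")
```

The instruction ratio works out to about 16, matching the factor Jim cites later in the thread. The implied CPU 1 rate of 1.3 billion cycles per second would correspond to a fully busy processor only if the clock runs at about 1.3 GHz; the thread does not state the clock speed, so treat that as a plausibility check rather than a conclusion.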

VTune doesn't really count things in a precise geiger counter kind of way, click click click. Its sampling methodology determines statistically relevant information with regard to CPU activity. The numbers you collect in a given experiment can vary for a variety of reasons, even though the data is statistically valid.

If you get a moment, try repeating your experiment with calibration turned on. Is there a difference in the report? Step two, manually alter the "sample after" value in your collector on the GUI. Compare and let us know what you see.

cheers

jdg

Message Edited by jdgallag on 11-10-2005 10:41 AM
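
To illustrate what "statistically valid" means for a sampling profiler, here is a minimal Python sketch. It is not VTune's implementation: the function names, the workload shares, and the sample counts are hypothetical, and it assumes the usual event-based sampling model in which one sample is recorded roughly every "sample after" events, so an event total is estimated as the number of samples times the sample-after value.

```python
import random


def estimate_events(samples, sample_after_value):
    """Estimate an event total: each sample stands for about
    sample_after_value occurrences of the event."""
    return samples * sample_after_value


def sampled_shares(true_shares, num_samples, seed=0):
    """Draw num_samples samples, each landing in a function with probability
    equal to its true share, and return the observed shares."""
    rng = random.Random(seed)
    names = list(true_shares)
    weights = [true_shares[name] for name in names]
    hits = rng.choices(names, weights=weights, k=num_samples)
    return {name: hits.count(name) / num_samples for name in names}


# Hypothetical workload: 70% of events in hot_loop, 30% in setup.
true_shares = {"hot_loop": 0.7, "setup": 0.3}

few = sampled_shares(true_shares, num_samples=50)
many = sampled_shares(true_shares, num_samples=10_000)

print("50 samples:", few)
print("10000 samples:", many)
print("estimated events:", estimate_events(10_000, 100_000))
```

With few samples the observed shares can wander noticeably from the true values; with many they settle close to them. A smaller sample-after value yields more samples for the same run, at the cost of more collection overhead.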

jim_fry
Beginner

jdg,

The reason I submitted this in the first place was because the data were NOT statistically valid. I'm trying to figure out why.

When I turn calibration on, the difference between the 2 CPUs is now a factor of two, rather than 16. That is, numbers of cycle and instruction "events" are twice as big on the first CPU as the second.

I couldn't figure out how to change "sample after" in the GUI. Are you sure that is what you want?

Jim


Well said, Jim. Of course, since VTune has successfully supported sampling on multiple CPUs for years and years now, I have a fairly knee-jerk reaction to trust the results I see. (You are wise not to, but I just wanted you to understand my reasons for double and triple checking.) Yes, you might be right about there being a bug, but it still seems unlikely to me.
Don't worry about changing the sample after value for now, although you can do that from the CLI or the GUI.
In general, and especially when working with highly optimized code, your optimizing compiler can do things to the execution of the code you wrote that you may not expect. I've assumed this may be the case here.
HOWEVER, let's assume for now you're right, and there is a clear bug. I suggest you open a Premier case describing the problem, pack up the project that shows what you're seeing (creating a .tb5 file), and attach that .tb5 file to your case.
Cheers
jdg

Message Edited by jdgallag on 11-11-2005 01:35 PM


jim_fry
Beginner

jdg,
I had already been using premier support, as I indicated I would earlier in the discussion thread.
They eventually asked me to try 8.0.
I just had 8.0 installed, and this problem seems to have gone away.
Thanks,
Jim

VERY interesting, and thanks for checking back to let the team know.

cheers

jdg
