Solved: Source Code Inspection, Strange CPU Time Reporting

Divino_C_ · ‎01-24-2014

I'm using VTune Update 15 to profile the benchmark 401.Bzip2 from SPEC CPU 2006 benchmark suite, however I'm seeing a strange result when inspecting the source code to check which instructions are more time consuming. I really don't know how the result I'm seeing can be correct.

The problem is:

1) I use Advanced Hotspot Analysis to profile the code;

2) I go to the "Top-down Tree" tab and expand the main function; Which I see that is running for ~86s and is calling to other functions: spec_compress and spec_uncompress; So I double-click on the main function to check its source code;

3) Inspecting the source code of the main function I note that VTune claims that the line "spec_fd[1].limit=compressed_size*MB;" is executing for ~86s... the total execution time of the benchmark; I also note that in the lines corresponding to the invocations of the functions spec_compress/uncompress the Cpu time is shown as zero.

I've attached some screenshots. I would greatly appreciate any help on this!

David_A_Intel1 · ‎01-24-2014

Hi Divino:

Did you build with optimizations turned on? Did you view the assembly code for the function in question (i.e., while in the source view)? Optimization changes code layout (inlining, etc.) and I suggest viewing the assembly with the collected data (see Assembly button in upper left of source view).

View solution in original post

David_A_Intel1 · ‎01-24-2014

Hi Divino:

Did you build with optimizations turned on? Did you view the assembly code for the function in question (i.e., while in the source view)? Optimization changes code layout (inlining, etc.) and I suggest viewing the assembly with the collected data (see Assembly button in upper left of source view).

Divino_C_ · ‎01-25-2014

Hello Mr. Anderson, how are you?

Yes, I profiled an optimized code.. It seems that the optimization was exactly the source of the problem. Thanks for reminding me of this issue.

So, if I want VTune to profile the code, my options are: 1) to profile the non-optimized code; 2) analyse the results using the assembly view?

TimP · ‎01-25-2014

Did you consider setting -inline-debug-info or options like -no-ip or -fno-inline-functions?

Divino_C_ · ‎01-25-2014

Actually I'm using GCC. What does the flag "-no-ip" disable?

Caesar · ‎02-04-2014

Hi,

I'm sorry to come back to this question again, but today I saw a some strange results again.

This time I'm using intel compiler (version 13.1.3) with these flags: "-O3 -debug inline-debug-info -no-ip -fno-inline-functions" and VTune is reporting that a call instruction in the source is being responsible for ~140s of the program execution (tot. execution time is ~350s)... this seems a little odd. Can you guys tell me what I'm doing wrong?

I've attached a screenshot of the result window.

TimP · ‎02-04-2014

gcc doesn't share the debug-info options of icc, but -fno-inline-functions (icc linux) was taken directly from gcc. -Qip- is the ICL windows equivalent to disable in-lining.

Caesar · ‎02-04-2014

Thanks for clarifying that. I've changed my experiments to use ICC.

Bernard · ‎02-07-2014

Usually when building a benchmark do it without the optimization.The simplest example is looped N times call to library sin() function with the argument being constant value.In such a case optimizing compliler can evaluate at compile time the library call and calculate the value ahead of runtime thus eliminating the need to even execute that loop N times.