Source Code Inspection, Strange CPU Time Reporting

Divino_C_
New Contributor I

I'm using VTune Update 15 to profile the 401.bzip2 benchmark from the SPEC CPU 2006 suite, but I'm seeing a strange result when I inspect the source code to check which lines are the most time-consuming. I don't see how the result can be correct.

The problem is:

1) I use Advanced Hotspots analysis to profile the code.

2) I go to the "Top-down Tree" tab and expand the main function. I see that it runs for ~86s and calls two other functions, spec_compress and spec_uncompress, so I double-click on main to check its source code.

3) Inspecting the source of main, I note that VTune claims that the line "spec_fd[1].limit=compressed_size*MB;" executes for ~86s, which is the total execution time of the benchmark. I also note that the lines that call spec_compress and spec_uncompress show a CPU time of zero.

I've attached some screenshots. I would greatly appreciate any help on this!

 

1 Solution
David_A_Intel1
Employee

Hi Divino:

Did you build with optimizations turned on?  Did you view the assembly code for the function in question (i.e., while in the source view)?  Optimization changes code layout (inlining, etc.) and I suggest viewing the assembly with the collected data (see Assembly button in upper left of source view).

8 Replies
Divino_C_
New Contributor I

Hello Mr. Anderson, how are you?

Yes, I profiled optimized code. It seems that optimization was exactly the source of the problem. Thanks for reminding me of this.

So, if I want VTune to attribute time to the source lines reliably, are my options 1) to profile non-optimized code, or 2) to analyze the results using the assembly view?

 

TimP
Honored Contributor III

Did you consider setting -inline-debug-info or options like -no-ip or -fno-inline-functions?

Divino_C_
New Contributor I

Actually I'm using GCC. What does the flag "-no-ip" disable?

Caesar
Beginner

Hi,

I'm sorry to come back to this question, but today I saw some strange results again.

This time I'm using the Intel compiler (version 13.1.3) with these flags: "-O3 -debug inline-debug-info -no-ip -fno-inline-functions", and VTune reports that a call instruction in the source is responsible for ~140s of the program's execution (total execution time is ~350s). This seems a little odd. Can you tell me what I'm doing wrong?

I've attached a screenshot of the result window.

TimP
Honored Contributor III

gcc doesn't share icc's debug-info options, but -fno-inline-functions (icc on Linux) was taken directly from gcc. -Qip- is the ICL (Windows) equivalent to disable inlining.

Caesar
Beginner

Thanks for clarifying that. I've changed my experiments to use ICC.

Bernard
Valued Contributor I

Usually, when building a benchmark, do it without optimization. The simplest example is a loop that calls the library sin() function N times with a constant argument. In such a case, an optimizing compiler can evaluate the library call at compile time and calculate the value ahead of runtime, eliminating the need to execute that loop N times at all.
