Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

gprof vs difftime


I have been compiling a fairly large Fortran application
with the Intel v8.1 compiler, using OpenMP directives for parallelism on
dual-socket IA-32 machines.

All subroutines (a few are in ANSI C) have been compiled
with the -pg switch so that I can run gprof later to check
on performance. According to the "flat profile" in the gprof
output, the code runs twice as fast on two processors
as on a single processor.

However, the code also calls difftime (from C) to report the
elapsed execution time of the run. Comparing the elapsed
times shows only a 30% reduction in run time, whereas
gprof suggests I should expect a 50% reduction.

I don't understand why the two methods of clocking the
parallelised software are so different.

It does not seem likely that it is due to OpenMP overhead, because
gprof measures the time spent in each subroutine. The overhead
should be included in the gprof profile stats, unless I am
misinterpreting the gprof method. The same goes for memory access.

About 10% of the run seems to be spent on Fortran I/O, based on the
output of the "top" utility (processors are "idle" or running
"system calls"), but again, this should be included in the gprof
stats, since the I/O is called from inside the profiled subroutines.

I would be grateful if someone could comment on what might
be causing the difference in clocking methods.

1 Reply
Black Belt
gprof attempts to measure the CPU time spent in each reported subroutine. In the call graph profile, it attempts to show the time spent in a function plus the time spent in the functions it calls that are instrumented for gprof. Thus, the function calls inserted by OpenMP, and the Fortran run-time library functions, would show up only as separate entries, if you are lucky.