Intel® Fortran Compiler

Profiling with gprof or Saturn on Mac OS X

bob2
Beginner
1,961 Views

I'm trying to measure the performance of a simple Fortran 90 program. Previously I used the compiler and linker option -p together with gprof, but this no longer seems to work on my Intel Mac with Mac OS X 10.5.5, Xcode 3.0 and Intel Fortran Compiler 10.1 20080312. The gprof output tells me that every routine used 0.0 seconds, even though the entire program runs for hours. I was told to use the -finstrument-functions compiler option instead and to link with Apple's Saturn library (-lSaturn for the linker). With that, however, I keep getting errors like:

Undefined symbols:
  "__cyg_profile_func_exit", referenced from:
      _MAIN__ in convert.o
  "__cyg_profile_func_enter", referenced from:
      _MAIN__ in convert.o
ld: symbol(s) not found
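For context: GCC documents -finstrument-functions as inserting a call to __cyg_profile_func_enter(this_fn, call_site) at the start of each instrumented function and __cyg_profile_func_exit(this_fn, call_site) at its end, leaving it to the user or a profiling library to supply those two functions. The errors above mean nothing on the link line supplied them under the names convert.o references. A minimal C sketch of such hooks follows; the counters (enter_count, exit_count, depth, max_depth) are hypothetical illustration and say nothing about what Saturn records. Note that on Mac OS X with ifort 10.1 these exact C names may still not match, because of the underscore difference discussed later in this thread.

```c
/* Minimal sketch of the two hooks that -finstrument-functions calls.
 * Prototypes follow GCC's documented interface; the counting logic is
 * hypothetical and only illustrates what a profiler could record. */

static unsigned long enter_count = 0;  /* total routine entries seen */
static unsigned long exit_count  = 0;  /* total routine exits seen */
static long depth = 0;                 /* current call depth */
static long max_depth = 0;             /* deepest nesting observed */

/* The hooks must not be instrumented themselves, or they would recurse.
 * no_instrument_function is a GCC attribute (Clang accepts it as well). */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));

void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;    /* address of the routine being entered */
    (void)call_site;  /* address it was called from */
    enter_count++;
    if (++depth > max_depth)
        max_depth = depth;
}

void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    exit_count++;
    depth--;
}
```

The hook file itself would be compiled without -finstrument-functions and linked together with the instrumented objects; a real profiler would record timestamps and the this_fn addresses rather than bare counts.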

Has anybody experienced similar issues?

8 Replies
TimP
Honored Contributor III
I'll agree that the documentation on several alternatives for gprof has been confusing. It ought to work when the -pg option is used throughout (at least from the main program down through the functions you want to profile), including the link step, same as gcc. There are alternate spellings for -pg, but as far as I know, -p is a valid alternate only for Intel compilers on Itanium.
bob2
Beginner

Thanks for this advice. I tried gprof once again with -pg (for both build and link), but the problem is the same as with the -p option: the function call counts are reported correctly, but the time columns stay empty. Here is an excerpt of the gprof output:

...
  %  cumulative      self               self    total
 time   seconds   seconds     calls  ms/call  ms/call  name
  0.0      0.00      0.00 125803621     0.00     0.00  _pdf1d_lib_mp_ran1_ [105]
  0.0      0.00      0.00  98312192     0.00     0.00  _pdf1d_lib_mp_gasdev_ [106]
  0.0      0.00      0.00  98304000     0.00     0.00  _pdf1d_lib_mp_sde_integrator_ [107]
  0.0      0.00      0.00    594051     0.00     0.00  _pdf1d_lib_mp_ranw_ [108]
  0.0      0.00      0.00     19000     0.00     0.00  _stopwatch_ [109]
...

The time columns remain zero, although they should show many seconds, since the program runs for hours.
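A likely reason the call counts look fine while every time column is zero: gprof gets them from two separate mechanisms. Call counts come from the -pg instrumentation in each routine, while the time columns are derived from periodic samples of the program counter. The C sketch below reproduces that arithmetic; the 0.01 s sampling period is an assumption (a common default, not verified for this Mac setup), and struct flat_row and its helper functions are hypothetical. With zero samples, a routine called 125,803,621 times still shows 0.00 in every time column.

```c
/* Sketch: how a gprof-style flat-profile row can be derived.
 * Assumption: a 0.01 s sampling period (100 Hz); the real period is
 * platform-dependent. */
#define SAMPLE_PERIOD_S 0.01

struct flat_row {
    const char   *name;
    unsigned long samples;  /* PC samples that landed in this routine */
    unsigned long calls;    /* counted separately by the -pg instrumentation */
};

/* Self time is the number of sample hits times the sampling period. */
double self_seconds(const struct flat_row *r)
{
    return (double)r->samples * SAMPLE_PERIOD_S;
}

/* Share of all sampled time; 0 when nothing was sampled at all. */
double percent_time(const struct flat_row *r, unsigned long total_samples)
{
    if (total_samples == 0)
        return 0.0;
    return 100.0 * (double)r->samples / (double)total_samples;
}

/* Self milliseconds per call; 0 for routines that were never called. */
double self_ms_per_call(const struct flat_row *r)
{
    if (r->calls == 0)
        return 0.0;
    return self_seconds(r) * 1000.0 / (double)r->calls;
}
```

For example, a row built as { "_pdf1d_lib_mp_ran1_", 0, 125803621UL } yields 0.0 from all three helpers, matching the excerpt above: the calls column is populated, but no samples were ever recorded.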

Kevin_D_Intel
Employee

This is not an area I'm familiar with, but after some investigation here is what little I can offer.

I found, and confirmed, reports that gprof has always produced zero timings on what others refer to as an "Intel Mac". Apparently this is a known issue unrelated to the Intel compilers; I reproduced it with both the Intel compilers and GNU C++ (gcc 4.0.1) on Leopard (10.5.2) with Xcode 3.0.

On Leopard (10.5.5) with Xcode 3.1.1, I cannot produce a gmon.out file at all with either the Intel or the GNU compilers. The cause is unclear, but it may matter little given the above.

The unresolved externals you get when linking the Saturn library with the Intel Fortran compiler come from a difference in the name decoration of the instrumentation routines: on Mac OS X, a third leading underscore is expected. This mismatch has been reported to our compiler development team. Our name decoration is compatible with GCC on Linux, but GCC on Mac OS is clearly different, although its man page does not say so. Either way, GCC on Mac OS is clearly compatible with the Saturn library's decorations. Assuming the GCC Mac OS decorations are correct, we will likely decorate according to each OS's specific requirement.
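The underscore rule can be illustrated with a small helper. On Mac OS X the Mach-O toolchain prefixes one underscore to every C identifier when it becomes a linker symbol, so the linker-level name "__cyg_profile_func_enter" in the original error corresponds to the C name _cyg_profile_func_enter, while GCC's C name __cyg_profile_func_enter becomes "___cyg_profile_func_enter". macho_symbol is a hypothetical helper for illustration only, not part of any toolchain.

```c
/* Sketch of the Mach-O rule behind the "third leading underscore":
 * the linker symbol for a C identifier is the identifier with one extra
 * leading '_'. (Typical Linux ELF targets add no prefix.) */
#include <string.h>

/* Writes the Mach-O linker symbol for c_name into out.
 * Returns 0 on success, -1 if out is too small. */
int macho_symbol(const char *c_name, char *out, size_t out_size)
{
    size_t len = strlen(c_name);
    if (out_size < len + 2)            /* '_' prefix + name + NUL */
        return -1;
    out[0] = '_';
    memcpy(out + 1, c_name, len + 1);  /* also copies the terminating NUL */
    return 0;
}
```

Applied to the two spellings: macho_symbol("_cyg_profile_func_enter", ...) gives "__cyg_profile_func_enter", the symbol convert.o asks for, whereas the GCC spelling gives "___cyg_profile_func_enter" with three underscores.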

I was somewhat successful using the bridge routines below on Mac OS X 10.5.2 with Xcode 3.0 and the IA-32 Intel Fortran 10.1.015 and 11.0.034 Beta compilers.

First, I could not link the Saturn library with either the Intel or the GNU compilers targeting Intel 64, because of the following:

ld64 warning: in /usr/lib/libSaturn.dylib, missing required architecture x86_64 in file

I did not dig any further into this.

Using the bridge routines, I could compile and link with the IA-32 Intel Fortran 10.1 and 11.0 compilers, launch the program under Saturn and open the resulting report. The major problem with the report when using the Intel Fortran compiler is that Saturn does not display any Fortran routine names, despite all my efforts to ensure that symbols are present. It is unclear why; using Saturn with the Intel compilers needs more investigation, which I will take up with compiler development.

The IA-32 GNU C++ compiler operates fine when linking with the Saturn library, and the report displays routine names.

I do not know whether someone with more experience of the Mac OS X performance tools can make further progress using the bridge routines with the Intel Fortran compiler.

Here is how I used them:

gcc -m32 -c bridges.c

ifort -finstrument-functions -g -save-temps -O2 sample.f90 bridges.o -lSaturn

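[cpp]/* bridges.c: forward the hook names that ifort-instrumented code
 * references on Mac OS X (one leading underscore fewer) to the names
 * that libSaturn exports (GCC naming). */

/* Hooks provided by libSaturn. */
extern void __cyg_profile_func_enter(void *this_fn, void *call_site);
extern void __cyg_profile_func_exit(void *this_fn, void *call_site);

/* Called by ifort-instrumented code on routine entry. */
void _cyg_profile_func_enter(void *this_fn, void *call_site) {
    /* Pass the addresses through unchanged (not &this_fn, &call_site),
       so the profiler receives the real routine and call-site addresses. */
    __cyg_profile_func_enter(this_fn, call_site);
}

/* Called by ifort-instrumented code on routine exit. */
void _cyg_profile_func_exit(void *this_fn, void *call_site) {
    __cyg_profile_func_exit(this_fn, call_site);
}[/cpp]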
bob2
Beginner

Thanks Kevin for this input!

In the meantime I have been looking for an alternative, and I found one that suits my purpose.

One can use Shark

/Developer/Applications/Performance Tools/Shark.app

Shark works differently from gprof or Apple's Saturn, however. It does not need a specially compiled executable (in which the beginnings and ends of routines are tagged); instead, it periodically samples which function of which process is currently running. If the sampling interval is small enough, the time spent in the different parts of an executable can be estimated. The subsequent analysis naturally works best with an executable that includes some debugging information. The measurements recorded by Shark can be analyzed in the graphical user interface of Shark.app.
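Statistical sampling of this kind can be sketched in a few lines of POSIX C. This is not how Shark is implemented; it simply uses setitimer(ITIMER_PROF) and a SIGPROF handler to tally which of two hypothetical program phases is active on each tick. Tick counts are proportional to CPU time, so multiplying them by the interval gives an estimate in seconds. The function names and phase durations are illustrative assumptions.

```c
/* Minimal sampling-profiler sketch using setitimer(ITIMER_PROF) + SIGPROF.
 * Not Shark's implementation: it only shows the idea of periodically
 * recording "what is running right now". The two phases are hypothetical
 * stand-ins for parts of a real program. */
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t current_phase = 0;    /* 0 = phase A, 1 = phase B */
static volatile sig_atomic_t samples[2] = {0, 0};  /* ticks charged to each phase */

/* Runs on every SIGPROF tick: charge the tick to whichever phase is active. */
static void on_sigprof(int sig)
{
    (void)sig;
    samples[current_phase]++;
}

/* Burn roughly cpu_seconds of process CPU time (clock() measures CPU time). */
static void busy_for(double cpu_seconds)
{
    clock_t start = clock();
    volatile unsigned long sink = 0;
    while ((double)(clock() - start) / CLOCKS_PER_SEC < cpu_seconds)
        sink++;
}

/* Profile phase A (~0.3 s CPU) and phase B (~0.1 s CPU), requesting a tick
 * every 1 ms of CPU time (actual resolution may be coarser).
 * Returns 0 on success, -1 if the handler or timer cannot be installed. */
int profile_two_phases(long *hits_a, long *hits_b)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    samples[0] = 0;
    samples[1] = 0;
    struct itimerval tick;
    memset(&tick, 0, sizeof tick);
    tick.it_interval.tv_usec = 1000;
    tick.it_value.tv_usec = 1000;
    if (setitimer(ITIMER_PROF, &tick, NULL) != 0)
        return -1;

    current_phase = 0; busy_for(0.3);
    current_phase = 1; busy_for(0.1);

    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_PROF, &off, NULL);   /* stop sampling */

    *hits_a = samples[0];
    *hits_b = samples[1];
    return 0;
}
```

Phase A should collect roughly three times as many ticks as phase B. As bob notes, a shorter interval gives finer resolution at the cost of more overhead.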

The Mac OS X application Shark.app has a command-line version, shark, which even has a man page. Its remote mode (shark -r) is best suited to measuring how a program scales with parameter changes. In that mode, shark waits for signals sent by chudRemoteCtrl. So, while shark is running and waiting, a simple script calls "chudRemoteCtrl -s" to start a profiling session, then runs the program to inspect, and finally calls "chudRemoteCtrl -e" to end the session. At the end of each session, shark writes out a report and keeps running (waiting) until it is terminated with Ctrl-C. With the -m option we can provide a configuration file composed in Shark.app. The generated reports can likewise be analyzed in Shark.app, and shark also has options to generate reports in text form.

Kirill_Mavrodiev__In
Hi,

I believe replacing gprof with Saturn or Shark is a good idea.
I have read the manuals for Saturn and Shark, but I could not work out how to use either profiler from the command line the way gprof is used, without any additional setup or source changes. Can you suggest a simple way (or best known method, BKM) to get an application profile on Mac OS? E.g. a batch file that, once run, leaves me a text file with the application's profile.

Thanks.

koppenhoefer
Beginner
Hi,

Has anyone resolved this issue?
We too are getting 0.0 execution times despite running a long-running program.
As with @bob, we get values for the call counts but nothing in the % time, cumulative seconds or self seconds columns.

What do I need to do to get gprof working correctly with icc?
I don't want to give up on gprof. :-(
Any ideas?


shawn

P.S. Someone suggested making sure that we link against /usr/lib/gcrt.o. Is that the fix?
Ron_Green
Moderator
shawn,

You said you wanted it to work with 'icc'; I assume you mean ifort.

A while ago I wrote this article on the topic:

http://software.intel.com/en-us/articles/intel-visual-fortran-pro-for-linux-notes-on-gprof-use/

Key points:
- use a recent ifort compiler
- disable inlining
- remove the default trailing "_" name decoration used by Fortran. Most C-centric tools filter out function names with trailing underscores; you can actually see the lines in the gprof source where names with a trailing "_" are thrown out. I can't speak for Saturn, but I suspect it's similar.
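To see which names are meant, the decoration visible in bob's gprof output earlier in the thread can be reconstructed: module procedure RAN1 of module PDF1D_LIB appears as _pdf1d_lib_mp_ran1_. The sketch below rebuilds such names under assumptions inferred from that output only (lowercase, "_mp_" between module and procedure, ifort's default trailing underscore, plus the Mach-O leading underscore); fortran_symbol is a hypothetical helper, not an Intel API. Passing trailing_underscore = 0 shows the name a C-centric tool would see once the trailing underscore is removed.

```c
/* Sketch of the Fortran name decoration seen in the gprof output above.
 * Inferred from that output; not an official specification. */
#include <ctype.h>
#include <string.h>

/* Writes the decorated symbol into out. module may be NULL for a procedure
 * outside any module; trailing_underscore = 1 mimics ifort's default.
 * Returns 0 on success, -1 if out is too small. */
int fortran_symbol(const char *module, const char *proc,
                   int trailing_underscore, char *out, size_t out_size)
{
    /* '_' + [module + "_mp_"] + proc + ['_'] + NUL */
    size_t need = 1 + (module ? strlen(module) + 4 : 0) + strlen(proc)
                + (trailing_underscore ? 1 : 0) + 1;
    if (out_size < need)
        return -1;

    char *o = out;
    *o++ = '_';                                /* Mach-O leading underscore */
    if (module) {
        for (const char *s = module; *s; s++)
            *o++ = (char)tolower((unsigned char)*s);
        memcpy(o, "_mp_", 4);                  /* module/procedure separator */
        o += 4;
    }
    for (const char *s = proc; *s; s++)
        *o++ = (char)tolower((unsigned char)*s);
    if (trailing_underscore)
        *o++ = '_';                            /* ifort's default suffix */
    *o = '\0';
    return 0;
}
```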

This is not an Intel Fortran issue.

ron
nooj
Beginner
>Has anyone resolved this issue?
> We too are getting 0.0 execution times noted
> despite having a long-running program run.
> As for @bob, we are getting values for callcount
> but nothing in the %time, cumulative or self seconds.
I have a possible answer (three years later).
This can happen if some system-level activity, such as I/O, is blocking the program and making it sleep while it waits. None of your procedures then accumulates significant time (everything rounds to 0.0%), because the waiting happens at the system level rather than in your code.
Try cutting down I/O as much as possible (in ways that make sense for your program). See the thread below for some suggestions.
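A quick way to test this explanation is to compare CPU time with elapsed (wall-clock) time; the shell's time command reports both, as user/sys and real. The C sketch below does the same around a call that blocks, with nanosleep() standing in for a write() stuck on a full pipe (an assumption for illustration); measure_blocked is a hypothetical helper. A large gap between wall and CPU time means the program spent most of its life waiting, and a CPU-sampling profiler like gprof has nothing to attribute to any routine during that time.

```c
/* Sketch: compare CPU time with elapsed wall-clock time around blocking work.
 * nanosleep() is a stand-in for any blocking call (I/O, a full pipe, ...). */
#define _XOPEN_SOURCE 700
#include <time.h>

/* Elapsed wall-clock time in seconds from a monotonic clock. */
static double wall_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* CPU time consumed by this process so far, in seconds. */
static double cpu_now(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Block for `seconds` of wall time and report the CPU and wall time used. */
void measure_blocked(double seconds, double *cpu_used, double *wall_used)
{
    struct timespec req;
    req.tv_sec = (time_t)seconds;
    req.tv_nsec = (long)((seconds - (double)req.tv_sec) * 1e9);

    double c0 = cpu_now(), w0 = wall_now();
    nanosleep(&req, NULL);          /* process sleeps; no CPU is consumed */
    *cpu_used = cpu_now() - c0;
    *wall_used = wall_now() - w0;
}
```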
Details:
I resolved this issue when it came up for me. See
"I/O substantially slower today than yesterday"
In that situation, I was piping the output of my program to a VERY SLOW script:
./my.exe | my_stdout_processing_script
I didn't know the script was slow, but it was reading my program's stdout very slowly. The pipe's buffer kept filling up with my program's output, forcing my.exe to wait each time until the script had drained it.
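The blocking comes from the fixed-size kernel buffer behind every pipe: once the reader falls behind and the buffer is full, each further write() by the producer sleeps until space frees up. The C sketch below measures that capacity by making the write end non-blocking and writing until the kernel reports EAGAIN; pipe_capacity is a hypothetical helper, and the value it returns is OS-dependent, so no particular number should be assumed.

```c
/* Sketch: measure how many bytes a pipe accepts before a writer would block.
 * Once this buffer is full, a blocking write() (like my.exe writing to its
 * stdout) sleeps until the reader drains data. Capacity is OS-dependent. */
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns the number of bytes written before the pipe filled, or -1 on error. */
long pipe_capacity(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* Non-blocking write end: a full pipe yields EAGAIN instead of sleeping. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    char chunk[1024] = {0};
    long total = 0;
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n > 0) {
            total += (long)n;           /* may be a partial write near the end */
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;                      /* full: a blocking writer would sleep here */
        } else {
            total = -1;                 /* unexpected error */
            break;
        }
    }
    close(fds[0]);
    close(fds[1]);
    return total;
}
```

Nothing ever reads from fds[0] here, which models a consumer that has stopped keeping up; in nooj's case the consumer was merely slow, so the writer alternated between short bursts and long sleeps.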
Checking the status of the machine's processes showed my.exe continually sleeping, in state "pipe_wait". Here is a sample of my gprof output:
Flat profile:
granularity: each sample hit covers 4 byte(s) no time accumulated
  %  cumulative      self               self    total
 time   seconds   seconds     calls  ms/call  ms/call  name
  0.0      0.00      0.00   3361419     0.00     0.00  _fillsparsemat_3d_p2_ [79]
  0.0      0.00      0.00   3361419     0.00     0.00  _fillsparsemat_3d_p3_ [80]
  0.0      0.00      0.00   3361419     0.00     0.00  _fillsparsemat_3d_p_ [81]
  0.0      0.00      0.00    871654     0.00     0.00  _eval_shape_3d_new_ [82]
  0.0      0.00      0.00    256870     0.00     0.00  _intelmass_postproc_3d_ [83]
  0.0      0.00      0.00    188811     0.00     0.00  _fillsparsemat_3d_u_ [84]
  0.0      0.00      0.00    139968     0.00     0.00  _fillsparsematb_3d_u_ [85]
  0.0      0.00      0.00     64102     0.00     0.00  _compute_gtens_pl2_ [86]
  0.0      0.00      0.00     28491     0.00     0.00  _e3tensors_nooj_ [87]
  0.0      0.00      0.00     22293     0.00     0.00  _intelmass_3d_ [88]
Notice that no significant amount of CPU time is spent in any procedure of my code; the time was ALL spent waiting in write().
Note that although in 2008 there may have been incompatibilities among ifort/gprof/Saturn/Shark/whatever, the situation I describe can occur regardless.
- Nooj