Intel® Fortran Compiler

Options for debugging and performance evaluation

utab
Beginner
344 Views

Dear all,

What are the most common 'MUST' options for debugging and a subsequent performance evaluation on linux?

Best regards,

Umut

4 Replies
TimP
Honored Contributor III
The -debug or -g setting is often satisfactory for debugging. For full performance you would want an architecture switch such as -xHost, at least -O2, and possibly -unroll4. You should also select -standard-semantics or some of its sub-options.
jimdempseyatthecove
Honored Contributor III

In gathering timing data, you may want to discard the first pass (or record it separately). The first pass may incur additional one-time overhead which, depending on what you are measuring, is either part of the cost you care about or a distortion of your test data. Consider storing your timed intervals in an array of times; this will give you a better picture of what is going on. If you are venturesome, you might consider using thread-local storage and managing the timing statistics per thread. Sometimes this is an eye-opener too.

Jim Dempsey
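
The advice above (keep every timed interval in an array and treat the first pass separately) can be sketched as follows. This is only an illustration in Python, not code from the thread: `time_passes` and `sample_work` are hypothetical names, and `sample_work` is a stand-in for whatever algorithm is being timed. In Fortran the same pattern would typically be built around `system_clock`.

```python
import time


def time_passes(work, n_passes=6):
    """Run work() n_passes times; return (first_pass_time, list_of_later_times)."""
    times = []
    for _ in range(n_passes):
        start = time.perf_counter()
        work()
        times.append(time.perf_counter() - start)
    # The first pass is reported on its own rather than mixed into the rest.
    return times[0], times[1:]


def sample_work():
    # Hypothetical stand-in for the algorithm under test.
    return sum(i * i for i in range(200_000))


warmup, steady = time_passes(sample_work, n_passes=6)
print(f"first pass: {warmup:.6f} s (recorded separately)")
for i, t in enumerate(steady, start=2):
    print(f"pass {i}: {t:.6f} s")
```

Keeping the individual times, instead of a single accumulated total, is what makes it possible to see afterwards whether one pass behaved differently from the others.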

utab
Beginner

Dear Jim,

What I am after at the moment is to time some of my algorithms and see whether the timings are close from run to run. I will then take either the average or the median of the timing data.

At least for now I am running everything sequentially; my linear solver (MUMPS), which dominates the cost, is already compiled in sequential mode.

Thanks for your comments.

Umut
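
The plan in this reply (check that repeated runs agree, then report the average or the median) might look roughly like the sketch below. The timings in `run_times` are made-up numbers for illustration, and `summarize` is a hypothetical helper, not something from the thread.

```python
import statistics


def summarize(times):
    """Return (mean, median, relative_spread) for a list of run times."""
    mean = statistics.mean(times)
    median = statistics.median(times)
    # Relative spread: range of the timings as a fraction of the median.
    spread = (max(times) - min(times)) / median
    return mean, median, spread


# Made-up timings in seconds, for illustration only.
run_times = [1.02, 0.98, 1.01, 0.99, 1.00]
mean, median, spread = summarize(run_times)
print(f"mean={mean:.3f} s  median={median:.3f} s  spread={spread:.1%}")
```

When the spread is small the mean and the median agree closely; when a few runs are much slower, the median is the less sensitive of the two.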

 

JVanB
Valued Contributor II

The most common situation is that you get close values for most of your runs, but the first is much slower because you have to fill the instruction cache, maybe the data cache, and the branch target buffer (BTB). Some other runs can be slower because another process evicted your code and data from the caches, or because the OS moved the test to another core with different caches. So you typically want to look out for, and perhaps reject, outliers that arise for these reasons.
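
One possible way to flag such outliers is a median-based test, sketched below. The post does not name a method, so the modified z-score, the 3.5 threshold, and the name `reject_outliers` are all assumptions chosen for illustration, and the timings are made up.

```python
import statistics


def reject_outliers(times, threshold=3.5):
    """Split times into (kept, rejected) using a modified z-score on the median."""
    median = statistics.median(times)
    # Median absolute deviation (MAD): a spread estimate that outliers barely move.
    mad = statistics.median([abs(t - median) for t in times])
    if mad == 0:
        # All deviations are (nearly) identical; nothing can be flagged.
        return list(times), []
    kept, rejected = [], []
    for t in times:
        score = 0.6745 * abs(t - median) / mad
        (rejected if score > threshold else kept).append(t)
    return kept, rejected


# Made-up timings in seconds; the 1.60 s run mimics a pass disturbed by another process.
times = [1.00, 1.01, 0.99, 1.02, 1.60, 1.00]
kept, rejected = reject_outliers(times)
print("kept:", kept)
print("rejected:", rejected)
```

The test is two-sided, so an unusually fast run would be flagged as well. Rejected values are returned rather than silently dropped, so they can still be inspected.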

 
