Options for debugging and performance evaluation

utab · ‎06-30-2014

Dear all,

What are the most common 'MUST' compiler options for debugging and for a subsequent performance evaluation on Linux?

Best regards,

Umut

TimP · ‎06-30-2014

For debugging, the -debug or -g setting is often satisfactory. For full performance you would want an architecture switch such as -xHost, at least -O2, and possibly -unroll4. You should also select -standard-semantics or some of its sub-options.

jimdempseyatthecove · ‎07-01-2014

When gathering timing data, you may want to discard the first pass (or record it separately). That pass may incur additional overhead, which may be something you want to measure or may simply interfere with obtaining good test data. Consider storing your timed intervals in an array of times; this gives you a better picture of what is going on. If you are adventurous, you might use thread-local storage and keep the timing statistics per thread. Sometimes this is an eye-opener too.

Jim Dempsey
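
For readers who want to try this, here is a minimal sketch of the idea in Python (chosen only for brevity; the same structure carries over to Fortran or C). It assumes the code under test can be wrapped in a function. The names time_passes and sample_kernel are hypothetical and are not from the thread.

```python
import time


def time_passes(work, n_passes=10):
    """Run work() n_passes times; return (first_pass_seconds, later_pass_seconds)."""
    times = []
    for _ in range(n_passes):
        start = time.perf_counter()
        work()
        times.append(time.perf_counter() - start)
    # Keep the first pass separate: it often includes one-time warm-up cost.
    return times[0], times[1:]


def sample_kernel():
    # Hypothetical stand-in for the algorithm being measured.
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    first, rest = time_passes(sample_kernel, n_passes=8)
    print(f"first pass: {first:.6f} s")
    for i, t in enumerate(rest, start=2):
        print(f"pass {i}: {t:.6f} s")
```

Keeping every interval in a list, rather than accumulating a single total, is what makes it possible to spot a slow first pass or an occasional slow run afterwards.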

utab · ‎07-01-2014

Dear Jim,

What I am after at the moment is to time some of my algorithms and see whether the runs give close values; I will then either average the timing data or take the median.

At least for now, I am running everything sequentially. My linear solver (MUMPS), which dominates the cost, is already compiled in sequential mode.

Thanks for your comments.

Umut
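
One possible way to check whether runs are "close" and to compare the two summaries is sketched below. The function name summarize and the sample timings are made up for illustration, and using the standard deviation divided by the mean as a closeness measure is an assumption, not something prescribed in this thread.

```python
import statistics


def summarize(times):
    """Return mean, median, and relative spread (stdev / mean) of the timings."""
    mean = statistics.mean(times)
    median = statistics.median(times)
    # stdev needs at least two samples; report zero spread for a single run.
    spread = statistics.stdev(times) / mean if len(times) > 1 else 0.0
    return mean, median, spread


if __name__ == "__main__":
    # Made-up timings in seconds, with one slow run, purely for illustration.
    runs = [1.02, 1.00, 1.01, 1.50, 0.99]
    mean, median, spread = summarize(runs)
    print(f"mean={mean:.3f} s  median={median:.3f} s  relative spread={spread:.1%}")
```

With a single slow run in the sample, the mean is pulled upward while the median stays near the typical value, which is one reason the median is often the safer summary for a handful of timings.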

JVanB · ‎07-01-2014

The most common situation is that you get close values for most of your runs, but the first is much slower because it has to fill the instruction cache, possibly the data cache, and the branch target buffer (BTB). Some later runs can also be slower because another process evicted their data from the caches, or because the OS moved the test to another core that uses different caches. So you typically want to look out for, and perhaps reject, outliers that arise for these reasons.
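
As a rough illustration of rejecting such outliers, the sketch below flags runs that lie far from the median, measured in median absolute deviations (MAD). The rule, the threshold of 3, the function name reject_outliers, and the sample timings are all assumptions made for this example; other rejection criteria are equally defensible.

```python
import statistics


def reject_outliers(times, threshold=3.0):
    """Split timings into (kept, rejected) by distance from the median.

    A run is rejected when it lies more than `threshold` scaled MADs
    from the median. Rule and default threshold are illustrative only.
    """
    median = statistics.median(times)
    mad = statistics.median([abs(t - median) for t in times])
    if mad == 0.0:
        # At least half the runs equal the median, so the scaled
        # distance is undefined; keep everything in that case.
        return list(times), []
    # 1.4826 makes the MAD comparable to a standard deviation
    # for normally distributed data.
    limit = threshold * 1.4826 * mad
    kept = [t for t in times if abs(t - median) <= limit]
    rejected = [t for t in times if abs(t - median) > limit]
    return kept, rejected


if __name__ == "__main__":
    # Made-up timings in seconds: a slow first run and one disturbed run.
    runs = [2.40, 1.01, 1.00, 1.02, 0.99, 1.35, 1.00]
    kept, rejected = reject_outliers(runs)
    print("kept:    ", kept)
    print("rejected:", rejected)
```

A median-based rule is used here because the slow runs themselves would inflate a mean and standard deviation, making them harder to detect.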