Solved: performance tuning

Izaak_Beekman · ‎08-04-2011

Hi, I am wondering to what extent performance tuning can be accomplished without a VTune license. Right now I am using gprof to find hotspots and compare execution times between different procedures and implementations. It would be nice, however, to have additional information to help guide my optimization efforts. For example, it would be nice to have information about cache misses, especially on a line-by-line basis.

If anyone has experience with this type of performance tuning and can help point me in the right direction using either F/OSS or the tools included in the Intel compiler suite (including idb, ifort, etc.) but not VTune, I would really appreciate it.

Thanks,
Z

EDIT: One more thing: I am on x86_64 RHEL 5 and would likely need to build any recommended F/OSS software from source since I do not have root privileges.

TimP · ‎08-04-2011

oprofile covers profiling by event counters, but I haven't seen much in the way of detailed tutorials. It should be included in a full installation of RHEL (I don't know about versions).
No matter what your tools, your first view of cache misses will be loop by loop, not line by line. The instruction tagging features of Intel profilers aren't automatic nor quickly learned.
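
As a rough illustration of what a loop-by-loop view can separate, here is a minimal sketch in C (not the poster's code; `N`, `fill`, `sum_unit_stride` and `sum_large_stride` are hypothetical names chosen for this example). Both functions compute the same sum, but one walks memory with unit stride and the other with a stride of `N` doubles, which is the kind of difference a cache-miss counter would be expected to attribute to one loop and not the other. Putting each loop nest in its own function also lets a function-level profiler such as gprof report them separately.

```c
#include <stddef.h>

/* Hypothetical problem size, chosen only for illustration. */
enum { N = 512 };

/* Fill the matrix with a small, reproducible integer pattern so that
 * both summation orders give exactly the same floating-point result. */
void fill(double a[N][N])
{
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j)
            a[i][j] = (double)((i + j) % 7);
}

/* Inner loop runs over the last index: contiguous (unit-stride)
 * accesses in C's row-major layout. */
double sum_unit_stride(double a[N][N])
{
    double s = 0.0;
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j)
            s += a[i][j];
    return s;
}

/* Inner loop runs over the first index: consecutive accesses are
 * N doubles apart, so most of them touch a different cache line. */
double sum_large_stride(double a[N][N])
{
    double s = 0.0;
    for (size_t j = 0; j < N; ++j)
        for (size_t i = 0; i < N; ++i)
            s += a[i][j];
    return s;
}
```

Called from a small driver and built with profiling enabled (for example `gcc -pg`), the two functions show up as separate entries in the gprof flat profile. In Fortran the roles are reversed, since arrays are column-major and the first index is the contiguous one.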

jimdempseyatthecove · ‎08-04-2011

I agree with TimP (The instruction tagging features of Intel profilers aren't automatic nor quickly learned). What you find in the documentation is information written from the perspective of the Electrical Engineers who design the CPUs and not from the perspective of the Software Engineer using the event-based profiler. It would be nice if the documentation included a functional index and abstract that is ordered neither alphabetically nor by function group. Instead, it should be ordered by the typical programmer's optimization attack sequence:

do this first
do this second
...
do this ...

Then:

When confronted with xxx
do this first
do this second
...
do this ...

When confronted with yyy
...

Also, the event descriptions are written from, and in the abbreviated vernacular of, the EE's perspective and not from the perspective of the SE, who may be thinking in more abstract terms.

After the user has made several iterations through this process, they might come to understand the EE's perspective well enough to master the finer points of event-based profiling.

My 2 cents...

Jim Dempsey

Izaak_Beekman · ‎08-04-2011

Thanks so much Tim. Loop by loop is an improvement over procedure by procedure and having cache hit/miss info will be very informative. Thanks for your response. I will go spend some time learning about oprofile....

Jeffrey_A_Intel · ‎08-05-2011

You might want to look at IgProf: http://igprof.sourceforge.net/. See also this paper. IgProf is used by, among others, the CMS LHC collaboration at CERN.