I am compiling the same OpenMP program with both gcc (4.5.3) and icc (13.1.1 20130313), using the options -g and -O3 plus the corresponding OpenMP flag (-fopenmp for gcc, -openmp for icc). My code includes some omp_locks and I want to analyze it with the VTune Amplifier Locks and Waits analysis.
Since locks are present, I expect to see thread transitions (shown as yellow lines). This is what happens when I analyze the icc binary. However, the analysis of the gcc binary does not show any transitions at all. The reason I use gcc is that the speedup achieved so far is higher than that of the icc version (after solving all data races with Intel Inspector).
My question is:
- is there any special debug flag that I forgot?
Thanks for your help!
I don't know if this is a feature request, but I can reproduce this on my side.
I used the same VTune(TM) Amplifier XE version for Linux:
# icc -g -openmp -O3 pi.cpp -o pi.i++
# g++ -g -fopenmp -O3 pi.cpp -o pi.g++
Enable OpenMP frame regions in the bottom-up report:
# export KMP_FORKJOIN_FRAMES=1
# amplxe-cl -collect locksandwaits -- ./pi.i++
# amplxe-cl -collect locksandwaits -- ./pi.g++
With pi.g++, there is no "transition" metric in the "Ruler area" of the bottom-up report. I have reported this problem to the dev team.
Thanks to Tim. It works after rebuilding the code with "g++ -g -fopenmp -O3 -liomp5 pi.cpp -o pi.g++" and profiling again. libiomp5 is probably necessary for VTune to interpret the results.
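For anyone hitting the same thing, it may help to check which OpenMP runtime a binary is actually linked against before profiling. The following is only a rough sketch: omp_runtime is a hypothetical helper name (not part of VTune or either compiler), and it assumes the runtimes show up in ldd output under their usual names, libgomp for GNU and libiomp5 for Intel.

```shell
# Classify which OpenMP runtime(s) appear in `ldd` output read from stdin.
# omp_runtime is a hypothetical helper name chosen for this sketch.
omp_runtime() {
    awk '
        /libiomp5/ { intel = 1 }   # Intel OpenMP runtime
        /libgomp/  { gnu = 1 }     # GNU OpenMP runtime
        END {
            if (intel && gnu) print "both"
            else if (intel)   print "libiomp5"
            else if (gnu)     print "libgomp"
            else              print "none"
        }'
}
```

It could be used as "ldd ./pi.g++ | omp_runtime" and "ldd ./pi.i++ | omp_runtime". If the gcc binary reports only libgomp, that would be consistent with the missing transitions described above, though I have not verified the exact output on this setup.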