We have a small cluster with the Intel Cluster Toolkit. In our research lab we are developing a Fluid-Structure Interaction program. It actually consists of 3 programs:
- a fluid solver written in Fortran;
- a solid solver written in C++;
- a coupling program written in C++.
The 3 programs communicate with MPI.
Our goal is to optimize these programs, particularly the Fortran program. So I searched and found VTune Amplifier, and I have installed it for evaluation. I have some questions:
- With which compilation options should the program be compiled? "-g -O2"? Are there compilation options that are known to cause trouble in VTune?
- In order to start our program we are using mpirun:
mpirun -np 1 "coupling program" : -np 1 "solid solver" : -np 5 "fluid solver"
What is the best way to start VTune with this type of mpirun line?
- At the moment I have tested VTune only on our "fluid solver". I created a script with
mpirun -np 5 "fluid solver"
and started VTune on it. I got results, but there seems to be a problem with one of the subroutines: the displayed time is attributed to a comment line, or to an empty line (see figure in attachment). I can't read assembly code, so I don't know whether this is a problem only in the Fortran source view or in the assembly view too. What can I do?
Thx a lot,
I usually use the command below to collect performance data:
amplxe-cl -collect hotspots -r r0002hs -- mpiexec.hydra -bootstrap fork -np 4 ./pi.gcc
Please see my article for more detailed steps.
It seems that you ran a system-wide data collection (starting the application manually with mpirun first). That should be OK!
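For the three-program launch, one possible approach is to put the collector in front of only the executable you want to profile inside the MPMD line. The following is a minimal sketch, assuming Intel MPI's colon-separated MPMD syntax; the binary names (./coupling, ./solid, ./fluid) and the result directory name r_fluid are placeholders that do not come from this thread. The script only prints the command so it can be reviewed before it is run:

```shell
#!/bin/sh
# Placeholder names (assumptions, not from the thread); substitute the real executables.
COUPLING=./coupling
SOLID=./solid
FLUID=./fluid
RESULT=r_fluid   # hypothetical result directory name

# Build an MPMD launch line in which only the fluid-solver ranks
# are wrapped by the VTune command-line collector (amplxe-cl).
vtune_mpmd_cmd() {
    echo "mpirun -np 1 $COUPLING : -np 1 $SOLID : -np 5 amplxe-cl -collect hotspots -r $RESULT -- $FLUID"
}

# Print the command for review; paste it into a shell to actually run it.
vtune_mpmd_cmd
```

Wrapping only the fluid solver keeps the collection overhead away from the other two programs; the same pattern should work for the solid solver or the coupling program by moving the amplxe-cl prefix.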
Thx for your answer. To solve my problem with the comment lines shown in the figure, I had to generate my source file with all the includes expanded:
gcc -E -P -DINTEL -DU77 -DGNU -Isrc/include src/test.F > test.F_withinclude
Then I copy test.F_withinclude over my original test.F. After that, VTune starts and shows very interesting results that make sense :D
@ Stephen T
What analysis type did you use? Hotspots analysis collects stack information automatically if your binary was built with debug info.
If you use advanced-hotspots, please add the option "-knob collection-detail=stack-sampling", which collects call stack information.
BTW, VTune(TM) Amplifier XE 2015 Update 1 is now available.
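For a command-line run, the stack-sampling knob mentioned above might be passed as in the following sketch. It assumes the amplxe-cl collector; the result directory name r001ah and the target name Test.exe are placeholders, and the script only prints the command for review:

```shell
#!/bin/sh
RESULT=r001ah     # hypothetical result directory name
TARGET=Test.exe   # placeholder target executable

# Assemble an advanced-hotspots collection with call stack sampling enabled.
stack_sampling_cmd() {
    echo "amplxe-cl -collect advanced-hotspots -knob collection-detail=stack-sampling -r $RESULT -- $TARGET"
}

# Print the command for review.
stack_sampling_cmd
```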
@ Stephen T
I saw that you have another thread, 534715, with the hotspots result r021hs attached.
In that report there was more idle time (1.353s) than serial time (0.651s), and the top hot functions were in ntdll.dll, MSVCR100.dll, and kernel32.dll. The hot function "std::getline<...>" in your module Test.exe took only 6.002ms, with [No call stack information]. I suspect that your function was called from ntdll.dll, which has no symbol info, and that this is why no stack info is displayed.
To verify this, you can build a test case that spends more CPU time and add a caller in your own code that calls your hot function(s), so that the caller is not in ntdll.dll. Hope it helps.
>>> But it seems that there is a problem for one of the subroutine: the displayed time is written for a comment line ?!? or for a line with nothing (see figure in attachment)...I can't read assembly code.>>>
It seems that <Block 19> is a loop that copies values from the R15 register to different GP registers. I cannot see a call instruction, and I cannot see a do loop construct in the source code in the left pane. Simply put, the left pane does not correspond to the right pane.