Hi,
I am running a serial program that takes 2 min 50 s (a few seconds more with the VTune options). The threaded and vectorized version of the same program takes 2 min in a normal run. With the VTune profiling options added, however, it takes almost half an hour and gets stuck at "Finalizing data" with the following message:
amplxe: Warning: The result contains a lot of raw data. Finalization may take a long time to complete.
amplxe: Executing actions 12 % Loading '11106-11118.0.trace' file
Even 90 minutes after this message appeared, no decrease in disk space was observed. Why is this happening, and what would be the solution?
The following command was used to run the program:
time mpirun -host cluster -n 2 -env I_MPI_DEBUG=5 amplxe-cl -c hotspots -trace-mpi -data-limit=51200 -r hs_icc ./ex4 : -machinefile machinfile -n 6 -env I_MPI_PIN_DOMAIN socket -env I_MPI_DEBUG=5 ./ex4
Also, will there be any difference in running time between pinning processes to nodes manually and using a machinefile?
Thank you
Hello Rakesh,
Could you please provide the following additional information to help triage the issue:
What VTune version do you use?
How many cores do you have per node and what is the node processor?
It is also worth trying the advanced-hotspots analysis to see whether it helps quickly.
Thanks & Regards, Dmitry
Hi dmitry-prohorov (Intel),
Thanks for the reply. Here is what I am using:
VTune version:
Intel(R) VTune(TM) Amplifier XE 2017 (build 478468)
All nodes have the following configuration:
Processor name : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Packages(sockets) : 2
Cores : 16
Processors(CPUs) : 16
Cores per package : 8
Threads per core : 1
I was able to generate the report, but the thread count and elapsed time differ from what I expected. I set the number of OpenMP threads to 7, but the report shows 29 threads. Without the VTune profiling options the elapsed time is 2 min, but the report says 16 min. For the serial program I am not using any threads, yet the report shows a thread count of 15. Why?
Thank you