
VTune stuck at "Finalizing data" / "Executing actions 12 %" with a threaded application

Rakesh_M_
Beginner

Hi,

I am running a serial program that takes 2 min 50 s (a few seconds longer with the VTune options). The threaded and vectorized version of the same program takes 2 min in a normal run, but with the VTune profiling options it takes almost half an hour to execute and then gets stuck at "Finalizing data" with the following messages:

amplxe: Warning: The result contains a lot of raw data. Finalization may take a long time to complete.
amplxe: Executing actions 12 % Loading '11106-11118.0.trace' file

Even 90 minutes after this message appeared, there was no observable decrease in disk space. Why is this happening, and what would be the solution?

The following command was used to run the program:

time mpirun -host cluster -n 2 -env I_MPI_DEBUG=5 amplxe-cl -c hotspots -trace-mpi -data-limit=51200 -r hs_icc ./ex4 : -machinefile machinfile -n 6 -env I_MPI_PIN_DOMAIN socket -env I_MPI_DEBUG=5 ./ex4

Also, will there be any difference in running time between manually pinning processes to the nodes and using a machinefile?

Thank you

Dmitry_P_Intel1
Employee

Hello Rakesh,

Could you please provide the following additional information that can help to triage the issue:

What VTune version do you use?

How many cores do you have per node and what is the node processor?

It is worth trying advanced-hotspots as a quick check to see whether it helps.
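As a rough sketch of that suggestion, the launch line from the question could be reused with only the analysis type changed. This assumes that `advanced-hotspots` is accepted by `-c` in the installed VTune Amplifier XE version and that the sampling driver it relies on is available on the nodes. The result directory name `ahs_icc` is hypothetical; everything else is copied from the original command.

```shell
#!/bin/sh
# Sketch only: same launch line as in the question, with the analysis type
# swapped from "hotspots" to "advanced-hotspots".
# Assumptions: "ahs_icc" is a hypothetical result directory name;
# "machinfile" and ./ex4 are taken unchanged from the original command.
ANALYSIS="advanced-hotspots"
RESULT_DIR="ahs_icc"   # hypothetical, chosen so the earlier hs_icc result is not reused

CMD="mpirun -host cluster -n 2 -env I_MPI_DEBUG=5 amplxe-cl -c $ANALYSIS -trace-mpi -data-limit=51200 -r $RESULT_DIR ./ex4 : -machinefile machinfile -n 6 -env I_MPI_PIN_DOMAIN socket -env I_MPI_DEBUG=5 ./ex4"

# Print the command for review; set RUN=1 to actually launch it.
echo "$CMD"
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```

By default the script only prints the command, so it can be checked before anything is launched on the cluster.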

Thanks & Regards, Dmitry

 

Rakesh_M_
Beginner

Hi dmitry-prohorov (Intel),

Thanks for the reply. Here is what I am using:

VTune version:
Intel(R) VTune(TM) Amplifier XE 2017 (build 478468)

All Nodes have following configuration:
Processor name    : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Packages(sockets) : 2
Cores             : 16
Processors(CPUs)  : 16
Cores per package : 8
Threads per core  : 1

I was able to generate the report, but the thread count and elapsed time differ from what I expected. I set the number of OpenMP threads to 7, but the report shows 29. Without the VTune profiling options the elapsed time is 2 min, but the report says 16 min. The serial program does not use any threads, yet the report shows a thread count of 15. Why?

Thank you
