Measure performance MPI + openMP

i9 — Mon, 07 Jun 2021 19:28:00 GMT

There is a program written with MPI and openMP, and I want to measure the performance of MPI interfaces.

I tried vtune hotspots to profile it,

export OMP_NUM_THREAD=6
cat >vtune.conf <<EOF
0-34 ./app
35 amplxe-cl -collect hotspots -no-follow-child -trace-mpi -r result -- ./app
EOF
srun -N 6 -n 36 --multi-prog vtune.conf

In the profiling results, the hotspots functions are

opal_timer_base_get_usec_sys_timer

func@xxxx

_pthread

These results are not what I intended. I want to see the time spent in the MPI interfaces. What should I tweak to get the right hotspots? Should I use a different tool?

Re: Measure performance MPI + openMP

AlekhyaV_Intel — Tue, 08 Jun 2021 12:22:15 GMT

Thank you for posting in Intel Forums. You could make the below corrections to your commands.

export OMP_NUM_THREADS=12

mpirun -n 16 -ppn 4 -l vtune -collect hotspots -k sampling-mode=hw -trace-mpi -result-dir <result directory path> -- ./app

If an MPI application is launched on multiple nodes, VTune Profiler creates one result directory per compute node in the current directory. Each directory holds the data for all the ranks that ran on that node.
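The per-node result directories can be walked with a short loop. This is a minimal sketch, assuming the directories are named `<prefix>.<hostname>` (check the names your own run produced) and that `vtune` is on the PATH; with VTune Amplifier 2018 the command would be `amplxe-cl` instead. `list_reports` is a hypothetical helper that only prints the report commands, it does not run them.

```shell
#!/bin/sh
# Print one "vtune -report" command per per-node result directory.
# Assumption: directories are named <prefix>.<hostname>; adjust the glob if yours differ.
list_reports() {
    prefix=$1
    for dir in "$prefix".*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -d "$dir" ] || continue
        echo "vtune -report hotspots -r $dir"
    done
}
```

Running `list_reports result` in the directory that holds the results prints one command per node, which can then be reviewed and executed by hand.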

Please refer to the link below, which explains how to use Intel Advisor and Intel VTune Profiler with MPI to collect performance data at the node and core level:

https://software.intel.com/content/www/us/en/develop/articles/using-intel-advisor-and-vtune-amplifier-with-mpi.html

A separate tool exists to record the details of the communication patterns and communication costs of an MPI application, the Intel® Trace Analyzer and Collector. The information provided by VTune™ Amplifier and Intel® Advisor is focused on the core and node performance, and complements the specific MPI communication details provided by the Intel® Trace Analyzer and Collector.

Please check this and let us know if this works.

Regards,

Alekhya

Re: Measure performance MPI + openMP

i9 — Tue, 08 Jun 2021 12:41:04 GMT

Thank you very much for your reply.

The version of VTune is `Intel(R) VTune(TM) Amplifier 2018 Update 4 (build 574913) Command Line Tool`, so I would use amplxe-cl instead.

May I give more details about this experiment? Six MPI processes are launched on each node, and one MPI process on the last node performs parallel I/O. I want to focus on the performance of this parallel I/O process. Is it necessary to measure all nodes?
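For this layout, the `--multi-prog` configuration from the first post can be generated instead of written by hand. The following is a minimal sketch, assuming one line per rank in the config file; `write_multiprog_conf` is a hypothetical helper name, and the profiler command line is passed in unchanged, so it can be `amplxe-cl` or `vtune`.

```shell
#!/bin/sh
# Write an srun --multi-prog config to stdout that profiles exactly one rank.
# Arguments: total rank count, rank to profile, application, profiler command line.
write_multiprog_conf() {
    nranks=$1; target=$2; app=$3; profiler=$4
    rank=0
    while [ "$rank" -lt "$nranks" ]; do
        if [ "$rank" -eq "$target" ]; then
            # Only the selected rank is wrapped in the profiler.
            echo "$rank $profiler -- $app"
        else
            echo "$rank $app"
        fi
        rank=$((rank + 1))
    done
}

# Small demonstration: 4 ranks, profile rank 3 only.
write_multiprog_conf 4 3 ./app "amplxe-cl -collect hotspots -r result"
```

For the job described here, `write_multiprog_conf 36 35 ./app "amplxe-cl -collect hotspots -r result" > vtune.conf` followed by `srun -N 6 -n 36 --multi-prog vtune.conf` would launch 36 ranks with only rank 35 under the profiler.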

Bests

Re: Measure performance MPI + openMP

AlekhyaV_Intel — Fri, 11 Jun 2021 19:53:05 GMT

Hi,

A VTune profiling run can cover all the MPI processes on a node, or a single MPI process on each node. You can use selective profiling to reduce the size of the results that VTune Profiler collects.

Below are commands that profile selected MPI processes once the application is built. In this example there are 64 processes, with 16 processes allocated per node. The "-gtool" option lets you launch tools such as Intel® VTune Profiler, Intel Advisor, and GNU Debugger (GDB) for specified processes through the mpiexec.hydra and mpirun commands, so that only the required processes are profiled.

Profile ranks 0-15:
mpirun -n 64 -ppn 16 -gtool "vtune -collect <analysis-type> -r <result-dir>:0-15" ./a.out

Profile only rank 5:
mpirun -n 64 -ppn 16 -gtool "vtune -collect <analysis-type> -r <result-dir>:5" ./a.out

Profile only rank 7:
mpirun -n 64 -ppn 16 -gtool "vtune -collect <analysis-type> -r <result-dir>:7" ./a.out
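To adapt this to another job size, the command can be assembled from its parts. This is a minimal sketch that only prints the command rather than launching anything; `gtool_command` is a hypothetical helper name, and `hotspots` and `result` stand in for the `<analysis-type>` and `<result-dir>` placeholders. Note that `-gtool` is an option of the Intel MPI launchers, so this assumes the application runs under Intel MPI.

```shell
#!/bin/sh
# Print (not run) an mpirun command that attaches a tool to selected ranks only.
# Arguments: total ranks, ranks per node, rank set (e.g. 5 or 0-15),
#            application, tool command line.
gtool_command() {
    nranks=$1; ppn=$2; rankset=$3; app=$4; tool=$5
    # -gtool takes "<tool command line>:<rank set>".
    printf 'mpirun -n %s -ppn %s -gtool "%s:%s" %s\n' \
        "$nranks" "$ppn" "$tool" "$rankset" "$app"
}

# Example with the numbers used above: 64 ranks, 16 per node, profile rank 5.
gtool_command 64 16 5 ./a.out "vtune -collect hotspots -r result"
```

For the 36-rank job from earlier in the thread, `gtool_command 36 6 35 ./app "amplxe-cl -collect hotspots -r result"` prints the equivalent command for the parallel I/O rank.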

Please try this and let us know if this works.

Regards,

Alekhya

Re: Measure performance MPI + openMP

AlekhyaV_Intel — Mon, 21 Jun 2021 08:29:53 GMT

Hi,

Is your issue resolved? Could you give us an update?

Regards,

Alekhya

Re: Measure performance MPI + openMP

AlekhyaV_Intel — Mon, 28 Jun 2021 12:20:37 GMT

We assume that your issue is resolved. If you need any further assistance, please post a new question as this thread will no longer be monitored.

Regards,

Alekhya

Re: Re: Measure performance MPI + openMP

Dmitry_P_Intel1 — Mon, 28 Jun 2021 13:17:47 GMT

Hello,

If you need exact timing for MPI functions, it makes sense to use a tool based on MPI instrumentation, such as APS (<VTune_install_dir>/bin64/aps) or ITAC. VTune's sampling approach usually does not work well for this.

Thanks & Regards, Dmitry