Hi,
I have an MPMD job (mpirun -np 200 ./app1 : -np 120 ./app2 : -np 80 ./app3) and I plan to compare its MPI behaviour over a 10G Ethernet network and an InfiniBand network. Here are a few questions:
1) APS (Application Performance Snapshot) is one tool that comes to mind that can generate an MPI summary without recompiling the executables (with -g). Is this statement correct?
2) I plan to collect the data with the aps tool as follows:
mpirun -np 200 aps ./app1 : -np 120 aps ./app2 : -np 80 aps ./app3
Is this the correct way to use the aps tool for MPMD runs?
3) The application takes ~4 hours without aps. Will the runtime increase due to aps overhead?
Also, is there a way to customize the data collection time and the size of the collected data? For example: profile for the entire duration of the run, profile only the first hour of the run, or profile until the profiling data reaches 100G (due to disk space limitations).
4) Unrelated to this post: I have uploaded a few files for this query https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/intel-mpi-error-line-1334-cma-read-nbytes-size/td-p/1329220
Is it possible to reactivate that thread, or do I need to create a new post on the same topic?
Hi,
Thank you for posting in Intel Communities.
>>"APS is one tool which comes to my mind which is capable of generating summary of MPI without needing recompilation of executables (with -g). is this statement correct?"
Yes, you do not need to recompile the executables with the "-g" option to use the APS tool.
For more information, refer to the link below:
>>"Is this correct way to use aps tool for MPMD runs?"
Yes, it is the correct way to use the APS tool for MPMD job runs.
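In the MPMD syntax, each argument set is separated by a standalone `:`, and each executable gets its own `aps` prefix. The small shell sketch below assembles such a command line from rank-count/executable pairs. The function name `build_mpmd_cmd` is hypothetical, and the sketch only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an MPMD mpirun command line in which every
# executable is wrapped by aps. Arguments are (rank count, executable) pairs.
build_mpmd_cmd() {
  local cmd="mpirun" sep=""
  while [ "$#" -ge 2 ]; do
    # Append one argument set; sets after the first are preceded by " :".
    cmd="$cmd$sep -np $1 aps $2"
    sep=" :"
    shift 2
  done
  printf '%s\n' "$cmd"
}

# Prints the command for the job described in this thread.
build_mpmd_cmd 200 ./app1 120 ./app2 80 ./app3
```

Printing instead of executing makes it easy to check the separators before launching a multi-hour job.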
>>"will the runtime increase due to overhead by aps ?"
Yes, you can expect some increase in runtime, because the APS tool spends time collecting the data.
>>" is there a way to customize the data collection time and file size of the collection?"
You can control the amount of collected data, which lets you reduce profiling overhead and focus on the relevant sections of the application.
Please refer to the link below for more information:
>>"Is it possible to activate this thread or i need to create new post on same topic?"
As that thread was closed, it is no longer monitored by Intel. For further investigation, please post a new question that references the URL of the old thread.
Thanks & Regards,
Santosh
Thank you for the reply. Here are a few more questions:
a) Do we need sudo permissions/privileges to profile a job using Intel APS/VTune?
b) Do we need to load the SEP driver on all the compute nodes (insmod-sep -r) for multinode profiling to work correctly?
c) We have an InfiniBand interconnect on the cluster. Is there a way to see the aggregated MPI throughput (data transfer rate) and the per-node throughput using the APS tool?
Hi,
To answer your questions, we need more information from you. Could you please provide the following details?
- The operating system being used.
- The versions of the Intel oneAPI HPC Toolkit and the Intel oneAPI Base Toolkit.
Thanks & Regards,
Santosh
1. RHEL 7.9
2. inteloneapi/2021.3.0
I was able to find the answers to my previous questions.
I have both IB and Ethernet (1 gig) interfaces available on our cluster, and while trying MPMD profiling, here is the overhead I noticed:
1 node: with profiling ~5 hours (50% of elapsed time in MPI), without profiling ~5 hours.
2 nodes: with profiling ~10 hours (85% of elapsed time in MPI), without profiling ~3 hours.
For 2 nodes, is a slowdown of this magnitude expected? If it is expected with APS + Ethernet, could you please share some recommendations for reducing the APS overhead?
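A quick back-of-the-envelope check of these numbers, using only the approximate hours reported above (no new measurements): on 1 node the profiled and unprofiled runs take about the same time, while on 2 nodes the profiled run is roughly 3.3x slower. In addition, 85% of ~10 hours is ~8.5 hours in MPI, which is already longer than the entire ~3-hour unprofiled run, so most of the added time appears to be counted inside MPI calls. The helper names `slowdown` and `mpi_hours` are illustrative only.

```shell
#!/usr/bin/env bash
# slowdown: profiled wall time divided by unprofiled wall time.
slowdown() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.2f\n", p / b }'
}

# mpi_hours: total wall time multiplied by the MPI share (given in percent).
mpi_hours() {
  awk -v t="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", t * pct / 100 }'
}

# Approximate hours from the post above.
echo "1 node slowdown:  $(slowdown 5 5)x"
echo "2 node slowdown:  $(slowdown 10 3)x"
echo "2 node MPI hours: $(mpi_hours 10 85)"
```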
Hi,
Thanks for reporting this.
We have reported this issue to the relevant development team, and they are looking into it.
Thanks & Regards,
Santosh
Hi,
>>" Is the slowdown of this magnitude expected ? If this is expected with aps + ethernet then could you please share some recommendations with which the aps overheads can be reduced"
To debug and investigate your issue further, could you please provide sample reproducer code for the app1, app2, and app3 applications that you use to launch the MPMD job with the command below?
mpirun -np 200 aps ./app1 : -np 120 aps ./app2 : -np 80 aps ./app3
Thanks & Regards,
Santosh
Hi,
We haven't heard back from you. For further investigation of your issue, could you please provide sample reproducer code for the app1, app2, and app3 applications that you use to launch the MPMD job?
Thanks & Regards,
Santosh
Hi,
We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.
Thanks & Regards,
Santosh