Dear all,
I am currently looking into the problem of memory consumption for all-to-all based MPI software.
As far as I understand, for releases before Intel MPI 2017 we could use the DAPL UD mode through the variables I_MPI_DAPL_UD and I_MPI_DAPL_UD_PROVIDER.
Since DAPL support has been removed in the 2019 version, what should we use for an InfiniBand interconnect?
As an additional question, which variables reproduce the same behavior for an Omni-Path interconnect?
Thank you for your help.
Best,
Thomas
- Tags:
- Cluster Computing
- General Support
- Intel® Cluster Ready
- Message Passing Interface (MPI)
- Parallel Computing
Hi Thomas,
Thanks for reaching out to us.
We are working on this issue and will get back to you.
Prasanth
Hi Thomas,
As you are aware, the DAPL, TMI, and OFA fabrics are deprecated starting with Intel MPI 2019.
You can use the ofi fabric (OpenFabrics Interfaces* (OFI)-capable network fabrics). It uses a library called libfabric, which provides a fixed application-facing API while talking to one of several "OFI providers" that communicate with the interconnect hardware.
To select the particular fabric to be used:
Syntax:
I_MPI_FABRICS=<ofi | shm:ofi | shm>
Intel® MPI Library supports psm2, sockets, verbs, and RxM OFI* providers. Each OFI provider is built as a separate dynamic library to ensure that a single libfabric* library can be run on top of different network adapters.
To define the name of the OFI provider to load:
Syntax:
I_MPI_OFI_PROVIDER=<name>
For an InfiniBand interconnect use the verbs provider, and for an Omni-Path interconnect use the psm2 provider.
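For example, a minimal sketch of the environment settings for each interconnect (assuming the shm:ofi fabric; adjust to your cluster) would be:
# InfiniBand: select the verbs OFI provider
export I_MPI_FABRICS=shm:ofi
export I_MPI_OFI_PROVIDER=verbs
# Omni-Path: select the psm2 OFI provider
export I_MPI_FABRICS=shm:ofi
export I_MPI_OFI_PROVIDER=psm2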
For more information, please refer to the following links:
https://software.intel.com/en-us/mpi-developer-reference-windows-communication-fabrics-control
https://software.intel.com/en-us/mpi-developer-guide-linux-ofi-providers-support
Hope my answer solves your query.
Hey Dwadasi,
Thanks for your feedback.
Is there a way to reduce memory consumption with OFI?
Similar to what was possible with DAPL by switching to UD behavior?
Thanks a lot,
Thomas
Hi Thomas,
Could you please advise which Intel MPI release you are running?
Is there any chance that you could try Intel MPI 2019 update 6, which has improved support for Mellanox InfiniBand? If you can, please try the following settings.
export I_MPI_FABRICS=shm:ofi
export FI_PROVIDER=mlx
export UCX_TLS=ud,sm,self
Could you please also describe the memory-consumption problem in more detail, for example the symptoms and impact? What is your system configuration?
Thanks,
Zhiqi
Hi Zhiqi,
Thank you for your detailed answer. I haven't tried the 2019.6 version yet (I couldn't find it actually).
Could you also give me the same kind of commands for Omni-Path? (I am running on both InfiniBand and Omni-Path.)
In particular, what is the equivalent of UCX_TLS? How can I request a less memory-consuming transport protocol?
For the systems I am running on, I tried both Juwels (https://www.top500.org/system/179424) and MareNostrum (https://www.bsc.es/discover-bsc/the-centre/marenostrum).
On Juwels, using ParaStation MPI, we could reach 73k processes, but it failed with Intel MPI 2019.4 (at around 9k, if I remember correctly).
We have the same kind of issue on MareNostrum; we are not able to go higher than 9k.
Despite the use of sub-communicators etc, the error is always the same on both clusters: the job freezes in an all-to-all MPI communication, without any additional messages.
Thank you for your help.
Best,
Thomas
Hi Thomas,
So the issue is that you can't get more than 9K ranks with Intel MPI.
Could you please try to use -genv I_MPI_DEBUG=5? It would print more debug info.
https://software.intel.com/en-us/mpi-developer-guide-windows-displaying-mpi-debug-information
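For example (a sketch; the rank counts and application binary are placeholders for your actual job):
mpirun -genv I_MPI_DEBUG=5 -n <nranks> -ppn <ppn> ./your_app   # startup output shows the libfabric version and selected provider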
Thanks,
Zhiqi
Hi Zhiqi,
Sure, I will give it a try on MareNostrum (Omni-Path).
Beforehand, could you give me the configuration equivalent to UD for Omni-Path, so I can use it together with the debug mode?
Thanks,
Thomas
Hello Zhiqi,
Should I assume that OFI can now use UCX as a transport?
What is the recommended provider for Mellanox EDR/HDR? Any suggestions on MOFED versions?
Intel MPI 2019 update 6 is not out yet. Any idea as to the release date?
thanks
--Michael
PS: Did you use to work for the Lustre team? :)
Hi Thomas,
There is no "UD" mode on Omni-Path. It would be best to use "FI_PROVIDER=psm2" when you run Intel MPI on an Omni-Path cluster; PSM2 has the lowest memory footprint.
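A minimal sketch of those settings for an Omni-Path run (assuming the shm:ofi fabric) would be:
export I_MPI_FABRICS=shm:ofi
export FI_PROVIDER=psm2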
I discussed this with the Omni-Path engineering team. They recommended that you open a fabric support case by emailing fabricsupport@intel.com, so we can make sure that the Omni-Path configuration is optimal.
Best Regards,
Zhiqi
Hi Zhiqi,
Thanks a lot for your answers and explanation.
I will give it a try.
Best,
Thomas
Hi Michael,
When running Intel MPI with Mellanox EDR/HDR, please use Intel MPI 2019 update 5 or later.
Please use "export FI_PROVIDER=mlx".
OFI/mlx requires UCX 1.5+.
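For example (a sketch; it assumes the UCX runtime and its ucx_info tool are installed on the nodes):
ucx_info -v                      # check that the installed UCX is 1.5 or newer
export I_MPI_FABRICS=shm:ofi     # OFI fabric
export FI_PROVIDER=mlx           # mlx provider on top of UCX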
I just realized that the Intel MPI 2019 release notes https://software.intel.com/en-us/articles/intel-mpi-library-release-notes have not been updated to show the 2019 update 6 version. I have reported this issue to the release-notes owner.
In the meantime, you can download Intel MPI alone from https://software.intel.com/en-us/mpi-library/choose-download, where you can find Intel MPI 2019 update 6. I have validated it myself.
Yes, I was part of the Lustre team. :)
Best Regards,
Zhiqi
Hi Michael,
Parallel Studio XE 2020 was released today: https://software.intel.com/en-us/parallel-studio-xe
It includes Intel MPI 2019 update 6: https://software.intel.com/sites/default/files/managed/b9/6e/IPSXE_2020_Release_Notes_EN.pdf (page 4).
Best Regards,
Zhiqi
Zhiqi T. (Intel) wrote: Hi Michael,
When running Intel MPI with Mellanox EDR/HDR, please use Intel MPI 2019 update 5 or later.
Please use "export FI_PROVIDER=mlx".
OFI/mlx requires UCX 1.5+.
I just realized that the Intel MPI 2019 release notes https://software.intel.com/en-us/articles/intel-mpi-library-release-notes have not been updated to show the 2019 update 6 version. I have reported this issue to the release-notes owner.
In the meantime, you can download Intel MPI alone from https://software.intel.com/en-us/mpi-library/choose-download, where you can find Intel MPI 2019 update 6. I have validated it myself.
Yes, I was part of the Lustre team. :)
Best Regards,
Zhiqi
Hi Zhiqi,
So Intel MPI may now leverage UCX as one of the providers for the OFI framework? This is great. UCX is a quite capable transport and is optimized for the Mellanox hardware stack. Can we also use HCOLL from Mellanox's HPC-X? I understand that Intel MPI has its own optimized collectives. HCOLL and UCX can leverage all the hardware accelerators built into the Mellanox hardware.
So I assume we need to install the UCX 1.5+ runtime libraries somewhere and point OFI to them? Looking at the MPI docs, I am not able to find complete instructions on how to integrate Intel MPI 2019/2020 to use Mellanox gear efficiently.
We cannot launch Intel MPI 2019 update 5 on hosts with the mlx FI_PROVIDER on a Mellanox network. The verbs provider runs, but it is quite inefficient:
$ FI_PROVIDER=verbs I_MPI_DEBUG=1000 $(which mpiexec.hydra) -hosts sntc0008,sntc0009 -np 2 -ppn 1 $I_MPI_ROOT/intel64/bin/IMB-MPI1
[0] MPI startup(): libfabric version: 1.7.2a-impi
[0] MPI startup(): libfabric provider: verbs;ofi_rxm
[0] MPI startup(): max_ch4_vcis: 1, max_reg_eps 1, enable_sep 0, enable_shared_ctxs 0, do_av_insert 1
[0] MPI startup(): addrname_len: 16, addrname_firstlen: 16
[1] MPI startup(): selected platform: hsw
[0] MPI startup(): selected platform: hsw
[0] MPI startup(): Load tuning file: /vend/intel/parallel_studio_xe_2019_update5/compilers_and_libraries_2019.5.281/linux/mpi/intel64/etc/tuning_skx_shm-ofi.dat
[0] MPI startup(): Rank    Pid    Node name    Pin cpu
[0] MPI startup(): 0       4245   sntc0008     0
[0] MPI startup(): 1       744    sntc0009     0
[0] MPI startup(): I_MPI_ROOT=/vend/intel/parallel_studio_xe_2019_update5/compilers_and_libraries_2019.5.281/linux/mpi
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_PIN_PROCESSOR_LIST=allcores:map=scatter
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_FABRICS=shm:ofi
[0] MPI startup(): I_MPI_DEBUG=1000
#------------------------------------------------------------
# Intel(R) MPI Benchmarks 2019 Update 4, MPI-1 part
#------------------------------------------------------------
# Date : Wed Dec 18 18:05:03 2019
...
# Barrier
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         2.70         0.00
            1         1000         2.67         0.38
            2         1000         2.41         0.83
            4         1000         2.25         1.78
            8         1000         2.24         3.57
           16         1000         2.58         6.20
           32         1000         2.37        13.50
           64         1000         2.33        27.52
          128         1000         2.58        49.53
          256         1000         3.05        83.81
          512         1000         4.19       122.33
         1024         1000         9.09       112.66
         2048         1000        14.20       144.22
         4096         1000        29.37       139.44
         8192         1000        48.97       167.28
        16384         1000        93.37       175.48
        32768         1000       142.47       230.01
        65536          640       264.65       247.63
       131072          320       483.01       271.37
       262144          160      2970.47        88.25
       524288           80      3433.73       152.69
      1048576           40      6836.81       153.37
      2097152           20      9197.29       228.02
      4194304           10     10543.25       397.82
#---------------------------------------------------

$ FI_PROVIDER=mlx I_MPI_DEBUG=1000 $(which mpiexec.hydra) -hosts sntc0008,sntc0009 -np 2 -ppn 1 $I_MPI_ROOT/intel64/bin/IMB-MPI1
[0] MPI startup(): libfabric version: 1.7.2a-impi
Abort(1091471) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(703).......:
MPID_Init(923)..............:
MPIDI_OFI_mpi_init_hook(846): OFI addrinfo() failed (ofi_init.c:846:MPIDI_OFI_mpi_init_hook:No data available)
Thanks!
Michael
PS: "Long time no see"
Hi Michael,
The very first recommendation is to switch to IMPI 2019 U6 if possible.
We have HCOLL support at the Intel MPI level starting with IMPI 2019 U5 (available via I_MPI_COLL_EXTERNAL=1).
The following algorithms will be redirected to HCOLL:
I_MPI_ADJUST_ALLREDUCE=24, I_MPI_ADJUST_BARRIER=11, I_MPI_ADJUST_BCAST=16, I_MPI_ADJUST_REDUCE=13, I_MPI_ADJUST_ALLGATHER=6, I_MPI_ADJUST_ALLTOALL=5, I_MPI_ADJUST_ALLTOALLV=5
The minimal requirement for the OFI/mlx provider is UCX 1.4+ (starting with IMPI 2019 U6).
Yes, you have to have the UCX runtime available on the nodes in order to use FI_PROVIDER=mlx.
There are no additional requirements/knobs for EDR/HDR.
If you have FDR (and not Connect-IB) on the nodes, you may need to set UCX_TLS=ud,sm,self for large-scale runs; for small-scale runs you can experiment with UCX_TLS=rc,sm,self.
We are working on a way to make it smoother.
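Putting those pieces together, a hedged sketch for a large-scale run on FDR hardware (the rank counts and application name are placeholders) might be:
export I_MPI_FABRICS=shm:ofi
export FI_PROVIDER=mlx
export I_MPI_COLL_EXTERNAL=1        # hand the supported collectives off to HCOLL
export I_MPI_ADJUST_ALLTOALL=5      # HCOLL-backed all-to-all algorithm from the list above
export UCX_TLS=ud,sm,self           # UD transport, recommended for large-scale runs on FDR
mpirun -n <nranks> -ppn <ppn> ./your_app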
BR,
Dmitry
Dmitry,
Thanks for the details on the interaction between IMPI 2019 U6 and HCOLL and UCX. It's a nice ability to choose the HCOLL or the Intel collective implementation just by specifying the "algorithm" in I_MPI_ADJUST_XXX.
I understand that UCX/HCOLL are not Intel software stacks, but there is a large base of HPC users that use them with Mellanox hardware. They are already optimized and can use the low-level hardware accelerators on Mellanox hardware. Leveraging them via Intel MPI is quite beneficial for all, as we won't have to resort to different MPI stacks.
Thanks!
Michael
