Hello All,
We are having a problem with MPI startup taking a long time, so I would like to ask a question.
The time is spent across these startup stages: library kind -> libfabric version -> libfabric provider -> load tuning file, and the step between "libfabric provider" and "Load tuning file" takes the most time.
Tue Apr 13 13:50:08 UTC 2021
[0] MPI startup(): Intel(R) MPI Library, Version 2021.2 Build 20210302 (id: f4f7c92cd)
[0] MPI startup(): Copyright (C) 2003-2021 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.11.0-impi
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): Load tuning file: "/opt/local/mpi/2021.2.0/etc/tuning_icx_shm-ofi_mlx.dat"
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 276153 ****0721.maru 0
[0] MPI startup(): 1 276154 ****0721.maru 1
[0] MPI startup(): 2 276155 ****0721.maru 2
The executable is IMB-MPI1, and the execution script is as follows.
export I_MPI_HYDRA_PMI_CONNECT=alltoall
export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:ofi
export I_MPI_PIN=1
export I_MPI_PIN_PROCESSOR_LIST=0-75
export FI_PROVIDER=mlx
export UCX_TLS=rc,dc_mlx5,sm,self
{ time mpiexec.hydra -genvall -f ./hostlist -n 33972 -ppn 76 IMB-MPI1 Bcast Allreduce -npmin 33972; } >> ${OUTFILE} 2>&1
#{ time mpiexec.hydra -genvall -f ./hostlist -n 67944 -ppn 76 IMB-MPI1 Bcast Allreduce -npmin 67944; } >> ${OUTFILE} 2>&1
#{ time mpiexec.hydra -genvall -f ./hostlist -n 131328 -ppn 76 IMB-MPI1 Bcast Allreduce -npmin 67944; } >> ${OUTFILE} 2>&1
I tested 3 rank counts (33,972; 67,944; and 131,328 ranks), and startup took about 33 seconds, 79 seconds, and 131 seconds respectively. Startup takes a large part of the overall execution time, so please give us your opinion on what we can do to reduce it.
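For reference, with -ppn 76 these rank counts correspond to 447, 894, and 1,728 nodes respectively, so the startup time seems to grow roughly in proportion to the node count.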
*Intel MPI version is 2021.2.0, UCX is 1.10.0, and MOFED is 5.2-1.0.4.0.
Thanks, Kihang
For information,
I can reproduce this with a program that calls only the MPI_Init function.
When I use 76,000 MPI ranks, MPI startup (MPI_Init) takes 68-72 seconds.
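In other words, a reproducer of roughly this shape (a sketch: ./init_only is assumed to be a trivial program that only calls MPI_Init and MPI_Finalize, and init_only.log is just an example output file; hostlist and -ppn 76 are the same as in the script above):
export I_MPI_DEBUG=1000
export FI_PROVIDER=mlx
{ time mpiexec.hydra -genvall -f ./hostlist -n 76000 -ppn 76 ./init_only; } >> init_only.log 2>&1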
Here is a more detailed log (I_MPI_DEBUG=1000):
[0] MPI startup(): libfabric version: 1.11.0-impi
libfabric:361875:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:361875:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:361875:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_ZE not supported
libfabric:361875:core:core:ofi_register_provider():427<info> registering provider: ofi_rxm (111.0)
libfabric:361875:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:361875:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:361875:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_ZE not supported
libfabric:361875:core:core:ofi_register_provider():427<info> registering provider: verbs (111.0)
libfabric:361875:core:core:ofi_register_provider():455<info> "verbs" filtered by provider include/exclude list, skipping
libfabric:361875:core:core:ofi_register_provider():427<info> registering provider: tcp (111.0)
libfabric:361875:core:core:ofi_register_provider():455<info> "tcp" filtered by provider include/exclude list, skipping
libfabric:361875:core:core:ofi_register_provider():427<info> registering provider: mlx (1.4)
libfabric:361875:core:core:ofi_register_provider():427<info> registering provider: shm (111.0)
libfabric:361875:core:core:ofi_register_provider():455<info> "shm" filtered by provider include/exclude list, skipping
libfabric:361875:core:core:ofi_register_provider():427<info> registering provider: sockets (111.0)
libfabric:361875:core:core:ofi_register_provider():455<info> "sockets" filtered by provider include/exclude list, skipping
libfabric:361875:core:core:ofi_register_provider():427<info> registering provider: ofi_hook_noop (111.0)
libfabric:361875:core:core:fi_getinfo_():1117<info> Found provider with the highest priority mlx, must_use_util_prov = 0
libfabric:361875:core:core:fi_getinfo_():1117<info> Found provider with the highest priority mlx, must_use_util_prov = 0
[0] MPI startup(): libfabric provider: mlx
libfabric:361875:core:core:fi_fabric_():1406<info> Opened fabric: mlx
[0] MPI startup(): max_ch4_vcis: 1, max_reg_eps 1, enable_sep 0, enable_shared_ctxs 0, do_av_insert 1
[0] MPI startup(): addrnamelen: 1024
<Here is the most time-consuming part>
[3323] MPI startup(): selected platform: icx
[2667] MPI startup(): selected platform: icx
[3561] MPI startup(): selected platform: icx
[2743] MPI startup(): selected platform: icx
Please let me know if you have any suggestions.
Thanks in advance, Kihang
Hi,
Thanks for reaching out to us.
We are working on it and will get back to you soon.
Thanks & Regards
Shivani
Hi Kihang,
I have sent a request to the IMPI architect asking for better startup parameters.
My first idea would be:
Do you have a file system that is faster than /opt/kma_local/mpi/2021.2.0/etc/?
Maybe it would help to read the tuning file from another file system. You can use the autotuner variables to provide a different location for the tuning file:
$ export I_MPI_TUNING_BIN=<tuning-results.dat>
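For example, a sketch of staging the tuning file (path taken from the debug output above) onto node-local storage before the run; the /dev/shm destination is only an assumption, and with many nodes the copy would have to happen on every node, e.g. in a job prolog:
cp /opt/local/mpi/2021.2.0/etc/tuning_icx_shm-ofi_mlx.dat /dev/shm/
export I_MPI_TUNING_BIN=/dev/shm/tuning_icx_shm-ofi_mlx.dat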
best regards,
Heinrich
Reply from our architect:
- I would recommend trying UD (you may simply unset UCX_TLS) as a way to improve startup time.
- Could you please ask them to clarify how they came to the conclusion that it is the tuning file reading?
- Please ask them to try:
export I_MPI_STARTUP_MODE=pmi_shm_netmod
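Applied to the job script from the first post, these two suggestions would look roughly like this (a sketch; everything else in the script stays unchanged):
unset UCX_TLS                              # drop the transport restriction so UCX can fall back to UD
export I_MPI_STARTUP_MODE=pmi_shm_netmod   # alternative startup mode suggested above
{ time mpiexec.hydra -genvall -f ./hostlist -n 33972 -ppn 76 IMB-MPI1 Bcast Allreduce -npmin 33972; } >> ${OUTFILE} 2>&1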
best regards,
Heinrich
Hi Heinrich,
The option "I_MPI_STARTUP_MODE=pmi_shm_netmod" you recommend is works!
Could you explain the pmi_shm_netmod means? or Is there any manual about that?
Thanks, Kihang