Hello,
Here is my environment information:
Intel oneAPI 2021.2
CentOS 7.6
MLNX_OFED_LINUX-5.3-1.0.0.1
UCX 1.10
I have two HCAs per node, but only one is active.
When I run a test on 18 nodes, it finishes in 0.9 seconds.
[root@n00001 mpi-test]# time mpirun -np 18 -ppn 1 -f hostfile2 hostname
real 0m0.939s
When I run the same test on more than 18 nodes, it takes 41 seconds to finish.
[root@n00001 mpi-test]# time mpirun -np 20 -ppn 1 -f hostfile2 hostname
real 0m41.026s
Do you have any suggestions for this issue? Thank you.
Hi,
Thanks for reaching out to us.
Could you please describe your cluster topology? In particular, are all nodes connected to the same switch or to different switches?
Thanks & Regards
Shivani
Hi Shivani,
Thank you for your reply.
My network is a non-blocking fat tree, and these nodes are all connected to the same switch.
By the way, I also tested Intel MPI 2018 and Open MPI, and neither of them has this problem.
Intel MPI 2018:
[root@n00001 mpi-test]# mpirun -V
real 0m0.496s
Open MPI:
[root@n00001 mpi-test]# time /usr/mpi/gcc/openmpi-4.1.0rc5/bin/mpirun --allow-run-as-root -np 20 --hostfile hostfile2 -N 1 hostname
real 0m0.938s
Thanks & Regards
Rui
Hi,
Thanks for providing the details. We are working on it and will get back to you soon.
Thanks & Regards
Shivani
Please repeat the 18 and 20 node runs with I_MPI_HYDRA_DEBUG=1 and provide the results.
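One way to do this is to export the variable in the shell that starts mpirun, so that the launcher itself inherits it, and to keep each run's output in its own file. The following is a minimal sketch, not a prescribed procedure: hostfile2 and the node counts are taken from the posts above, while the helper launch_cmd and the log file names are hypothetical. If mpirun or hostfile2 is not available in the current shell, the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: hostfile2 and the node counts come from the posts above;
# launch_cmd and the log file names (hydra_debug_<n>.log) are made up.

# Print the launch command for a given node count, one rank per node.
launch_cmd() {
    echo "mpirun -np $1 -ppn 1 -f hostfile2 hostname"
}

# Export in the launching shell so that mpirun inherits the variable.
export I_MPI_HYDRA_DEBUG=1

for n in 18 20; do
    cmd=$(launch_cmd "$n")
    if command -v mpirun >/dev/null 2>&1 && [ -f hostfile2 ]; then
        start=$(date +%s)
        # $cmd is left unquoted on purpose so it splits into arguments.
        # stdout and stderr go to one file to keep the trace in order.
        $cmd > "hydra_debug_${n}.log" 2>&1
        end=$(date +%s)
        echo "$n nodes: $((end - start))s, trace in hydra_debug_${n}.log"
    else
        # Intel MPI or the hostfile is missing here: show the command only.
        echo "would run: I_MPI_HYDRA_DEBUG=1 $cmd"
    fi
done
```

Attaching both log files to a reply makes it easier to compare where the 20-node launch spends its time relative to the 18-node one.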
Hi James,
Results for 18 nodes:
[root@n00001 mpi-test]# time mpirun -np 18 -ppn 1 -f hostfile2 -genv I_MPI_HYDRA_DEBUG=1 hostname
real 0m0.944s
Results for 20 nodes:
[root@n00001 mpi-test]# time mpirun -np 20 -ppn 1 -f hostfile2 -genv I_MPI_HYDRA_DEBUG=1 hostname
real 0m40.957s
Thanks & Regards
Rui
Please try I_MPI_HYDRA_BRANCH_COUNT=0.
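A quick way to try this is to time the same 20-node launch twice, once with the default settings and once with the variable set for that single command. The sketch below assumes the hostfile2 and command line from the earlier posts; timed_run is a hypothetical helper, not part of Intel MPI. As far as I understand the Intel MPI documentation, this variable controls the hierarchical branch count of the Hydra launcher, but treat that reading as an assumption and check the Developer Reference for your version. Nothing in this thread confirms whether the setting resolves the delay.

```shell
#!/bin/sh
# Sketch only: timed_run is a hypothetical helper; hostfile2 and the
# mpirun arguments are copied from the posts above.

# Run a command, discard its output, and report exit status and
# wall-clock time in whole seconds.
timed_run() {
    label=$1
    shift
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$label: exit=$status elapsed=$((end - start))s"
}

if command -v mpirun >/dev/null 2>&1 && [ -f hostfile2 ]; then
    # Baseline with whatever the environment currently sets.
    timed_run "default" mpirun -np 20 -ppn 1 -f hostfile2 hostname
    # Same launch, with the variable set only for this one command.
    timed_run "branch_count=0" env I_MPI_HYDRA_BRANCH_COUNT=0 \
        mpirun -np 20 -ppn 1 -f hostfile2 hostname
else
    echo "mpirun or hostfile2 not found; nothing was run"
fi
```

Setting the variable through env keeps the change local to one command, so the baseline run is not affected by a leftover export.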
Due to lack of reply, I am closing Intel support on this thread. Any further posts on this thread will be community only. If you need further Intel support assistance, please create a new thread.