Hi all,
I'm using Intel MPI 3 (icc & ifort 10 compilers) on a two-node cluster with an Ethernet interconnect.
The mpdboot command:
# mpdboot --totalnum=2 --file=/root/mpd.hosts --mpd=/opt/MPI_LIBS/INTEL-MPI/bin64/mpd --verbose --ncpus=4 --ifhn=10a0101
gave the following error:
running mpdallexit on 10a0101
LAUNCHED mpd on 10a0101 via
RUNNING: mpd on 10a0101
LAUNCHED mpd on compute-0-0 via 10a0101
mpdboot_10a0101 (handle_mpd_output 589): from mpd on compute-0-0, invalid port info:
connect to address 10.255.255.254: Connection refused
connect to address 10.255.255.254: Connection refused
trying normal rsh (/usr/bin/rsh)
32833
If the --rsh=/usr/bin/ssh option is used, mpdboot works fine, but an error occurs again when a job is submitted across the 2 nodes.
With MPICH2, mpdboot and job submission both work without any error.
I don't understand why this does not work with Intel MPI.
Can someone help me resolve this issue?
- Sangamesh
4 Replies
Hi Sangamesh,
It looks like a known bug. I believe that it should not appear in the latest release.
Package ID: l_mpi_p_3.1.026
Could you clarify the package ID for the Intel MPI Library you have? It can be found in the mpisupport.txt file. Would it be possible for you to upgrade if you have an older version?
Best regards, Andrey
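For anyone who needs to look up the package ID mentioned above, here is a minimal shell sketch. It assumes that mpisupport.txt contains a line of the form "Package ID: ...", as the IDs quoted in this thread suggest. The helper name package_id is hypothetical, and the file location in the usage comment is only a guess based on the install path in the original post.

```shell
# Hypothetical helper: print the package ID found in an Intel MPI support file.
# Assumption: the file has a line such as "Package ID: l_mpi_p_3.0.043".
package_id() {
    # Strip everything up to and including "Package ID:" and keep the first match.
    sed -n 's/^.*Package ID:[[:space:]]*//p' "$1" | head -n 1
}

# Usage (the path is an assumption; adjust it to your installation):
#   package_id /opt/MPI_LIBS/INTEL-MPI/mpisupport.txt
```

If the printed ID is older than the 3.1 package, the upgrade question above applies.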
I'm using:
Package ID: l_mpi_p_3.0.043
Does this happen on every cluster when booting on more than one node?
Thanks
-Sangamesh
Is it acceptable for you to upgrade to Intel MPI Library 3.1? If not, I would suggest you request a patch for the "invalid port info" issue at https://premier.intel.com. As far as I know, it is available for the 3.0.043 package.
I upgraded Intel MPI to version 3.1. Now I can run mpdboot without any errors.
Thanks..
-Sangamesh