Intel® MPI Library

mpdboot error

Xiaoming_Cao
Beginner
I use Intel MPI 3.2.2.006.
When I try to run mpdboot on 2 hosts in debug mode:

mpdboot -v -d -r ssh -n 2 -f ./mpd.conf

the error message shows:

debug: starting
totalnum=2 numhosts=1
there are not enough hosts on which to start all processes

The mpd.conf file contains:
node001:8
node002:8
The Python version on both nodes is 2.4.3.

How can I solve it?
Gergana_S_Intel
Employee

Hi Xiaoming,

The issue here is that the Intel MPI Library seems to think your ./mpd.conf file contains only a single machine name. Can you verify whether that's true? This sometimes happens if your hosts file has strange end-of-line or end-of-file characters (for example, when it is copied from a Windows machine to a Unix machine), if one of the lines is inadvertently commented out, etc. Are you starting the mpdboot command from node001, node002, or some other machine?
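
One way to run the checks described above is a small script that reads the hosts file as raw bytes and reports anything suspicious. This is only a sketch: the function name `inspect_hosts_file` is made up for illustration, it assumes the `hostname:count` line format shown in the original post and that `#` starts a comment, and it needs a Python 3 interpreter (newer than the 2.4.3 installed on the nodes). It does not reproduce how mpdboot itself parses the file.

```python
def inspect_hosts_file(raw):
    """Return (hosts, problems) for the raw bytes of a hosts file.

    hosts    -- host names found on active (non-blank, non-comment) lines
    problems -- list of (line_number, description) tuples
    """
    hosts = []
    problems = []
    for lineno, line in enumerate(raw.split(b"\n"), start=1):
        # A trailing carriage return means the file has Windows line endings.
        if line.endswith(b"\r"):
            problems.append((lineno, "carriage return (Windows line ending)"))
            line = line[:-1]
        # Undecodable bytes become U+FFFD, which the check below flags.
        text = line.decode("ascii", errors="replace").strip()
        if not text:
            continue
        # Assumption: '#' marks a commented-out line.
        if text.startswith("#"):
            problems.append((lineno, "commented out: " + text))
            continue
        if any(ord(ch) < 32 or ord(ch) > 126 for ch in text):
            problems.append((lineno, "non-printable or non-ASCII character"))
        # Assumption: lines look like 'hostname' or 'hostname:count'.
        hosts.append(text.split(":", 1)[0])
    return hosts, problems
```

For example, `inspect_hosts_file(open("./mpd.conf", "rb").read())` should return two host names and an empty problem list if the file is clean.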

Also, what does your /etc/hosts file look like?

Regards,
~Gergana

Xiaoming_Cao
Beginner

The mpd.conf file was edited directly on a Linux machine, not copied from elsewhere. It contains no strange symbols. I start the mpdboot command from node001.

As for /etc/hosts, it has no entry for node001 or node002, although I can ssh freely to any compute node. I am not the administrator, so I do not know how that works. However, I know that if node001 were listed in /etc/hosts, its entry would look like this:

10.141.0.1 node001.cm.cluster node001
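
Since ssh works even though /etc/hosts has no entries for the nodes, the names are presumably resolved some other way. A quick way to see what each name resolves to from node001 is sketched below. The helper name `resolve_hosts` and its `resolver` parameter are hypothetical, introduced only for this example, and the script assumes a Python 3 interpreter. It only reports what the system resolver returns and says nothing about how mpdboot looks hosts up.

```python
import socket


def resolve_hosts(names, resolver=socket.gethostbyname):
    """Map each host name to its IPv4 address, or None if lookup fails."""
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError.
            results[name] = None
    return results
```

Calling `resolve_hosts(["node001", "node002"])` on the machine where mpdboot is started shows whether both names resolve there; a `None` value means the lookup failed for that name.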
