Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

mpdboot error

Xiaoming_Cao
Beginner
I am using Intel MPI 3.2.2.006.
When I try to run mpdboot on 2 hosts in debug mode:

mpdboot -v -d -r ssh -n 2 -f ./mpd.conf

the error message shows:

debug: starting
totalnum=2 numhosts=1
there are not enough hosts on which to start all processes

Contents of ./mpd.conf:
node001:8
node002:8
The Python version on both nodes is 2.4.3.

How can I solve it?
Gergana_S_Intel
Employee

Hi Xiaoming,

The issue here is that the Intel MPI Library seems to think your ./mpd.conf file only contains a single machine name. Can you verify whether that's true? Sometimes that happens if your hosts file has strange EOF symbols (for example, when it was copied from a Microsoft Windows machine to a Unix machine), or if one of the lines is inadvertently commented out, etc. Are you starting the mpdboot command from node001, node002, or some other machine?
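One way to run the checks described above is a small script that reads the hosts file as raw bytes and reports anything suspicious. This is only a sketch, not part of the Intel MPI Library: the function name `inspect_hosts_file` is hypothetical, it assumes each line is either `hostname` or `hostname:ncpus`, it assumes `#` marks a commented-out line, and it needs Python 3 (it will not run on the Python 2.4.3 installed on the nodes).

```python
def inspect_hosts_file(raw):
    """Return (hosts, warnings) for the raw bytes of a hosts file.

    Hypothetical helper. Assumes one `hostname` or `hostname:ncpus`
    entry per line and `#` as the comment marker.
    """
    hosts, warnings = [], []
    for lineno, line in enumerate(raw.split(b"\n"), start=1):
        # A trailing carriage return usually means Windows-style line endings.
        if line.endswith(b"\r"):
            warnings.append(f"line {lineno}: Windows-style CR line ending")
            line = line.rstrip(b"\r")
        # Flag control characters (other than tab) and non-ASCII bytes.
        if any((b < 0x20 and b != 0x09) or b > 0x7E for b in line):
            warnings.append(f"line {lineno}: non-printable or non-ASCII byte")
        text = line.decode("ascii", errors="replace").strip()
        if not text:
            continue  # skip blank lines
        if text.startswith("#"):
            warnings.append(f"line {lineno}: commented out: {text}")
            continue
        # Keep only the host name, dropping an optional ":ncpus" suffix.
        hosts.append(text.split(":", 1)[0])
    return hosts, warnings


# Example with the file contents quoted in the question.
sample = b"node001:8\nnode002:8\n"
hosts, warnings = inspect_hosts_file(sample)
print(len(hosts), "host(s):", hosts)
print("warnings:", warnings)
```

To check a real file, pass its bytes instead of the sample, for example `inspect_hosts_file(open("./mpd.conf", "rb").read())`. If the script finds two hosts and no warnings, the file contents are unlikely to be the cause.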

Also, what does your /etc/hosts file look like?

Regards,
~Gergana

Xiaoming_Cao
Beginner

The mpd.conf file was edited directly on a Linux machine, not copied from another system, and it contains no strange symbols. I start the mpdboot command from node001.

As for /etc/hosts, it has no entries for node001 or node002, although I can ssh freely to any compute node. I am not the administrator, so I do not know how it works. However, I know that if node001 were listed in /etc/hosts, its entry would look like this:

10.141.0.1 node001.cm.cluster node001
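To confirm which names a hosts-format file actually lists, the lookup can be sketched as below. The function name `find_host_entries` is hypothetical, and the sample text only reuses the entry quoted above plus a conventional localhost line as an assumption. Because ssh by name works, the node names are presumably resolved some other way (for example DNS), which this sketch does not examine.

```python
def find_host_entries(hosts_text, name):
    """Return (address, names) pairs from hosts-format text that list `name`.

    Hypothetical helper. Assumes the usual layout of an address followed
    by one or more names, with `#` starting a comment.
    """
    matches = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if name in names:
            matches.append((address, names))
    return matches


# Sample text; the localhost line is an assumption for illustration.
sample = "127.0.0.1 localhost\n10.141.0.1 node001.cm.cluster node001\n"
print(find_host_entries(sample, "node001"))
print(find_host_entries(sample, "node002"))
```

Running it against the real file would mean passing `open("/etc/hosts").read()` as the first argument. An empty result for both node names would match the situation described here.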
