Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

failed to ping mpd

ashruakkodegmail_com
1,652 Views


Dear HPC MPI users,

I am trying to parallelize a C code. When I try to run it on 3 nodes I get this error:
mpdboot_hpcmas02.kfupm.edu.sa (handle_mpd_output 892): failed to ping mpd on hpc073; received output={}

The program runs smoothly when run locally.

My mpd.hosts contains:
hpc073
hpc074

I checked that mpd is running and there is no firewall.
Kindly share your ideas.


Regards
Ashraf

0 Kudos
14 Replies
Dmitry_K_Intel2
Employee
Hi Ashraf,

Can you provide some details?
What MPI library do you use?
Can you make 'ssh hpc074' from hpc073 and vice versa without entering password and passphrase (if you use ssh connection)?
Could you provide 'mpdboot' command line?
Could you provide the output with '-d' option?

I'll try to help you to resolve this issue.

Regards!
Dmitry
ashruakkodegmail_com
Hi Dmitry,

The library reports version 2.1 (MPI-2.1).

I have 3 nodes:
a master node (hpcmas01) and compute nodes (hpc073 and hpc074).

Passwordless SSH works from
hpcmas01 to hpc073 and hpc074.

mpdboot -r ssh -f /root/mpd.hosts -d
debug: starting
running mpdallexit on hpcmas02.kfupm.edu.sa
debug: launch cmd= env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py --ncpus=1 --myhost=hpcmas02.kfupm.edu.sa -e -d -s 1
debug: mpd on hpcmas02.kfupm.edu.sa on port 55001
debug: info for running mpd: {'ip': '', 'ncpus': 1, 'list_port': 55001, 'entry_port': '', 'host': 'hpcmas02.kfupm.edu.sa', 'entry_host': '', 'ifhn': ''}
[root@hpcmas02 ~]#

Regards
Ashraf
Dmitry_K_Intel2
Employee
Hi Ashraf,

>master node (hpcmas01) and compute node (hpc073 and hpc074)
Looks like your master node is hpcmas02!

Debug output doesn't show ip address of the hpcmas02 node:
>debug: info for running mpd: {'ip': '',

Could you run:
$ host hpcmas02
$ host hpc073
$ host hpc074

IP addresses have to resolve. You probably need to change the /etc/hosts file.

Please provide the output of this command:
$ mpdboot -r ssh -f /root/mpd.hosts -n 3 -d

Regards!
Dmitry
ashruakkodegmail_com
Hi Dmitry,

Thanks for your reply.

Please find the output of the commands you requested:
#host hpcmas02
Host hpcmas02 not found: 2(SERVFAIL).

[root@hpcmas02 etc]# host hpc073
Host hpc073 not found: 2(SERVFAIL)
[root@hpcmas02 etc]# host hpc074
Host hpc074 not found: 2(SERVFAIL)

But I checked the /etc/hosts file, and all the entries are there:

10.146.1.200 hpcmgt hpcmgt
10.146.1.130 hpcmas02 hpcmas02
10.146.1.61 hpc061
10.146.1.62 hpc062
10.146.1.63 hpc063
10.146.1.73 hpc073
10.146.1.74 hpc074
10.146.1.76 hpc076
10.146.1.77 hpc077
10.146.1.78 hpc078

Also:
[root@hpcmas02 etc]# mpdboot -r ssh -f /root/mpd.hosts -n 3 -d
debug: starting
running mpdallexit on hpcmas02
debug: launch cmd= env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py --ncpus=1 --myhost=hpcmas02 -e -d -s 3
debug: mpd on hpcmas02 on port 55000
debug: info for running mpd: {'ip': '10.146.1.130', 'ncpus': 1, 'list_port': 55000, 'entry_port': '', 'host': 'hpcmas02', 'entry_host': '', 'ifhn': ''}
debug: launch cmd= ssh -x -n -q hpc073 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py -h hpcmas02 -p 55000 --ifhn=10.146.1.73 --ncpus=1 --myhost=hpc073 --myip=10.146.1.73 -e -d -s 3
debug: launch cmd= ssh -x -n -q hpc074 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py -h hpcmas02 -p 55000 --ifhn=10.146.1.74 --ncpus=1 --myhost=hpc074 --myip=10.146.1.74 -e -d -s 3
debug: mpd on hpc073 on port 39268
debug: info for running mpd: {'ip': '10.146.1.73', 'ncpus': 1, 'list_port': 39268, 'entry_port': 55000, 'host': 'hpc073', 'entry_host': 'hpcmas02', 'ifhn': '', 'pid': 31767}
debug: mpd on hpc074 on port 44024
debug: info for running mpd: {'ip': '10.146.1.74', 'ncpus': 1, 'list_port': 44024, 'entry_port': 55000, 'host': 'hpc074', 'entry_host': 'hpcmas02', 'ifhn': '', 'pid': 31768}


Best Regards
Ashraf




Dmitry_K_Intel2
Employee
So, Ashraf, you were able to start an mpd ring, weren't you?

Could you check the mpd ring?

Run:
$ mpdtrace

When do you get the error message you included in the first post?

BTW: your /etc/nsswitch.conf should contain a line like:
hosts: files dns
to tell the resolver where to look up IP addresses.

Regards!
Dmitry
ashruakkodegmail_com

Thanks for the swift response.

I ran this command on the master node (hpcmas02):

# mpdtrace
hpcmas02
hpc074
hpc073

The other nodes give the same output.
It looks like it is working now.

But how can I test it?
Can you give me a sample code, please?

Thanks for your help
Regards
Ashraf



Dmitry_K_Intel2
Employee
Ashraf,

mpdtrace shows the nodes in an mpd ring, so there are 3 nodes in your ring. Everything should work fine.

In the installation directory there is a 'test' sub-directory where you can find HelloWorld test cases (for Fortran, C, and C++). Just compile them and try them out.

# mpicc test.c
# mpiexec -n 8 ./a.out

I'm ready to help you if you have an issue.

Regards!
Dmitry
ashruakkodegmail_com
Dear Dmitry



Actually, I am very happy now.

The program is running in parallel.


Thanks a lot


Kind Regards
Ashraf
Dmitry_K_Intel2
Employee
My pleasure!

Best wishes,
Dmitry
ashruakkodegmail_com
Dear Dmitry

Sorry for bothering you.

What is the difference between MPI and OpenMP?

How can I compile and run the same program in parallel using OpenMP?

Kindly help.

Best Regards
Ashraf
Dmitry_K_Intel2
Employee
Hi Ashraf,

OpenMP is another programming paradigm. Using pragmas ("#pragma omp ...") you mark blocks of code for the compiler that can run in parallel (e.g. a loop), and the compiler will create new threads for them. In the MPI world, mpiexec creates new processes.
You cannot compile an MPI program with a regular (non-MPI) compiler.
To compile your OpenMP application you need to add the '-openmp' option for the Intel compiler or '-fopenmp' for gcc.

This forum is not the best place to learn OpenMP. Just google it and you'll find a lot of information and examples.

Regards!
Dmitry
ashruakkodegmail_com
Dear Sir,

Urgent request.

Yesterday everything was fine, but today when I run the program I get the following error:
problem with execution of ./ashru.ex on hpc073: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc073: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc073: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc073: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory

Kindly help as soon as possible.
Please see the following output:
[mulhem@hpcmas02 ~]$ mpdtrace
hpcmas02
hpc074
hpc073


mpdboot -r ssh -f /home/mulhem/mpd.hosts -n 3 -d
debug: starting
running mpdallexit on hpcmas02
debug: launch cmd= env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py --ncpus=1 --myhost=hpcmas02 -e -d -s 3
debug: mpd on hpcmas02 on port 55001
debug: info for running mpd: {'ip': '10.146.1.130', 'ncpus': 1, 'list_port': 55001, 'entry_port': '', 'host': 'hpcmas02', 'entry_host': '', 'ifhn': ''}
debug: launch cmd= ssh -x -n -q hpc073 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py -h hpcmas02 -p 55001 --ifhn=10.146.1.73 --ncpus=1 --myhost=hpc073 --myip=10.146.1.73 -e -d -s 3
debug: launch cmd= ssh -x -n -q hpc074 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py -h hpcmas02 -p 55001 --ifhn=10.146.1.74 --ncpus=1 --myhost=hpc074 --myip=10.146.1.74 -e -d -s 3
debug: mpd on hpc073 on port 60394
debug: info for running mpd: {'ip': '10.146.1.73', 'ncpus': 1, 'list_port': 60394, 'entry_port': 55001, 'host': 'hpc073', 'entry_host': 'hpcmas02', 'ifhn': '', 'pid': 17230}
debug: mpd on hpc074 on port 54816
debug: info for running mpd: {'ip': '10.146.1.74', 'ncpus': 1, 'list_port': 54816, 'entry_port': 55001, 'host': 'hpc074', 'entry_host': 'hpcmas02', 'ifhn': '', 'pid': 17231}

Dmitry_K_Intel2
Employee
>problem with execution of ./ashru.ex
Are you sure you entered the correct name? Maybe it should be ./ashru.exe? Also check that the executable exists at the same path on every node (hpc073, hpc074, and hpcmas02), e.g. on a shared filesystem — each node must be able to find it.

Next time, please provide the full command line; analysis will be much easier.

Regards!
Dmitry

Joe_G_
Beginner

Dear Sir,

I have a similar error:

mpdboot_node24 (handle_mpd_output 892): failed to ping mpd on node24.domain.com; received output={}

My "entry_port" and "entry_host" are blank in the mpdboot debug output:

t0363nd@node26:/home/t0363nd> /appl/msc/msc2013.1/msc20131/linux64/intel/bin64/mpdboot -r ssh -f hosts -d
debug: starting
running mpdallexit on node26
debug: launch cmd= env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 /appl/msc/msc2013.1/msc20131/linux64/intel/bin64/mpd.py   --ncpus=1 --myhost=node26 -e -d -s 1
debug: mpd on node26  on port 51090
debug: info for running mpd: {'ip': '10.134.160.55', 'ncpus': 1, 'list_port': 51090, 'entry_port': '', 'host': 'node26', 'entry_host': '', 'ifhn': ''}


Is that part of my issue?

I am using legacy code with Intel MPI 2.3.


Regards,

Joe
