Dear HPC MPI users,
I am trying to parallelize a C code. When I run it with 3 nodes, I get this error:
mpdboot_hpcmas02.kfupm.edu.sa (handle_mpd_output 892): failed to ping mpd on hpc073; received output={}
The program runs smoothly locally.
My mpd.hosts file contains:
hpc073
hpc074
I checked that mpd is running and there is no firewall.
Kindly share your ideas.
Regards
Ashraf
Can you provide some details?
What MPI library do you use?
Can you run 'ssh hpc074' from hpc073, and vice versa, without entering a password or passphrase (if you use an ssh connection)?
Could you provide 'mpdboot' command line?
Could you provide the output with the '-d' option?
I'll try to help you resolve this issue.
Regards!
Dmitry
Version 2.1 (MPI-2.1).
I have 3 nodes:
a master node (hpcmas01) and compute nodes (hpc073 and hpc074).
Passwordless ssh works from
hpcmas01 to hpc073 and hpc074.
mpdboot -r ssh -f /root/mpd.hosts -d
debug: starting
running mpdallexit on hpcmas02.kfupm.edu.sa
debug: launch cmd= env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py --ncpus=1 --myhost=hpcmas02.kfupm.edu.sa -e -d -s 1
debug: mpd on hpcmas02.kfupm.edu.sa on port 55001
debug: info for running mpd: {'ip': '', 'ncpus': 1, 'list_port': 55001, 'entry_port': '', 'host': 'hpcmas02.kfupm.edu.sa', 'entry_host': '', 'ifhn': ''}
[root@hpcmas02 ~]#
Regards
Ashraf
>master node (hpcmas01) and compute node (hpc073 and hpc074)
Looks like your master node is hpcmas02!
The debug output doesn't show the IP address of the hpcmas02 node:
>debug: info for running mpd: {'ip': '',
Could you run:
$ host hpcmas02
$ host hpc073
$ host hpc074
IP addresses have to be configured; you probably need to change the /etc/hosts file.
Please provide the output of this command:
$ mpdboot -r ssh -f /root/mpd.hosts -n 3 -d
Regards!
Dmitry
Thanks for your reply.
Please find the output of the commands you requested:
#host hpcmas02
Host hpcmas02 not found: 2(SERVFAIL).
[root@hpcmas02 etc]# host hpc073
Host hpc073 not found: 2(SERVFAIL)
[root@hpcmas02 etc]# host hpc074
Host hpc074 not found: 2(SERVFAIL)
But I checked the /etc/hosts file; all the entries are there:
10.146.1.200 hpcmgt hpcmgt
10.146.1.130 hpcmas02 hpcmas02
10.146.1.61 hpc061
10.146.1.62 hpc062
10.146.1.63 hpc063
10.146.1.73 hpc073
10.146.1.74 hpc074
10.146.1.76 hpc076
10.146.1.77 hpc077
10.146.1.78 hpc078
Also:
[root@hpcmas02 etc]# mpdboot -r ssh -f /root/mpd.hosts -n 3 -d
debug: starting
running mpdallexit on hpcmas02
debug: launch cmd= env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py --ncpus=1 --myhost=hpcmas02 -e -d -s 3
debug: mpd on hpcmas02 on port 55000
debug: info for running mpd: {'ip': '10.146.1.130', 'ncpus': 1, 'list_port': 55000, 'entry_port': '', 'host': 'hpcmas02', 'entry_host': '', 'ifhn': ''}
debug: launch cmd= ssh -x -n -q hpc073 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py -h hpcmas02 -p 55000 --ifhn=10.146.1.73 --ncpus=1 --myhost=hpc073 --myip=10.146.1.73 -e -d -s 3
debug: launch cmd= ssh -x -n -q hpc074 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py -h hpcmas02 -p 55000 --ifhn=10.146.1.74 --ncpus=1 --myhost=hpc074 --myip=10.146.1.74 -e -d -s 3
debug: mpd on hpc073 on port 39268
debug: info for running mpd: {'ip': '10.146.1.73', 'ncpus': 1, 'list_port': 39268, 'entry_port': 55000, 'host': 'hpc073', 'entry_host': 'hpcmas02', 'ifhn': '', 'pid': 31767}
debug: mpd on hpc074 on port 44024
debug: info for running mpd: {'ip': '10.146.1.74', 'ncpus': 1, 'list_port': 44024, 'entry_port': 55000, 'host': 'hpc074', 'entry_host': 'hpcmas02', 'ifhn': '', 'pid': 31768}
Best Regards
Ashraf
Could you check the mpd ring?
Run:
$ mpdtrace
When do you get the error message you included in the first post?
BTW, your /etc/nsswitch.conf should contain a line like:
hosts: files dns
to specify where to look up IP addresses.
Regards!
Dmitry
Thanks for the swift response.
I ran this command on the master node (hpcmas02):
#mpdtrace (I hope it is working)
hpcmas02
hpc074
hpc073
The other nodes also give the same output.
It looks like it is working now.
But how can I test it?
Can you give me a sample code, please?
Thanks for your help
Regards
Ashraf
mpdtrace shows the nodes in an mpd ring, so your ring contains 3 nodes. Everything should work fine.
In the installation directory there is a 'test' sub-directory where you can find HelloWorld test cases (for Fortran, C, and C++). Just compile them and try them out.
# mpicc test.c
# mpiexec -n 8 ./a.out
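If the bundled test cases are not handy, a minimal MPI "hello world" in C looks like the sketch below (standard MPI C API; the file name test.c matches the compile line above):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                 /* start the MPI runtime      */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank        */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes  */
    MPI_Get_processor_name(name, &len);     /* host this rank runs on     */

    printf("Hello from rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}
```

Each rank prints its id and the host it runs on, so you can verify that processes actually land on hpc073 and hpc074 (build with mpicc and launch with mpiexec; it needs a running mpd ring).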
I'm ready to help you if you have an issue.
Regards!
Dmitry
Actually, I am very happy now.
The program is running in parallel.
Thanks a lot
Kind Regards
Ashraf
Best wishes,
Dmitry
Sorry for bothering you.
What is the difference between MPI and OpenMP?
How can I compile and run the same program in parallel using OpenMP?
Kindly help.
Best Regards
Ashraf
OpenMP is another programming paradigm. Using pragmas ("#pragma omp ...") you mark blocks of code that can run in parallel (e.g. a loop), and the compiler creates new threads for them. In the MPI world, mpiexec creates new processes.
You cannot compile an MPI program with a regular compiler.
To compile your OpenMP application, add the '-openmp' option for the Intel compiler or '-fopenmp' for gcc.
This forum is not the best place to learn about OpenMP; a web search will turn up plenty of information and examples.
Regards!
Dmitry
Urgent request.
Yesterday everything was fine, but today when I run the program I get the following error:
problem with execution of ./ashru.ex on hpc073: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc073: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc073: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc073: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpc074: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
problem with execution of ./ashru.ex on hpcmas02: [Errno 2] No such file or directory
Kindly help as soon as possible.
Please see the following output:
[mulhem@hpcmas02 ~]$ mpdtrace
hpcmas02
hpc074
hpc073
mpdboot -r ssh -f /home/mulhem/mpd.hosts -n 3 -d
debug: starting
running mpdallexit on hpcmas02
debug: launch cmd= env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py --ncpus=1 --myhost=hpcmas02 -e -d -s 3
debug: mpd on hpcmas02 on port 55001
debug: info for running mpd: {'ip': '10.146.1.130', 'ncpus': 1, 'list_port': 55001, 'entry_port': '', 'host': 'hpcmas02', 'entry_host': '', 'ifhn': ''}
debug: launch cmd= ssh -x -n -q hpc073 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py -h hpcmas02 -p 55001 --ifhn=10.146.1.73 --ncpus=1 --myhost=hpc073 --myip=10.146.1.73 -e -d -s 3
debug: launch cmd= ssh -x -n -q hpc074 env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 HOSTNAME=$HOSTNAME /opt/intel/impi/4.0.0.028/intel64/bin/mpd.py -h hpcmas02 -p 55001 --ifhn=10.146.1.74 --ncpus=1 --myhost=hpc074 --myip=10.146.1.74 -e -d -s 3
debug: mpd on hpc073 on port 60394
debug: info for running mpd: {'ip': '10.146.1.73', 'ncpus': 1, 'list_port': 60394, 'entry_port': 55001, 'host': 'hpc073', 'entry_host': 'hpcmas02', 'ifhn': '', 'pid': 17230}
debug: mpd on hpc074 on port 54816
debug: info for running mpd: {'ip': '10.146.1.74', 'ncpus': 1, 'list_port': 54816, 'entry_port': 55001, 'host': 'hpc074', 'entry_host': 'hpcmas02', 'ifhn': '', 'pid': 17231}
Kindly help as soon as possible.
Are you sure you entered the correct name? Perhaps it should be ./ashru.exe?
Next time, please provide the full command line; analysis will be much easier.
Regards!
Dmitry
Dear Sir,
I have a similar error:
mpdboot_node24 (handle_mpd_output 892): failed to ping mpd on node24.domain.com; received output={}
My "entry_port" and "entry_host" are blank in the mpdboot debug output:
t0363nd@node26:/home/t0363nd> /appl/msc/msc2013.1/msc20131/linux64/intel/bin64/mpdboot -r ssh -f hosts -d
debug: starting
running mpdallexit on node26
debug: launch cmd= env I_MPI_JOB_TAGGED_PORT_OUTPUT=1 /appl/msc/msc2013.1/msc20131/linux64/intel/bin64/mpd.py --ncpus=1 --myhost=node26 -e -d -s 1
debug: mpd on node26 on port 51090
debug: info for running mpd: {'ip': '10.134.160.55', 'ncpus': 1, 'list_port': 51090, 'entry_port': '', 'host': 'node26', 'entry_host': '', 'ifhn': ''
Is that part of my issue?
I am using legacy code with Intel MPI 2.3.
Regards,
Joe