Hi All,
I am trying to run an application from the host machine on a coprocessor. When I execute the test command
mpirun -n 1 -host mic0 hostname
I get the following error message
[proxy:0:0@machine-mic0.domain] HYDU_sock_connect (./utils/sock/sock.c:241): unable to connect from "machine-mic0.domain" to "127.0.0.1" (Connection refused)
[proxy:0:0@machine-mic0.domain] main (./pm/pmiserv/pmip.c:353): unable to connect to server 127.0.0.1 at port 42661 (check for firewalls!)
When I ssh into the coprocessor and run the same command, I get the expected output.
I have checked that the environment variable I_MPI_MIC is set to 'enable' and I have disabled the host firewall. Since /opt/intel is not available over NFS, I have copied the necessary libraries to the coprocessor. I'm not sure how to proceed from here.
Best Regards,
Michael
The 127.0.0.1 address is the loopback address. I am not sure why MPI is trying to use that address. Could you check the /etc/hosts file on the coprocessor to make sure the host and the coprocessor are listed with the correct addresses and under the names MPI is using?
The /etc/hosts file on the coprocessor reads
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost
172.31.1.254 host machine.domain
172.31.1.1 machine-mic0.domain mic0
172.31.2.1 machine-mic1.domain mic1
I can ssh into the mic and ssh back into the host.
Hi Michael,
It looks like the command fails because of your specific network settings.
Could you please provide the output of the following commands (run on the host):
hostname -i
I_MPI_MIC=1 mpirun -v -n 1 -host mic0 hostname
mpirun -V
cat /etc/hosts
Could you please also try the following command:
I_MPI_MIC=1 mpirun -localhost 172.31.1.254 -n 1 -host mic0 hostname
In reply to Artem R. (Intel):
Hi Artem,
Thanks for your reply. I ran the commands on the host and list the results below.
hostname -i
127.0.0.1
I_MPI_MIC=1 mpirun -v -n 1 -host mic0 hostname
I've attached the output as mpirun_output.txt.
mpirun -V
Intel(R) MPI Library for Linux* OS, Version 4.1.0 Build 20120831
Copyright (C) 2003-2012, Intel Corporation. All rights reserved
cat /etc/hosts
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
127.0.0.1 localhost machine.domain machine
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.31.1.1 mic0.local mic0
172.31.2.1 mic1.local mic1
172.31.1.1 machine-mic0.domain mic0 #Generated-by-micctrl
172.31.2.1 machine-mic1.domain mic1 #Generated-by-micctrl
I_MPI_MIC=1 mpirun -localhost 172.31.1.254 -n 1 -host mic0 hostname
The mpirun command didn't recognize the -localhost option and displayed the help screen instead.
Thank you again,
Michael
Hi Michael,
Could you try the latest Intel MPI Library version (5.1.x)? The '-localhost' option is implemented there and should help with this issue.
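Whether an installation is new enough for '-localhost' can be read from `mpirun -V`. A minimal sketch, assuming a POSIX shell with sed and taking the 5.1.x release line mentioned above as the threshold; `supports_localhost` is a hypothetical helper name, and treating year-numbered releases (for example "Version 2017 Update 1") as newer is an assumption:

```shell
# Hypothetical helper: succeed if the `mpirun -V` text passed as $1 reports
# Intel MPI Library 5.1 or later. Returns 1 for older versions, 2 if no
# "Version N" string is found.
# Assumption: year-numbered releases such as "Version 2017" count as newer.
supports_localhost() {
  local major minor
  # First number after "Version ", e.g. 4 in "Version 4.1.0 Build 20120831"
  major=$(printf '%s\n' "$1" | sed -n 's/.*Version \([0-9][0-9]*\).*/\1/p' | head -n 1)
  # Number after the first dot, if any, e.g. 1 in "4.1.0"
  minor=$(printf '%s\n' "$1" | sed -n 's/.*Version [0-9][0-9]*\.\([0-9][0-9]*\).*/\1/p' | head -n 1)
  [ -n "$major" ] || return 2          # no version string found
  [ -n "$minor" ] || minor=0           # e.g. "Version 2017 Update 1"
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 1 ]; }
}
```

For the "Version 4.1.0 Build 20120831" output shown above, `supports_localhost "$(mpirun -V 2>&1)"` fails, which matches mpirun rejecting the option; on a newer installation the same check could guard `I_MPI_MIC=1 mpirun -localhost 172.31.1.254 -n 1 -host mic0 hostname`.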
Otherwise, you need to correct your network settings: 'hostname -i' on the host should report the IP address '172.31.1.254', not '127.0.0.1'.
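This can be checked without running mpirun. In Michael's host /etc/hosts, the line `127.0.0.1 localhost machine.domain machine` maps the host's own name to the loopback address, which is why `hostname -i` printed 127.0.0.1 and, apparently, why the proxy on mic0 tried to connect to 127.0.0.1. The sketch below, assuming a POSIX shell with awk, looks a name up in an /etc/hosts-style file and flags loopback results; `hosts_lookup` and `check_host_address` are hypothetical helper names. It reads only the file, so it ignores DNS and /etc/nsswitch.conf ordering; `getent hosts NAME` shows what the resolver actually returns.

```shell
# Hypothetical helpers (illustrative names, not part of Intel MPI or MPSS).

# Print the first address that an /etc/hosts-style file ($1) maps name $2 to.
hosts_lookup() {
  awk -v n="$2" '
    { sub(/#.*/, "") }                    # drop comments, including trailing ones
    { for (i = 2; i <= NF; i++)           # fields 2..NF are the names and aliases
        if ($i == n) { print $1; exit } }
  ' "$1"
}

# Report what name $2 maps to in file $1. Returns 0 for a non-loopback
# address, 1 for a loopback (127.x.x.x) address, 2 if the name is missing.
check_host_address() {
  local addr
  addr=$(hosts_lookup "$1" "$2")
  case $addr in
    "")    echo "$2: not found in $1"; return 2 ;;
    127.*) echo "$2 -> $addr (loopback; remote proxies cannot connect back to it)"; return 1 ;;
    *)     echo "$2 -> $addr"; return 0 ;;
  esac
}
```

For example, `check_host_address /etc/hosts "$(hostname)"` on the host should report 172.31.1.254 once the host's name is mapped to its coprocessor-network address (for instance with an entry such as `172.31.1.254 machine.domain machine`, an assumed fix based on the address given above).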
Hi Artem,
I added the appropriate IP addresses to the /etc/hosts file, and the mpirun commands now run hostname successfully. Thank you!
To continue testing, I followed the instructions here and compiled the montecarlo.c program. I can run it successfully on the host machine, or on mic0 from the host, but if I try to run it on both the host and mic0 with
mpirun -n 1 -host machine /tmp/montecarlo : -n 1 -host mic0 /tmp/montecarlo
I receive a list of error messages:
machine-mic0.domain:SCM:2fa8:afe08700: 245 us(245 us): open_hca: ibv_get_device_list() failed
machine-mic0.domain:SCM:2fa8:afe08700: 201 us(201 us): open_hca: ibv_get_device_list() failed
machine-mic0.domain:CMA:2fa8:afe08700: 621 us(621 us): open_hca: getaddr_netdev ERROR:No such device. Is ib0 configured?
machine-mic0.domain:CMA:2fa8:afe08700: 559 us(559 us): open_hca: getaddr_netdev ERROR:No such device. Is ib1 configured?
... (full log attached, run with -v)
I am not sure if this is relevant, but the network between the host and the coprocessor is configured as a static pair. Any advice for solving this is welcome.
Thanks in advance,
Michael
Hi Michael,
It looks like the ofed-mic service isn't running; please check. See the Intel® Manycore Platform Software Stack (Intel® MPSS) User's Guide for details.
You can find a similar example in the Intel® MPI Library Troubleshooting Guide.
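The `open_hca` lines in Michael's log appear to come from DAPL failing to open an HCA (`ibv_get_device_list() failed`, "Is ib0 configured?"), which fits the diagnosis above. With a long -v log, a small filter helps show which hosts hit the problem. A minimal sketch, assuming the log was saved to a file and uses the `host:PROVIDER:...` prefix shown in the post; `dapl_open_hca_report` is a hypothetical helper name:

```shell
# Hypothetical helper: scan a saved mpirun log ($1) for DAPL "open_hca"
# failures and list the hosts that reported them.
# Returns 1 if any were found, 0 otherwise.
dapl_open_hca_report() {
  local hosts
  # Lines look like "machine-mic0.domain:SCM:2fa8:...: open_hca: ...";
  # the host name is everything before the first colon.
  hosts=$(grep 'open_hca:' "$1" | cut -d: -f1 | sort -u)
  if [ -z "$hosts" ]; then
    echo "no open_hca errors found"
    return 0
  fi
  echo "DAPL could not open an HCA on:"
  printf '  %s\n' $hosts               # one host per line (split on newlines)
  echo "check that the ofed-mic service is running (see the Intel MPSS User's Guide)"
  return 1
}
```

For example: `mpirun -v -n 1 -host machine /tmp/montecarlo : -n 1 -host mic0 /tmp/montecarlo > run.log 2>&1; dapl_open_hca_report run.log`. On MPSS systems the service can usually be queried as root with `service ofed-mic status`; confirm the exact command for your MPSS release in the User's Guide.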
