
launching mpiexec.hydra from host

YH
Beginner

Hi,

I am trying to run the WRF code on a MIC card, following the instructions from https://software.intel.com/en-us/articles/how-to-get-wrf-running-on-the-intelr-xeon-phitm-coprocessor

However, when I try to run

mpiexec.hydra -np 1 ./wrf.exe

It gave the following error

-bash: /opt/intel//impi/5.0.2.044/mic/bin/mpiexec.hydra: cannot execute binary file

 

I am wondering whether anyone has had this problem before. mpiexec.hydra is set up via "source /opt/intel/impi/5.0.2.044/mic/bin/mpivars.sh".

Thanks

 

 

Loc_N_Intel
Employee

Hi YH,

When you build the MPI application for the coprocessor, you set the environment variables for the coprocessor, as you did. However, if you want to execute the application from the host, then you need to set your environment variables for the host before launching the application:

source /opt/intel/impi/5.0.2.044/intel64/bin/mpivars.sh
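A quick way to confirm that the right environment is now active (a hedged check; the expected path simply mirrors the install location quoted in this thread):

which mpiexec.hydra
# expected: /opt/intel/impi/5.0.2.044/intel64/bin/mpiexec.hydra, not the .../mic/bin/ copy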

YH
Beginner

OK, I did what you suggested by sourcing the intel64 MPI environment variables, and executed the following:

mpiexec.hydra -host mic0 -np 1 ./wrf.exe   (then I got the following error)

[mpiexec@eagle] HYDU_getfullhostname (../../utils/others/others.c:146): getaddrinfo error (hostname: eagle, error: Name or service not known)
[mpiexec@eagle] HYDU_sock_create_and_listen_portstr (../../utils/sock/sock.c:1094): unable to get local hostname
[mpiexec@eagle] HYD_pmci_launch_procs (../../pm/pmiserv/pmiserv_pmci.c:350): unable to create PMI port
[mpiexec@eagle] main (../../ui/mpich/mpiexec.c:987): process manager returned error launching processes

Then I edited /etc/hosts by appending "eagle" to the 127.0.0.1 line and executed the same command:

mpiexec.hydra -host mic0 -np 1 ./wrf.exe   (then I got the following error)

bash: /opt/intel/impi/5.0.2.044/intel64/bin/pmi_proxy: No such file or directory

I could locate pmi_proxy, though, which gave me:

/opt/intel/composer_xe_2015.1.133/mpirt/bin/ia32/pmi_proxy
/opt/intel/composer_xe_2015.1.133/mpirt/bin/intel64/pmi_proxy
/opt/intel/composer_xe_2015.1.133/mpirt/bin/mic/pmi_proxy
/opt/intel/impi/5.0.2.044/intel64/bin/pmi_proxy
/opt/intel/impi/5.0.2.044/mic/bin/pmi_proxy
/usr/mpi/gcc/mvapich2-2.0/bin/hydra_pmi_proxy
/usr/mpi/gcc/mvapich2-2.0/share/man/man1/hydra_pmi_proxy.1

Wonder what's going on.

 

 

 

 

TimP
Honored Contributor III

 

You didn't say whether you mounted the host filesystem on the mic so as to make the same MPI paths visible there.

YH
Beginner

I didn't do what you mentioned. Can you be more specific about how to "mount the host filesystem on mic"?

All I did was NFS-mount the run directory across the mic card.

Thanks.

Sunny_G_Intel
Employee

Hello YH,

As per the following thread, I see that you were able to mount the /micNfs directory on the mic, correct? Similarly, in order to see the Intel MPI related files on the MIC, you will have to mount the /opt directory on the MIC.
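For reference, a minimal sketch of that mount step, assuming the host already exports /opt over NFS and is known to the card as "host" (the names and options are assumptions, not taken from this system):

ssh mic0
mkdir -p /opt
mount -t nfs host:/opt /opt    # makes the host's /opt (and thus the Intel MPI tree) visible on the card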

YH
Beginner

Hi Sunny,

I did the /opt NFS mount across the mic cards. But now, when I executed

mpiexec.hydra -hosts mic0 -np 1 ./wrf.exe (it hung there, and Ctrl-C gave the following error)

^C[mpiexec@eagle] Sending Ctrl-C to processes as requested
[mpiexec@eagle] Press Ctrl-C again to force abort
[mpiexec@eagle] HYDU_sock_write (../../utils/sock/sock.c:417): write error (Bad file descriptor)
[mpiexec@eagle] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:244): unable to write data to proxy
[mpiexec@eagle] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:175): unable to send signal downstream
[mpiexec@eagle] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@eagle] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:495): error waiting for event
[mpiexec@eagle] main (../../ui/mpich/mpiexec.c:1011): process manager error waiting for completion

Any ideas?

Thanks,

TimP
Honored Contributor III

 

Is your wrf.exe visible on the mic?

YH
Beginner

I have the rundir, which contains wrf.exe (and all the other files in the same directory), NFS-mounted to the mic.

Yes, if I ssh into mic0, I can see wrf.exe and real.exe in the rundir. I can run ./real.exe there with no problem; I believe I could also run ./wrf.exe once ./real.exe is done, but I didn't try that.

What I want to do is launch the app from the host, so that I can later do a symmetric-mode execution. But I couldn't do so.

Sunny_G_Intel
Employee

Hi YH,

Did you look at this article? It demonstrates WRF Conus2.5km on Intel® Xeon Phi™ Coprocessors and Intel® Xeon® processors in Symmetric Mode.

Thanks

YH
Beginner

Yes, I followed the instructions on that page as well. But I got stuck at the execution of mpiexec.hydra.

According to the webpage, I am supposed to source /opt/intel/impi/5.0.2.044/mic/bin/mpivars.sh and then launch wrf.exe from the host.

But then loc-nguyen (Intel) pointed out that I should do # source /opt/intel/impi/5.0.2.044/intel64/bin/mpivars.sh before mpiexec.hydra.

Either way, it didn't work out.

 

Artem_R_Intel1
Employee

Hi YH,

Correct MPI usage model:
1. To run MIC binaries from the HOST side:
host$ . /opt/intel/impi/5.0.2.044/intel64/bin/mpivars.sh
host$ export I_MPI_MIC=1
host$ mpiexec.hydra -host mic0 -np 1 ./wrf.exe

2. To run MIC binaries from the MIC side:
host-mic0$ . /opt/intel/impi/5.0.2.044/mic/bin/mpivars.sh
host-mic0$ mpiexec.hydra -np 1 ./wrf.exe

In both cases wrf.exe should have elf64-k1om file format.
You can use '/usr/linux-k1om-4.7/bin/x86_64-k1om-linux-objdump -a wrf.exe' (on the host) to check.
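For reference, a hedged sketch of what that check looks like when the binary really is a coprocessor executable (output abbreviated; exact formatting may differ by binutils version):

host$ /usr/linux-k1om-4.7/bin/x86_64-k1om-linux-objdump -a wrf.exe
wrf.exe:     file format elf64-k1om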

 

 

YH
Beginner

Hi Artem,

For option 1, 

host$ source /opt/intel/impi/5.0.2.044/intel64/bin/mpivars.sh
host$ export I_MPI_MIC=1
host$ mpiexec.hydra -host mic0 -np 1 ./wrf.exe

[mpiexec@eagle] HYDU_getfullhostname (../../utils/others/others.c:146): getaddrinfo error (hostname: eagle, error: Name or service not known)
[mpiexec@eagle] HYDU_sock_create_and_listen_portstr (../../utils/sock/sock.c:1094): unable to get local hostname
[mpiexec@eagle] HYD_pmci_launch_procs (../../pm/pmiserv/pmiserv_pmci.c:350): unable to create PMI port
[mpiexec@eagle] main (../../ui/mpich/mpiexec.c:987): process manager returned error launching processes

For option 2,

host-mic0$ source /opt/intel/impi/5.0.2.044/mic/bin/mpivars.sh
host-mic0$ mpiexec.hydra -np 1 ./wrf.exe

./wrf.exe: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory

I can get option 2 to run by

export LD_LIBRARY_PATH=/opt/intel/composer_xe_2015.1.133/compiler/lib/mic:$LD_LIBRARY_PATH
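Side note: when launching the MIC binary from the host instead, the same library path can be forwarded to the card with hydra's -genv option. A hedged sketch, assuming the compiler's MIC runtime is visible at the same path on the card:

mpiexec.hydra -genv LD_LIBRARY_PATH /opt/intel/composer_xe_2015.1.133/compiler/lib/mic -host mic0 -np 1 ./wrf.exe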

 

Sunny_G_Intel
Employee

Hello YH,

Are you still having trouble running WRF on the Intel® Xeon Phi™ coprocessor?

For the errors you are getting with option 1, can you please verify the contents of your /etc/hosts file. Also, for running with option 1, could you please try this for me:

scp wrf.exe mic0:/tmp/.         //copy the executable to mic0
//now run the command from host
mpiexec.hydra -host mic0 -np 1 /tmp/wrf.exe

Also, for option 2: do you still have issues after setting the LD_LIBRARY_PATH environment variable?

Thanks

YH
Beginner

Hi Sunny,

At my host system,

cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 eagle
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.1    eagle-mic0 mic0 #Generated-by-micctrl
10.10.10.2    eagle-mic1 mic1 #Generated-by-micctrl
10.10.10.3    eagle-mic2 mic2 #Generated-by-micctrl

 

At mic0

127.0.0.1    localhost.localdomain localhost
::1        localhost.localdomain localhost
10.10.10.254    host eagle
10.10.10.1    eagle-mic0 mic0
10.10.10.2    eagle-mic1 mic1
10.10.10.3    eagle-mic2 mic2

I could run option 2 after setting LD_LIBRARY_PATH; no problem with option 2. But that is not my goal, just a sanity check that my binary works.

The result of what you suggested is the following (by the way, my host firewalld is completely stopped):

[proxy:0:0@eagle-mic0] HYDU_sock_connect (../../utils/sock/sock.c:268): unable to connect from "eagle-mic0" to "127.0.0.1" (Connection refused)
[proxy:0:0@eagle-mic0] main (../../pm/pmiserv/pmip.c:397): unable to connect to server 127.0.0.1 at port 38843 (check for firewalls!)
^C[mpiexec@eagle] Sending Ctrl-C to processes as requested
[mpiexec@eagle] Press Ctrl-C again to force abort
[mpiexec@eagle] HYDU_sock_write (../../utils/sock/sock.c:417): write error (Bad file descriptor)
[mpiexec@eagle] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:244): unable to write data to proxy
[mpiexec@eagle] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:175): unable to send signal downstream
[mpiexec@eagle] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@eagle] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:495): error waiting for event
[mpiexec@eagle] main (../../ui/mpich/mpiexec.c:1011): process manager error waiting for completion

 

Thanks

 

 

 

Sunny_G_Intel
Employee

Hi YH,

Can you please let me know whether you are using static (default) networking or an internal bridge for this configuration. From one of your posts I see that you are using the static (default) network configuration, correct? In that case, the coprocessor IP addresses in /etc/hosts on your host are not correct. There should be entries like:

172.31.1.1      eagle-mic0 mic0 
172.31.2.1      eagle-mic1 mic1 
172.31.3.1      eagle-mic2 mic2 
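One hedged way to check which scheme is actually in use on the host (assuming the default MPSS addressing, where the host side of a static pair sits at 172.31.x.254):

ip addr show mic0    # static pair: the mic0 interface itself carries the host-side address
ip addr show br0     # bridge: the address sits on the bridge and mic0 is attached to it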

Thanks

YH
Beginner

When I tested the NFS mount, I reverted back to the default network (172.31.1.1). I just wanted to make sure that the internal bridge wasn't causing any issues.

For MPI, I would like to try using multiple MIC cards. That's why I used the internal bridge network setup.

Sunny_G_Intel
Employee

Hi YH,

Can you please try running the sample test program to verify all MPI-related settings.

//On host
cp -r /opt/intel/impi_latest/test ~/.
cd test

source /opt/intel/composerxe/bin/compilervars.sh intel64
source /opt/intel/impi_latest/bin64/mpivars.sh 

mpiicc -o test.host test.c 
mpiicc -mmic -o test.mic test.c

scp test.mic mic0:/tmp/.

export I_MPI_MIC=1 

mpiexec.hydra -host `hostname` -np 2 ./test.host 
Hello world: rank 0 of 2 running on knightscorner5
Hello world: rank 1 of 2 running on knightscorner5

mpiexec.hydra -host mic0 -np 2 /tmp/test.mic
Hello world: rank 0 of 2 running on knightscorner5-mic0
Hello world: rank 1 of 2 running on knightscorner5-mic0

mpiexec.hydra -host `hostname` -np 2 ./test.host : -host mic0 -np 2 /tmp/test.mic
Hello world: rank 0 of 4 running on knightscorner5
Hello world: rank 1 of 4 running on knightscorner5
Hello world: rank 2 of 4 running on knightscorner5-mic0
Hello world: rank 3 of 4 running on knightscorner5-mic0

If you see an error like this:

[mpiexec@knightscorner5] Sending Ctrl-C to processes as requested
[mpiexec@knightscorner5] Press Ctrl-C again to force abort
[mpiexec@knightscorner5] HYDU_sock_write (../../utils/sock/sock.c:417): write error (Bad file descriptor)
[mpiexec@knightscorner5] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:247): unable to write data to proxy
[mpiexec@knightscorner5] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:172): unable to send signal downstream
[mpiexec@knightscorner5] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@knightscorner5] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:484): error waiting for event
[mpiexec@knightscorner5] main (../../ui/mpich/mpiexec.c:930): process manager error waiting for completion

Probably one of the reasons for this is that /opt/intel is not mounted correctly on that coprocessor. Verify /etc/fstab on the coprocessor and try executing mount -a. If the mount is successful you will be good to go, but if the mount fails then you will have to come back to the host and modify the /etc/exports file and /etc/hosts.allow to include all the coprocessors (IP addresses). Once you have updated the required files:

exportfs -a
service nfs restart   //may not be required but still to be safe
ssh mic0
mount -a

You should be able to mount if /etc/fstab has the correct entries.
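For reference, a hedged sketch of what those entries might look like on this particular system; the IP addresses and paths are assumptions based on the addressing quoted elsewhere in the thread:

# host: /etc/exports -- export /opt to the coprocessor subnet
/opt  10.10.10.0/24(ro,no_root_squash)
# host: /etc/hosts.allow -- allow the coprocessor addresses
ALL: 10.10.10.
# mic0: /etc/fstab -- "host" is the name the card uses for the host
host:/opt  /opt  nfs  defaults  1 1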

If you need further help with running MPI on Intel® Xeon Phi™ Coprocessor, you can refer to the following tutorial: Using the Intel® MPI Library on Intel® Xeon Phi™ Coprocessor Systems 

Thanks

Artem_R_Intel1
Employee

Hi YH,

As far as I can see, your host's /etc/hosts file is now:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 eagle
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.1    eagle-mic0 mic0 #Generated-by-micctrl
10.10.10.2    eagle-mic1 mic1 #Generated-by-micctrl
10.10.10.3    eagle-mic2 mic2 #Generated-by-micctrl
 
Could you please try to correct it to something like:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.254    host eagle
10.10.10.1    eagle-mic0 mic0 #Generated-by-micctrl
10.10.10.2    eagle-mic1 mic1 #Generated-by-micctrl
10.10.10.3    eagle-mic2 mic2 #Generated-by-micctrl
 
Then check that these connection paths work fine:
eagle$ ssh mic0
eagle$ ssh 10.10.10.1

mic0$ ssh eagle
mic0$ ssh 10.10.10.254

'hostname -i' on the host should return an IP address accessible from the MIC cards.
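A quick hedged check of that requirement (the address shown is just what this thread's corrected /etc/hosts would yield, not a universal value):

eagle$ hostname -i
10.10.10.254
# if this prints 127.0.0.1 instead, the hydra proxy on the card will try to call back to the card's own loopback and fail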

If that doesn't help, could you please run the failed scenario with the '-v' option (mpiexec.hydra -v ...) and provide the output?

Another proposal is to try 'mpiexec.hydra -iface ...':
'-iface mic0' - for the static pair network configuration
'-iface br0' - for the bridge network configuration
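To pick the right value for -iface, the host's interface names can be listed first (a hedged sketch; interface names vary from system to system):

ip addr
# use whichever interface carries the route to the coprocessors, e.g. br0 for a bridge setup or mic0 for static pairs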
 

YH
Beginner

Hi Sunny and Artem, 

Regarding line 19 (Sunny's post):

mpiexec.hydra -host mic0 -np 2 /tmp/test.mic (the output follows)

[proxy:0:0@eagle-mic0] HYDU_sock_connect (../../utils/sock/sock.c:268): unable to connect from "eagle-mic0" to "127.0.0.1" (Connection refused)
[proxy:0:0@eagle-mic0] main (../../pm/pmiserv/pmip.c:397): unable to connect to server 127.0.0.1 at port 49052 (check for firewalls!)
^C[mpiexec@eagle] Sending Ctrl-C to processes as requested
[mpiexec@eagle] Press Ctrl-C again to force abort
[mpiexec@eagle] HYDU_sock_write (../../utils/sock/sock.c:417): write error (Bad file descriptor)
[mpiexec@eagle] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:244): unable to write data to proxy
[mpiexec@eagle] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:175): unable to send signal downstream
[mpiexec@eagle] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@eagle] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:495): error waiting for event
[mpiexec@eagle] main (../../ui/mpich/mpiexec.c:1011): process manager error waiting for completion

I have tried to remount the NFS. It doesn't look to me like there are any issues; `mount -a` didn't give me any errors, and if I modify a file in one directory, I can see the change in the other.

Changing line 19 again to the following, with the additional line (10.10.10.254 host eagle) in /etc/hosts:

mpiexec.hydra -iface mic0 -host mic0 -np 2 /tmp/test.mic
[mpiexec@eagle] HYDU_sock_get_iface_ip (../../utils/sock/sock.c:812): unable to find interface mic0
[mpiexec@eagle] HYDU_sock_create_and_listen_portstr (../../utils/sock/sock.c:1074): unable to get network interface IP
[mpiexec@eagle] HYD_pmci_launch_procs (../../pm/pmiserv/pmiserv_pmci.c:350): unable to create PMI port
[mpiexec@eagle] main (../../ui/mpich/mpiexec.c:987): process manager returned error launching processes

 

Changing line 19 again to the following, with the additional line (10.10.10.254 host eagle) in /etc/hosts:

mpiexec.hydra -iface br0 -host mic0 -np 2 /tmp/test.mic
Hello world: rank 0 of 2 running on eagle-mic0
Hello world: rank 1 of 2 running on eagle-mic0

This time it worked. I am going to try it on wrf.exe; I will post if I run into problems.

Thank you.

 

 

 

YH
Beginner

When I try symmetric mode, it gives the following error:

>>mpiexec.hydra -host -iface br0 -iface `hostname` -np 2 ./test.host : -iface br0 -host mic0 -np 2 /tmp/test.mic
[proxy:0:0@eagle-mic0] HYDU_create_process (../../utils/launch/launch.c:591): execvp error on file br0 (No such file or directory)
[proxy:0:0@eagle-mic0] HYDU_create_process (../../utils/launch/launch.c:591): execvp error on file br0 (No such file or directory)
^C[mpiexec@eagle] Sending Ctrl-C to processes as requested
[mpiexec@eagle] Press Ctrl-C again to force abort
[yhue@eagle test]$ mpiexec.hydra -host `hostname` -np 2 ./test.host : -iface br0 -host mic0 -np 2 /tmp/test.mic
eagle:SCM:1c83:909f5b40: 89 us(89 us):  open_hca: device mlx4_0 not found
eagle:SCM:1c83:909f5b40: 77 us(77 us):  open_hca: device mlx4_0 not found
eagle:SCM:1c84:8ac18b40: 115 us(115 us):  open_hca: device mlx4_0 not found
eagle:CMA:1c83:909f5b40: 42 us(42 us):  open_hca: getaddr_netdev ERROR:No such device. Is ib0 configured?
eagle:SCM:1c84:8ac18b40: 85 us(85 us):  open_hca: device mlx4_0 not found
eagle:CMA:1c83:909f5b40: 38 us(38 us):  open_hca: getaddr_netdev ERROR:No such device. Is ib1 configured?
eagle:CMA:1c84:8ac18b40: 43 us(43 us):  open_hca: getaddr_netdev ERROR:No such device. Is ib0 configured?
eagle:SCM:1c83:909f5b40: 85 us(85 us):  open_hca: device mthca0 not found
eagle:CMA:1c84:8ac18b40: 42 us(42 us):  open_hca: getaddr_netdev ERROR:No such device. Is ib1 configured?
eagle:SCM:1c83:909f5b40: 80 us(80 us):  open_hca: device mthca0 not found
eagle:SCM:1c84:8ac18b40: 98 us(98 us):  open_hca: device mthca0 not found
eagle:SCM:1c83:909f5b40: 81 us(81 us):  open_hca: device ipath0 not found
eagle:SCM:1c84:8ac18b40: 90 us(90 us):  open_hca: device mthca0 not found
eagle:SCM:1c83:909f5b40: 84 us(84 us):  open_hca: device ipath0 not found
eagle:SCM:1c84:8ac18b40: 88 us(88 us):  open_hca: device ipath0 not found
eagle:SCM:1c83:909f5b40: 83 us(83 us):  open_hca: device ehca0 not found
eagle:SCM:1c84:8ac18b40: 86 us(86 us):  open_hca: device ipath0 not found
eagle:CMA:1c83:909f5b40: 39 us(39 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle:SCM:1c84:8ac18b40: 81 us(81 us):  open_hca: device ehca0 not found
eagle:UCM:1c83:909f5b40: 79 us(79 us):  open_hca: mlx4_0 not found
eagle:CMA:1c84:8ac18b40: 39 us(39 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle:UCM:1c83:909f5b40: 76 us(76 us):  open_hca: mlx4_0 not found
eagle:UCM:1c84:8ac18b40: 82 us(82 us):  open_hca: mlx4_0 not found
eagle:UCM:1c83:909f5b40: 78 us(78 us):  open_hca: mthca0 not found
eagle:UCM:1c84:8ac18b40: 78 us(78 us):  open_hca: mlx4_0 not found
eagle:UCM:1c83:909f5b40: 78 us(78 us):  open_hca: mthca0 not found
eagle:UCM:1c84:8ac18b40: 80 us(80 us):  open_hca: mthca0 not found
eagle:CMA:1c83:909f5b40: 38 us(38 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle:UCM:1c84:8ac18b40: 77 us(77 us):  open_hca: mthca0 not found
eagle:CMA:1c83:909f5b40: 36 us(36 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth3 configured?
eagle:CMA:1c84:8ac18b40: 46 us(46 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle:SCM:1c83:909f5b40: 83 us(83 us):  open_hca: device mlx4_0 not found
eagle:CMA:1c84:8ac18b40: 37 us(37 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth3 configured?
eagle:SCM:1c83:909f5b40: 77 us(77 us):  open_hca: device mlx4_0 not found
eagle:SCM:1c84:8ac18b40: 83 us(83 us):  open_hca: device mlx4_0 not found
eagle:SCM:1c83:909f5b40: 74 us(74 us):  open_hca: device scif0 not found
eagle:SCM:1c84:8ac18b40: 79 us(79 us):  open_hca: device mlx4_0 not found
eagle:UCM:1c83:909f5b40: 74 us(74 us):  open_hca: scif0 not found
eagle:CMA:1c83:909f5b40: 36 us(36 us):  open_hca: getaddr_netdev ERROR:Cannot assign requested address. Is mic0:ib configured?
eagle:SCM:1c84:8ac18b40: 81 us(81 us):  open_hca: device scif0 not found
eagle:SCM:1c83:909f5b40: 77 us(77 us):  open_hca: device mlx4_0 not found
eagle:UCM:1c84:8ac18b40: 79 us(79 us):  open_hca: scif0 not found
eagle:SCM:1c83:909f5b40: 75 us(75 us):  open_hca: device mlx4_0 not found
eagle:CMA:1c84:8ac18b40: 37 us(37 us):  open_hca: getaddr_netdev ERROR:Cannot assign requested address. Is mic0:ib configured?
eagle:SCM:1c83:909f5b40: 71 us(71 us):  open_hca: device mlx4_1 not found
eagle:SCM:1c84:8ac18b40: 81 us(81 us):  open_hca: device mlx4_0 not found
eagle:SCM:1c83:909f5b40: 75 us(75 us):  open_hca: device mlx4_1 not found
eagle:SCM:1c84:8ac18b40: 79 us(79 us):  open_hca: device mlx4_0 not found
eagle:UCM:1c83:909f5b40: 75 us(75 us):  open_hca: mlx4_1 not found
eagle:SCM:1c84:8ac18b40: 80 us(80 us):  open_hca: device mlx4_1 not found
eagle:UCM:1c83:909f5b40: 75 us(75 us):  open_hca: mlx4_1 not found
eagle:SCM:1c84:8ac18b40: 80 us(80 us):  open_hca: device mlx4_1 not found
eagle:UCM:1c84:8ac18b40: 74 us(74 us):  open_hca: mlx4_1 not found
eagle:SCM:1c83:909f5b40: 72 us(72 us):  open_hca: device mlx5_0 not found
eagle:UCM:1c84:8ac18b40: 73 us(73 us):  open_hca: mlx4_1 not found
eagle:SCM:1c83:909f5b40: 71 us(71 us):  open_hca: device mlx5_0 not found
eagle:SCM:1c83:909f5b40: 71 us(71 us):  open_hca: device mlx5_1 not found
eagle:SCM:1c84:8ac18b40: 84 us(84 us):  open_hca: device mlx5_0 not found
eagle:SCM:1c83:909f5b40: 71 us(71 us):  open_hca: device mlx5_1 not found
eagle:SCM:1c84:8ac18b40: 76 us(76 us):  open_hca: device mlx5_0 not found
eagle:UCM:1c83:909f5b40: 71 us(71 us):  open_hca: mlx5_0 not found
eagle:SCM:1c84:8ac18b40: 85 us(85 us):  open_hca: device mlx5_1 not found
eagle:UCM:1c83:909f5b40: 69 us(69 us):  open_hca: mlx5_0 not found
eagle:UCM:1c83:909f5b40: 73 us(73 us):  open_hca: mlx5_1 not found
eagle:SCM:1c84:8ac18b40: 76 us(76 us):  open_hca: device mlx5_1 not found
eagle:UCM:1c83:909f5b40: 71 us(71 us):  open_hca: mlx5_1 not found
eagle:UCM:1c84:8ac18b40: 78 us(78 us):  open_hca: mlx5_0 not found
eagle:UCM:1c84:8ac18b40: 81 us(81 us):  open_hca: mlx5_0 not found
eagle:UCM:1c84:8ac18b40: 67 us(67 us):  open_hca: mlx5_1 not found
eagle:UCM:1c84:8ac18b40: 67 us(67 us):  open_hca: mlx5_1 not found
eagle-mic0:SCM:136d:64ec3b40: 232 us(232 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 227 us(227 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 216 us(216 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 199 us(199 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:CMA:136d:64ec3b40: 580 us(580 us):  open_hca: getaddr_netdev ERROR:No such device. Is ib0 configured?
eagle-mic0:CMA:136e:3e9b6b40: 815 us(815 us):  open_hca: getaddr_netdev ERROR:No such device. Is ib0 configured?
eagle-mic0:CMA:136d:64ec3b40: 548 us(548 us):  open_hca: getaddr_netdev ERROR:No such device. Is ib1 configured?
eagle-mic0:CMA:136e:3e9b6b40: 713 us(713 us):  open_hca: getaddr_netdev ERROR:No such device. Is ib1 configured?
eagle-mic0:SCM:136d:64ec3b40: 218 us(218 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 211 us(211 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 231 us(231 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 242 us(242 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 208 us(208 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 207 us(207 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 206 us(206 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 209 us(209 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 217 us(217 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 209 us(209 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:CMA:136d:64ec3b40: 553 us(553 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle-mic0:CMA:136e:3e9b6b40: 729 us(729 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle-mic0:UCM:136d:64ec3b40: 181 us(181 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 172 us(172 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 187 us(187 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 187 us(187 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 176 us(176 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 174 us(174 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 199 us(199 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 191 us(191 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:CMA:136e:3e9b6b40: 615 us(615 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle-mic0:CMA:136d:64ec3b40: 785 us(785 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle-mic0:CMA:136e:3e9b6b40: 559 us(559 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth3 configured?
eagle-mic0:CMA:136d:64ec3b40: 717 us(717 us):  open_hca: getaddr_netdev ERROR:No such device. Is eth3 configured?
eagle-mic0:SCM:136d:64ec3b40: 211 us(211 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 237 us(237 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 208 us(208 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 298 us(298 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 226 us(226 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 221 us(221 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 171 us(171 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 165 us(165 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:CMA:136d:64ec3b40: 562 us(562 us):  open_hca: getaddr_netdev ERROR:Cannot assign requested address. Is mic0:ib configured?
eagle-mic0:CMA:136e:3e9b6b40: 750 us(750 us):  open_hca: getaddr_netdev ERROR:Cannot assign requested address. Is mic0:ib configured?
eagle-mic0:SCM:136e:3e9b6b40: 770 us(770 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 934 us(934 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 211 us(211 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 211 us(211 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 201 us(201 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 197 us(197 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 208 us(208 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 201 us(201 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 163 us(163 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 217 us(217 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 163 us(163 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 159 us(159 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 223 us(223 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 216 us(216 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 218 us(218 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 405 us(405 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 203 us(203 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 394 us(394 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 212 us(212 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 199 us(199 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 169 us(169 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 168 us(168 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 164 us(164 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 157 us(157 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 172 us(172 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 166 us(166 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 171 us(171 us):  open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 177 us(177 us):  open_hca: ibv_get_device_list() failed
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(784)......: 
MPID_Init(1323)............: channel initialization failed
MPIDI_CH3_Init(141)........: 
MPID_nem_tcp_post_init(644): 
MPID_nem_tcp_connect(1103).: 
getConnInfoKVS(849)........: PMI_KVS_Get failed
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(784)......: 
MPID_Init(1323)............: channel initialization failed
MPIDI_CH3_Init(141)........: 
MPID_nem_tcp_post_init(644): 
MPID_nem_tcp_connect(1103).: 
getConnInfoKVS(849)........: PMI_KVS_Get failed
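For reference: the first command line above is malformed (-host is given no host name and -iface appears twice), and the open_hca messages indicate the DAPL fabric is being probed on a machine without InfiniBand. A hedged sketch of a symmetric launch consistent with the commands that worked earlier in this thread; forcing the TCP fabric with I_MPI_FABRICS is an assumption, not something confirmed later in the thread:

export I_MPI_MIC=1
export I_MPI_FABRICS=shm:tcp    # skip DAPL/OFA probing on a box with no InfiniBand HCA
mpiexec.hydra -iface br0 -host `hostname` -np 2 ./test.host : -host mic0 -np 2 /tmp/test.mic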

 
