Hi,
I am trying to run WRF on a MIC card following the instructions from https://software.intel.com/en-us/articles/how-to-get-wrf-running-on-the-intelr-xeon-phitm-coprocessor
However, when I try to run
mpiexec.hydra -np 1 ./wrf.exe
it gives the following error:
-bash: /opt/intel//impi/5.0.2.044/mic/bin/mpiexec.hydra: cannot execute binary file
I am wondering whether anyone has had this problem before. mpiexec.hydra is set up using "source /opt/intel/impi/5.0.2.044/mic/bin/mpivars.sh"
Thanks
Hi YH,
When you build the MPI application for the coprocessor, you set the environment variables for the coprocessor, as you did. However, if you want to launch the application from the host, you need to set the environment variables for the host before launching it:
# source /opt/intel/impi/5.0.2.044/intel64/bin/mpivars.sh
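As a quick sanity check (my own sketch, not part of the original reply), you can confirm which build of mpiexec.hydra ends up on your PATH after sourcing; the host build lives under .../intel64/bin and the coprocessor build under .../mic/bin:

```python
import shutil

# Locate mpiexec.hydra on the current PATH; the path tells you which
# mpivars.sh was sourced last (host build under .../intel64/bin,
# coprocessor build under .../mic/bin).
launcher = shutil.which("mpiexec.hydra")
if launcher is None:
    print("mpiexec.hydra not found on PATH; source mpivars.sh first")
elif "/mic/" in launcher:
    print(f"{launcher} is the coprocessor (k1om) build; it cannot execute on the host")
else:
    print(f"{launcher} looks like the host (intel64) build")
```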
OK, I did what you suggested by sourcing the intel64 MPI environment script and executed the following:
mpiexec.hydra -host mic0 -np 1 ./wrf.exe (then I got the following error)
[mpiexec@eagle] HYDU_getfullhostname (../../utils/others/others.c:146): getaddrinfo error (hostname: eagle, error: Name or service not known)
[mpiexec@eagle] HYDU_sock_create_and_listen_portstr (../../utils/sock/sock.c:1094): unable to get local hostname
[mpiexec@eagle] HYD_pmci_launch_procs (../../pm/pmiserv/pmiserv_pmci.c:350): unable to create PMI port
[mpiexec@eagle] main (../../ui/mpich/mpiexec.c:987): process manager returned error launching processes
Then, I edited /etc/hosts by appending "eagle" to the 127.0.0.1 line and executed the same command:
mpiexec.hydra -host mic0 -np 1 ./wrf.exe (then I got the following error)
bash: /opt/intel/impi/5.0.2.044/intel64/bin/pmi_proxy: No such file or directory
I could locate pmi_proxy, though; a search returned:
/opt/intel/composer_xe_2015.1.133/mpirt/bin/ia32/pmi_proxy
/opt/intel/composer_xe_2015.1.133/mpirt/bin/intel64/pmi_proxy
/opt/intel/composer_xe_2015.1.133/mpirt/bin/mic/pmi_proxy
/opt/intel/impi/5.0.2.044/intel64/bin/pmi_proxy
/opt/intel/impi/5.0.2.044/mic/bin/pmi_proxy
/usr/mpi/gcc/mvapich2-2.0/bin/hydra_pmi_proxy
/usr/mpi/gcc/mvapich2-2.0/share/man/man1/hydra_pmi_proxy.1
Wonder what's going on.
You didn't say whether you mounted the host filesystem on mic so as to make the same mpi paths visible there.
I didn't do what you mentioned. Can you be more specific about how to "mount the host filesystem on mic"?
All I did was NFS-mount the rundir across the MIC card.
Thanks.
Hello YH,
As per the following thread, I see that you were able to mount the /micNfs directory on the MIC, correct? Similarly, in order to see the Intel MPI files on the MIC, you will have to mount the /opt directory on the MIC.
Hi Sunny,
I did the /opt NFS mount across the MIC cards. But even now, when I execute
mpiexec.hydra -hosts mic0 -np 1 ./wrf.exe (it hangs there, and Ctrl-C gives the following error)
^C[mpiexec@eagle] Sending Ctrl-C to processes as requested
[mpiexec@eagle] Press Ctrl-C again to force abort
[mpiexec@eagle] HYDU_sock_write (../../utils/sock/sock.c:417): write error (Bad file descriptor)
[mpiexec@eagle] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:244): unable to write data to proxy
[mpiexec@eagle] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:175): unable to send signal downstream
[mpiexec@eagle] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@eagle] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:495): error waiting for event
[mpiexec@eagle] main (../../ui/mpich/mpiexec.c:1011): process manager error waiting for completion
Any ideas?
Thanks,
Is your wrf.exe visible on the MIC?
I have the rundir, which contains wrf.exe (and all the other files in the same directory), NFS-mounted to the MIC.
Yes, if I ssh into mic0, I can see wrf.exe and real.exe in the rundir. I can run ./real.exe there with no problem, and I believe I could also run ./wrf.exe once ./real.exe is done, but I didn't try that.
What I want is to launch the app from the host, so that I can later do a symmetric-mode execution. But I couldn't do so.
Hi YH,
Did you look at this article? It demonstrates WRF Conus2.5km on Intel® Xeon Phi™ Coprocessors and Intel® Xeon® processors in symmetric mode.
Thanks
Yes, I followed the instructions on that page as well, but I got stuck at the execution of mpiexec.hydra.
According to the webpage, I am supposed to source /opt/intel/impi/5.0.2.044/mic/bin/mpivars.sh and then launch wrf.exe from the host.
But then loc-nguyen (Intel) pointed out that I should run "source /opt/intel/impi/5.0.2.044/intel64/bin/mpivars.sh" before mpiexec.hydra.
Either way, it didn't work out.
Hi YH,
Correct MPI usage model:
1. To run MIC binaries from the HOST side:
host$ . /opt/intel/impi/5.0.2.044/intel64/bin/mpivars.sh
host$ export I_MPI_MIC=1
host$ mpiexec.hydra -host mic0 -np 1 ./wrf.exe
2. To run MIC binaries from the MIC side:
host-mic0$ . /opt/intel/impi/5.0.2.044/mic/bin/mpivars.sh
host-mic0$ mpiexec.hydra -np 1 ./wrf.exe
In both cases wrf.exe should have elf64-k1om file format.
You can use '/usr/linux-k1om-4.7/bin/x86_64-k1om-linux-objdump -a wrf.exe' (on host) to check.
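If the cross-objdump is not installed, the same check can be done by peeking at the ELF header yourself. The following is an illustrative sketch of mine (not from the original post); the e_machine constants are the standard ELF values for x86-64 and k1om:

```python
import struct

EM_X86_64 = 62   # host (intel64) binaries
EM_K1OM = 181    # Intel Xeon Phi (k1om) binaries

def elf_machine(path):
    """Return the e_machine field of an ELF file, or None if not ELF."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # e_machine is a 16-bit field at byte offset 18 (little-endian on these targets)
    return struct.unpack_from("<H", header, 18)[0]

# Example: elf_machine("wrf.exe") == EM_K1OM means the binary targets the coprocessor.
```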
Hi Arm,
For option 1,
host$ source /opt/intel/impi/5.0.2.044/intel64/bin/mpivars.sh
host$ export I_MPI_MIC=1
host$ mpiexec.hydra -host mic0 -np 1 ./wrf.exe
[mpiexec@eagle] HYDU_getfullhostname (../../utils/others/others.c:146): getaddrinfo error (hostname: eagle, error: Name or service not known)
[mpiexec@eagle] HYDU_sock_create_and_listen_portstr (../../utils/sock/sock.c:1094): unable to get local hostname
[mpiexec@eagle] HYD_pmci_launch_procs (../../pm/pmiserv/pmiserv_pmci.c:350): unable to create PMI port
[mpiexec@eagle] main (../../ui/mpich/mpiexec.c:987): process manager returned error launching processes
For option 2,
host-mic0$ source /opt/intel/impi/5.0.2.044/mic/bin/mpivars.sh
host-mic0$ mpiexec.hydra -np 1 ./wrf.exe
./wrf.exe: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory
I can get option 2 to run by setting
export LD_LIBRARY_PATH=/opt/intel/composer_xe_2015.1.133/compiler/lib/mic:$LD_LIBRARY_PATH
Hello YH,
Are you still having trouble running wrf on the Intel® Xeon Phi™ Coprocessor?
For the errors you are getting with option 1, can you please verify the contents of your /etc/hosts file? Also, for option 1, could you please try this for me:
scp wrf.exe mic0:/tmp/.   //copy the executable to mic0
//now run the command from host
mpiexec.hydra -host mic0 -np 1 /tmp/wrf.exe
Also, for option 2: do you still have issues after setting the LD_LIBRARY_PATH environment variable?
Thanks
Hi Sunny,
At my host system,
cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 eagle
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.1 eagle-mic0 mic0 #Generated-by-micctrl
10.10.10.2 eagle-mic1 mic1 #Generated-by-micctrl
10.10.10.3 eagle-mic2 mic2 #Generated-by-micctrl
At mic0
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost
10.10.10.254 host eagle
10.10.10.1 eagle-mic0 mic0
10.10.10.2 eagle-mic1 mic1
10.10.10.3 eagle-mic2 mic2
I could run option 2 after setting LD_LIBRARY_PATH; no problem with option 2. But it is not my goal, just a sanity check that my binary works.
The result of what you suggested is the following (by the way, my host's firewalld is completely stopped):
[proxy:0:0@eagle-mic0] HYDU_sock_connect (../../utils/sock/sock.c:268): unable to connect from "eagle-mic0" to "127.0.0.1" (Connection refused)
[proxy:0:0@eagle-mic0] main (../../pm/pmiserv/pmip.c:397): unable to connect to server 127.0.0.1 at port 38843 (check for firewalls!)
^C[mpiexec@eagle] Sending Ctrl-C to processes as requested
[mpiexec@eagle] Press Ctrl-C again to force abort
[mpiexec@eagle] HYDU_sock_write (../../utils/sock/sock.c:417): write error (Bad file descriptor)
[mpiexec@eagle] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:244): unable to write data to proxy
[mpiexec@eagle] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:175): unable to send signal downstream
[mpiexec@eagle] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@eagle] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:495): error waiting for event
[mpiexec@eagle] main (../../ui/mpich/mpiexec.c:1011): process manager error waiting for completion
Thanks
Hi YH,
Can you please let me know whether you are using static (default) networking or an internal bridge for this configuration? From one of your posts I see that you are using the static (default) network configuration, correct? In that case, the coprocessor IP addresses in /etc/hosts on your host are not correct. There should be entries like:
172.31.1.1 eagle-mic0 mic0
172.31.2.1 eagle-mic1 mic1
172.31.3.1 eagle-mic2 mic2
Thanks
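To catch such a mismatch mechanically, a few lines of parsing over the hosts file are enough. This is an illustrative sketch of mine, assuming the static-default addresses of the form 172.31.&lt;card+1&gt;.1 mentioned above:

```python
def parse_hosts(text):
    """Map each hostname/alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name] = ip
    return table

# Static (default) MIC networking: card N appears as 172.31.<N+1>.1 on the host.
EXPECTED = {"mic0": "172.31.1.1", "mic1": "172.31.2.1", "mic2": "172.31.3.1"}

def mismatches(hosts_text, expected=EXPECTED):
    """Return {name: (actual_ip_or_None, expected_ip)} for every wrong entry."""
    table = parse_hosts(hosts_text)
    return {n: (table.get(n), ip) for n, ip in expected.items()
            if table.get(n) != ip}
```

Running mismatches(open("/etc/hosts").read()) on the host would flag the 10.10.10.x entries under this assumption.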
When I tested the NFS mount, I reverted back to the default network (172.31.1.1). I just wanted to make sure that the internal bridge wasn't causing any issues.
For MPI, I would like to try using multiple MIC cards; that's why I used the internal-bridge network setup.
Hi YH,
Can you please try running the sample test program to verify all MPI-related settings?
//On host
cp -r /opt/intel/impi_latest/test ~/.
cd test
source /opt/intel/composerxe/bin/compilervars.sh intel64
source /opt/intel/impi_latest/bin64/mpivars.sh
mpiicc -o test.host test.c
mpiicc -mmic -o test.mic test.c
scp test.mic mic0:/tmp/.
export I_MPI_MIC=1
mpiexec.hydra -host `hostname` -np 2 ./test.host
Hello world: rank 0 of 2 running on knightscorner5
Hello world: rank 1 of 2 running on knightscorner5
mpiexec.hydra -host mic0 -np 2 /tmp/test.mic
Hello world: rank 0 of 2 running on knightscorner5-mic0
Hello world: rank 1 of 2 running on knightscorner5-mic0
mpiexec.hydra -host `hostname` -np 2 ./test.host : -host mic0 -np 2 /tmp/test.mic
Hello world: rank 0 of 4 running on knightscorner5
Hello world: rank 1 of 4 running on knightscorner5
Hello world: rank 2 of 4 running on knightscorner5-mic0
Hello world: rank 3 of 4 running on knightscorner5-mic0
If you see an error like this:
[mpiexec@knightscorner5] Sending Ctrl-C to processes as requested
[mpiexec@knightscorner5] Press Ctrl-C again to force abort
[mpiexec@knightscorner5] HYDU_sock_write (../../utils/sock/sock.c:417): write error (Bad file descriptor)
[mpiexec@knightscorner5] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:247): unable to write data to proxy
[mpiexec@knightscorner5] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:172): unable to send signal downstream
[mpiexec@knightscorner5] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@knightscorner5] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:484): error waiting for event
[mpiexec@knightscorner5] main (../../ui/mpich/mpiexec.c:930): process manager error waiting for completion
Probably one of the reasons for this is that /opt/intel is not mounted correctly on that coprocessor. Verify /etc/fstab on the coprocessor and try executing mount -a. If the mount succeeds you will be good to go, but if it fails you will have to go back to the host and modify the /etc/exports file and /etc/hosts.allow to include all the coprocessors (IP addresses). Once you have updated the required files:
exportfs -a
service nfs restart   //may not be required but still to be safe
ssh mic0 mount -a
You should then be able to mount, provided /etc/fstab has the correct entries.
If you need further help with running MPI on Intel® Xeon Phi™ Coprocessor, you can refer to the following tutorial: Using the Intel® MPI Library on Intel® Xeon Phi™ Coprocessor Systems
Thanks
Hi YH,
As far as I can see, your HOST's /etc/hosts file is now:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 eagle
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.1 eagle-mic0 mic0 #Generated-by-micctrl
10.10.10.2 eagle-mic1 mic1 #Generated-by-micctrl
10.10.10.3 eagle-mic2 mic2 #Generated-by-micctrl
Could you please try to correct it to something like:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.254 host eagle
10.10.10.1 eagle-mic0 mic0 #Generated-by-micctrl
10.10.10.2 eagle-mic1 mic1 #Generated-by-micctrl
10.10.10.3 eagle-mic2 mic2 #Generated-by-micctrl
Then check that these connection paths work fine:
eagle$ ssh mic0
eagle$ ssh 10.10.10.1
mic0$ ssh eagle
mic0$ ssh 10.10.10.254
'hostname -i' on the host should return an IP address accessible from the MIC cards.
If that doesn't help, could you please run the failed scenario with the '-v' option (mpiexec.hydra -v ...) and provide the output?
Another suggestion is to try 'mpiexec.hydra -iface ...':
'-iface mic0' - for static pair network configuration
'-iface br0' - for bridge network configuration
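The HYDU_getfullhostname failure earlier in the thread is a plain getaddrinfo lookup on the local hostname, so it can be reproduced outside MPI. A small sketch of mine that performs the same lookup:

```python
import socket

def check_hostname_resolution(name=None):
    """Try the same lookup Hydra does for the local hostname."""
    name = name or socket.gethostname()
    try:
        infos = socket.getaddrinfo(name, None)
        addrs = sorted({info[4][0] for info in infos})
        print(f"{name} resolves to: {', '.join(addrs)}")
        return addrs
    except socket.gaierror as err:
        # This is the same "Name or service not known" failure mpiexec reports.
        print(f"getaddrinfo failed for {name}: {err}")
        return []
```

If this prints only 127.0.0.1 for the host, the MIC cards still cannot reach it; the address should be one visible from the cards, such as the 10.10.10.254 entry above.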
Hi Sunny and Artem,
On line 19 (Sunny's post),
mpiexec.hydra -host mic0 -np 2 /tmp/test.mic (the output is the following)
[proxy:0:0@eagle-mic0] HYDU_sock_connect (../../utils/sock/sock.c:268): unable to connect from "eagle-mic0" to "127.0.0.1" (Connection refused)
[proxy:0:0@eagle-mic0] main (../../pm/pmiserv/pmip.c:397): unable to connect to server 127.0.0.1 at port 49052 (check for firewalls!)
^C[mpiexec@eagle] Sending Ctrl-C to processes as requested
[mpiexec@eagle] Press Ctrl-C again to force abort
[mpiexec@eagle] HYDU_sock_write (../../utils/sock/sock.c:417): write error (Bad file descriptor)
[mpiexec@eagle] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:244): unable to write data to proxy
[mpiexec@eagle] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:175): unable to send signal downstream
[mpiexec@eagle] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@eagle] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:495): error waiting for event
[mpiexec@eagle] main (../../ui/mpich/mpiexec.c:1011): process manager error waiting for completion
I have tried remounting the NFS. It doesn't look to me like there are any issues: mount -a gave no errors, and if I modify a file in one directory, I can see the change in the other.
Changing line 19 again to the following, with an additional line (10.10.10.254 host eagle) in /etc/hosts:
mpiexec.hydra -iface mic0 -host mic0 -np 2 /tmp/test.mic
[mpiexec@eagle] HYDU_sock_get_iface_ip (../../utils/sock/sock.c:812): unable to find interface mic0
[mpiexec@eagle] HYDU_sock_create_and_listen_portstr (../../utils/sock/sock.c:1074): unable to get network interface IP
[mpiexec@eagle] HYD_pmci_launch_procs (../../pm/pmiserv/pmiserv_pmci.c:350): unable to create PMI port
[mpiexec@eagle] main (../../ui/mpich/mpiexec.c:987): process manager returned error launching processes
Changing line 19 again to the following, with an additional line (10.10.10.254 host eagle) in /etc/hosts:
mpiexec.hydra -iface br0 -host mic0 -np 2 /tmp/test.mic
Hello world: rank 0 of 2 running on eagle-mic0
Hello world: rank 1 of 2 running on eagle-mic0
This time it worked. I am going to try it on wrf.exe and will post if I run into problems.
Thank you.
When I try symmetric mode, it gives the following error:
>>mpiexec.hydra -host -iface br0 -iface `hostname` -np 2 ./test.host : -iface br0 -host mic0 -np 2 /tmp/test.mic
[proxy:0:0@eagle-mic0] HYDU_create_process (../../utils/launch/launch.c:591): execvp error on file br0 (No such file or directory)
[proxy:0:0@eagle-mic0] HYDU_create_process (../../utils/launch/launch.c:591): execvp error on file br0 (No such file or directory)
^C[mpiexec@eagle] Sending Ctrl-C to processes as requested
[mpiexec@eagle] Press Ctrl-C again to force abort
[yhue@eagle test]$ mpiexec.hydra -host `hostname` -np 2 ./test.host : -iface br0 -host mic0 -np 2 /tmp/test.mic
eagle:SCM:1c83:909f5b40: 89 us(89 us): open_hca: device mlx4_0 not found
eagle:SCM:1c83:909f5b40: 77 us(77 us): open_hca: device mlx4_0 not found
eagle:SCM:1c84:8ac18b40: 115 us(115 us): open_hca: device mlx4_0 not found
eagle:CMA:1c83:909f5b40: 42 us(42 us): open_hca: getaddr_netdev ERROR:No such device. Is ib0 configured?
eagle:SCM:1c84:8ac18b40: 85 us(85 us): open_hca: device mlx4_0 not found
eagle:CMA:1c83:909f5b40: 38 us(38 us): open_hca: getaddr_netdev ERROR:No such device. Is ib1 configured?
eagle:CMA:1c84:8ac18b40: 43 us(43 us): open_hca: getaddr_netdev ERROR:No such device. Is ib0 configured?
eagle:SCM:1c83:909f5b40: 85 us(85 us): open_hca: device mthca0 not found
eagle:CMA:1c84:8ac18b40: 42 us(42 us): open_hca: getaddr_netdev ERROR:No such device. Is ib1 configured?
eagle:SCM:1c83:909f5b40: 80 us(80 us): open_hca: device mthca0 not found
eagle:SCM:1c84:8ac18b40: 98 us(98 us): open_hca: device mthca0 not found
eagle:SCM:1c83:909f5b40: 81 us(81 us): open_hca: device ipath0 not found
eagle:SCM:1c84:8ac18b40: 90 us(90 us): open_hca: device mthca0 not found
eagle:SCM:1c83:909f5b40: 84 us(84 us): open_hca: device ipath0 not found
eagle:SCM:1c84:8ac18b40: 88 us(88 us): open_hca: device ipath0 not found
eagle:SCM:1c83:909f5b40: 83 us(83 us): open_hca: device ehca0 not found
eagle:SCM:1c84:8ac18b40: 86 us(86 us): open_hca: device ipath0 not found
eagle:CMA:1c83:909f5b40: 39 us(39 us): open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle:SCM:1c84:8ac18b40: 81 us(81 us): open_hca: device ehca0 not found
eagle:UCM:1c83:909f5b40: 79 us(79 us): open_hca: mlx4_0 not found
eagle:CMA:1c84:8ac18b40: 39 us(39 us): open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle:UCM:1c83:909f5b40: 76 us(76 us): open_hca: mlx4_0 not found
eagle:UCM:1c84:8ac18b40: 82 us(82 us): open_hca: mlx4_0 not found
eagle:UCM:1c83:909f5b40: 78 us(78 us): open_hca: mthca0 not found
eagle:UCM:1c84:8ac18b40: 78 us(78 us): open_hca: mlx4_0 not found
eagle:UCM:1c83:909f5b40: 78 us(78 us): open_hca: mthca0 not found
eagle:UCM:1c84:8ac18b40: 80 us(80 us): open_hca: mthca0 not found
eagle:CMA:1c83:909f5b40: 38 us(38 us): open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle:UCM:1c84:8ac18b40: 77 us(77 us): open_hca: mthca0 not found
eagle:CMA:1c83:909f5b40: 36 us(36 us): open_hca: getaddr_netdev ERROR:No such device. Is eth3 configured?
eagle:CMA:1c84:8ac18b40: 46 us(46 us): open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle:SCM:1c83:909f5b40: 83 us(83 us): open_hca: device mlx4_0 not found
eagle:CMA:1c84:8ac18b40: 37 us(37 us): open_hca: getaddr_netdev ERROR:No such device. Is eth3 configured?
eagle:SCM:1c83:909f5b40: 77 us(77 us): open_hca: device mlx4_0 not found
eagle:SCM:1c84:8ac18b40: 83 us(83 us): open_hca: device mlx4_0 not found
eagle:SCM:1c83:909f5b40: 74 us(74 us): open_hca: device scif0 not found
eagle:SCM:1c84:8ac18b40: 79 us(79 us): open_hca: device mlx4_0 not found
eagle:UCM:1c83:909f5b40: 74 us(74 us): open_hca: scif0 not found
eagle:CMA:1c83:909f5b40: 36 us(36 us): open_hca: getaddr_netdev ERROR:Cannot assign requested address. Is mic0:ib configured?
eagle:SCM:1c84:8ac18b40: 81 us(81 us): open_hca: device scif0 not found
eagle:SCM:1c83:909f5b40: 77 us(77 us): open_hca: device mlx4_0 not found
eagle:UCM:1c84:8ac18b40: 79 us(79 us): open_hca: scif0 not found
eagle:SCM:1c83:909f5b40: 75 us(75 us): open_hca: device mlx4_0 not found
eagle:CMA:1c84:8ac18b40: 37 us(37 us): open_hca: getaddr_netdev ERROR:Cannot assign requested address. Is mic0:ib configured?
eagle:SCM:1c83:909f5b40: 71 us(71 us): open_hca: device mlx4_1 not found
eagle:SCM:1c84:8ac18b40: 81 us(81 us): open_hca: device mlx4_0 not found
eagle:SCM:1c83:909f5b40: 75 us(75 us): open_hca: device mlx4_1 not found
eagle:SCM:1c84:8ac18b40: 79 us(79 us): open_hca: device mlx4_0 not found
eagle:UCM:1c83:909f5b40: 75 us(75 us): open_hca: mlx4_1 not found
eagle:SCM:1c84:8ac18b40: 80 us(80 us): open_hca: device mlx4_1 not found
eagle:UCM:1c83:909f5b40: 75 us(75 us): open_hca: mlx4_1 not found
eagle:SCM:1c84:8ac18b40: 80 us(80 us): open_hca: device mlx4_1 not found
eagle:UCM:1c84:8ac18b40: 74 us(74 us): open_hca: mlx4_1 not found
eagle:SCM:1c83:909f5b40: 72 us(72 us): open_hca: device mlx5_0 not found
eagle:UCM:1c84:8ac18b40: 73 us(73 us): open_hca: mlx4_1 not found
eagle:SCM:1c83:909f5b40: 71 us(71 us): open_hca: device mlx5_0 not found
eagle:SCM:1c83:909f5b40: 71 us(71 us): open_hca: device mlx5_1 not found
eagle:SCM:1c84:8ac18b40: 84 us(84 us): open_hca: device mlx5_0 not found
eagle:SCM:1c83:909f5b40: 71 us(71 us): open_hca: device mlx5_1 not found
eagle:SCM:1c84:8ac18b40: 76 us(76 us): open_hca: device mlx5_0 not found
eagle:UCM:1c83:909f5b40: 71 us(71 us): open_hca: mlx5_0 not found
eagle:SCM:1c84:8ac18b40: 85 us(85 us): open_hca: device mlx5_1 not found
eagle:UCM:1c83:909f5b40: 69 us(69 us): open_hca: mlx5_0 not found
eagle:UCM:1c83:909f5b40: 73 us(73 us): open_hca: mlx5_1 not found
eagle:SCM:1c84:8ac18b40: 76 us(76 us): open_hca: device mlx5_1 not found
eagle:UCM:1c83:909f5b40: 71 us(71 us): open_hca: mlx5_1 not found
eagle:UCM:1c84:8ac18b40: 78 us(78 us): open_hca: mlx5_0 not found
eagle:UCM:1c84:8ac18b40: 81 us(81 us): open_hca: mlx5_0 not found
eagle:UCM:1c84:8ac18b40: 67 us(67 us): open_hca: mlx5_1 not found
eagle:UCM:1c84:8ac18b40: 67 us(67 us): open_hca: mlx5_1 not found
eagle-mic0:SCM:136d:64ec3b40: 232 us(232 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 227 us(227 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 216 us(216 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 199 us(199 us): open_hca: ibv_get_device_list() failed
eagle-mic0:CMA:136d:64ec3b40: 580 us(580 us): open_hca: getaddr_netdev ERROR:No such device. Is ib0 configured?
eagle-mic0:CMA:136e:3e9b6b40: 815 us(815 us): open_hca: getaddr_netdev ERROR:No such device. Is ib0 configured?
eagle-mic0:CMA:136d:64ec3b40: 548 us(548 us): open_hca: getaddr_netdev ERROR:No such device. Is ib1 configured?
eagle-mic0:CMA:136e:3e9b6b40: 713 us(713 us): open_hca: getaddr_netdev ERROR:No such device. Is ib1 configured?
eagle-mic0:SCM:136d:64ec3b40: 218 us(218 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 211 us(211 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 231 us(231 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 242 us(242 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 208 us(208 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 207 us(207 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 206 us(206 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 209 us(209 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 217 us(217 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 209 us(209 us): open_hca: ibv_get_device_list() failed
eagle-mic0:CMA:136d:64ec3b40: 553 us(553 us): open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle-mic0:CMA:136e:3e9b6b40: 729 us(729 us): open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle-mic0:UCM:136d:64ec3b40: 181 us(181 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 172 us(172 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 187 us(187 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 187 us(187 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 176 us(176 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 174 us(174 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 199 us(199 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 191 us(191 us): open_hca: ibv_get_device_list() failed
eagle-mic0:CMA:136e:3e9b6b40: 615 us(615 us): open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle-mic0:CMA:136d:64ec3b40: 785 us(785 us): open_hca: getaddr_netdev ERROR:No such device. Is eth2 configured?
eagle-mic0:CMA:136e:3e9b6b40: 559 us(559 us): open_hca: getaddr_netdev ERROR:No such device. Is eth3 configured?
eagle-mic0:CMA:136d:64ec3b40: 717 us(717 us): open_hca: getaddr_netdev ERROR:No such device. Is eth3 configured?
eagle-mic0:SCM:136d:64ec3b40: 211 us(211 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 237 us(237 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 208 us(208 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 298 us(298 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 226 us(226 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 221 us(221 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 171 us(171 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 165 us(165 us): open_hca: ibv_get_device_list() failed
eagle-mic0:CMA:136d:64ec3b40: 562 us(562 us): open_hca: getaddr_netdev ERROR:Cannot assign requested address. Is mic0:ib configured?
eagle-mic0:CMA:136e:3e9b6b40: 750 us(750 us): open_hca: getaddr_netdev ERROR:Cannot assign requested address. Is mic0:ib configured?
eagle-mic0:SCM:136e:3e9b6b40: 770 us(770 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 934 us(934 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 211 us(211 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 211 us(211 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 201 us(201 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 197 us(197 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 208 us(208 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 201 us(201 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 163 us(163 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 217 us(217 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 163 us(163 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 159 us(159 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 223 us(223 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 216 us(216 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 218 us(218 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 405 us(405 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 203 us(203 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 394 us(394 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136d:64ec3b40: 212 us(212 us): open_hca: ibv_get_device_list() failed
eagle-mic0:SCM:136e:3e9b6b40: 199 us(199 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 169 us(169 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 168 us(168 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 164 us(164 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 157 us(157 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 172 us(172 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 166 us(166 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136d:64ec3b40: 171 us(171 us): open_hca: ibv_get_device_list() failed
eagle-mic0:UCM:136e:3e9b6b40: 177 us(177 us): open_hca: ibv_get_device_list() failed
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(784)......:
MPID_Init(1323)............: channel initialization failed
MPIDI_CH3_Init(141)........:
MPID_nem_tcp_post_init(644):
MPID_nem_tcp_connect(1103).:
getConnInfoKVS(849)........: PMI_KVS_Get failed
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(784)......:
MPID_Init(1323)............: channel initialization failed
MPIDI_CH3_Init(141)........:
MPID_nem_tcp_post_init(644):
MPID_nem_tcp_connect(1103).:
getConnInfoKVS(849)........: PMI_KVS_Get failed