Dear Intel colleagues,
I have just set up a new diskless cluster. Running the IMB PingPong benchmark with -genv I_MPI_FABRICS shm:dapl shows promising performance, but with -genv I_MPI_FABRICS shm:ofa the run has never worked: MPI startup reports that the ofa fabric is not available. The system environment and execution traces are provided below. Any help would be much appreciated.
# I_MPI_DEBUG 4
/opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 4 -genv I_MPI_FABRICS shm:ofa /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[1] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
#I_MPI_DEBUG 2
/opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 2 -genv I_MPI_FABRICS shm:ofa /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[1] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
#I_MPI_DEBUG 100
/opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 100 -genv I_MPI_FABRICS shm:ofa /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): Intel(R) MPI Library, Version 4.1 Update 1 Build 20130522
[0] MPI startup(): Copyright (C) 2003-2013 Intel Corporation. All rights reserved.
[0] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[1] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[0] MPI startup(): Found 1 IB devices
[1] MPI startup(): Found 1 IB devices
[1] MPI startup(): Open 0 IB device: mlx4_0
[0] MPI startup(): Open 0 IB device: mlx4_0
[1] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
mpirun -V
Intel(R) MPI Library for Linux* OS, Version 4.1 Update 1 Build 20130522
Copyright (C) 2003-2013, Intel Corporation. All rights reserved.
icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.2.183 Build 20130514
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.
env | grep I_MPI
I_MPI_ROOT=/opt/intel/impi/4.1.1.036
pdsh -w dn[01-06] ls /usr/lib64/libibverbs.so
dn01: /usr/lib64/libibverbs.so
dn02: /usr/lib64/libibverbs.so
dn05: /usr/lib64/libibverbs.so
dn06: /usr/lib64/libibverbs.so
dn03: /usr/lib64/libibverbs.so
dn04: /usr/lib64/libibverbs.so
ibstat -V
ibstat BUILD VERSION: 1.6.1.MLNX20130822.dfac5dd Build date: Aug 25 2013 11:19:43
uname -a
Linux dn01 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
ssh dn01
Last login: Tue Oct 28 10:56:44 2014 from head.cluster
head -n 20 /etc/dat.conf
# DAT v2.0, v1.2 configuration file
#
# Each entry should have the following fields:
#
# <ia_name> <api_version> <threadsafety> <default> <lib_path> \
# <provider_version> <ia_params> <platform_params>
#
# For uDAPL cma provder, <ia_params> is one of the following:
# network address, network hostname, or netdev name and 0 for port
#
# For uDAPL scm provider, <ia_params> is device name and port
# For uDAPL ucm provider, <ia_params> is device name and port
# For uDAPL iWARP provider, <ia_params> is netdev device name and 0
# For uDAPL iWARP provider, <ia_params> is netdev device name and 0
# For uDAPL RoCE provider, <ia_params> is device name and 0
#
#ON THIS CLUSTER, ONLY PORT 2 OF EACH HCA IS ACTIVATED
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""
ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1" ""
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1032855
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1032855
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
# If I_MPI_FALLBACK is enabled, I_MPI_FABRICS shm:ofa runs, but it apparently falls back to the tcp fabric over 1 Gbit Ethernet
export I_MPI_FALLBACK=1
[root@head run-033]# /opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 4 -genv I_MPI_FABRICS shm:ofa /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): fabric ofa failed: will try use tcp fabric
[1] MPI startup(): fabric ofa failed: will try use tcp fabric
[0] MPI startup(): shm and tcp data transfer modes
[1] MPI startup(): shm and tcp data transfer modes
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 30486 dn01 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
[0] MPI startup(): 1 29284 dn02 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
...(ignored)
2097152 20 17770.48 112.55
4194304 10 35445.40 112.85
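As a sanity check on the fallback numbers above, the bandwidth column can be recomputed from the message size and the time. This is only a sketch of the arithmetic: it assumes IMB PingPong reports time in microseconds and that its "Mbytes/sec" means 2^20 bytes per second, and `pingpong_bw` is a hypothetical helper name. Under those assumptions the last row comes out at roughly 118 MB/s in decimal units, close to the 125 MB/s raw rate of a 1 Gbit/s link, which is consistent with the traffic going over TCP on Gigabit Ethernet and not over InfiniBand.

```shell
#!/bin/sh
# Recompute IMB PingPong bandwidth from message size (bytes) and time (usec).
# Assumption: IMB's "Mbytes/sec" uses 1 Mbyte = 1048576 bytes.
pingpong_bw() {
    awk -v bytes="$1" -v usec="$2" 'BEGIN {
        mb  = bytes / usec            # bytes per usec == decimal MB/s
        mib = mb * 1e6 / 1048576      # MiB/s, the unit assumed for the IMB column
        printf "%.2f MiB/s (%.2f MB/s, %.0f%% of a raw 1 Gbit/s link)\n", mib, mb, mb / 125 * 100
    }'
}

# The two rows quoted above:
pingpong_bw 2097152 17770.48
pingpong_bw 4194304 35445.40
```

The first figure on each output line should match the last column of the corresponding IMB row; if it does, the unit assumption holds for this run.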
Do you have OFED* installed? In order to use the ofa fabric, you will need access to the native OFED* verbs.
Could you be more specific? I do think I have MLNX OFED installed. Is there any command to test that?
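A few commands that usually answer this on a Linux node, shown here as a rough sketch and not an official procedure. `ofed_info` ships with OFED and MLNX_OFED installations, and `ibv_devinfo` comes with the libibverbs utilities; the `check_tool` wrapper is a hypothetical helper that only keeps the script from aborting when a tool is missing.

```shell
#!/bin/sh
# Run a tool if it is on PATH; otherwise say that it is missing.
check_tool() {
    tool="$1"; shift
    if command -v "$tool" >/dev/null 2>&1; then
        echo "== $tool $* =="
        "$tool" "$@"
    else
        echo "$tool: not found on PATH"
    fi
}

check_tool ofed_info -s    # short form: prints the installed OFED version string
check_tool ibv_devinfo     # lists HCAs and their port states through the verbs API

# The kernel side: loaded IB drivers register their devices here.
ls /sys/class/infiniband 2>/dev/null || echo "no /sys/class/infiniband (IB drivers not loaded?)"
```

If `ofed_info` is missing but `ibv_devinfo` works, the verbs stack probably comes from the distribution packages and not from a vendor OFED bundle.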
Hello,
I followed this thread hoping to solve my issue, but unfortunately I was not able to resolve it.
Neither DAPL nor OFA works for me.
Software Versions:
- MLNX_OFED_LINUX-2.3-1.0.1-rhel6.5-x86_64
- Intel parallel cluster 2015
- Intel MPSS 3.4.3
- Mellanox InfiniBand ConnectX-3 adapter
With OFA:
export I_MPI_MIC=1
export I_MPI_FABRICS=shm:ofa
export I_MPI_DEVICE=rdssm
export I_MPI_OFA_ADAPTER_NAME=mlx4_0
export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1u ,ofa-v2-scif0
export I_MPI_PIN_MODE=pm
export I_MPI_PIN_DOMAIN=auto
Error Messages: [export I_MPI_DEBUG=2]
[42] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[19] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[43] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[26] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[27] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
Error Messages: [export I_MPI_DEBUG=100]
[0] MPI startup(): Intel(R) MPI Library, Version 5.0 Update 1 Build 20140709
[0] MPI startup(): Copyright (C) 2003-2014 Intel Corporation. All rights reserved.
[0] MPI startup(): Multi-threaded optimized library
[0] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[1] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[2] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[5] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[9] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[10] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[3] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[4] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[6] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[7] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[8] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[11] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[9] MPI startup(): Found 2 IB devices
[10] MPI startup(): Found 2 IB devices
[6] MPI startup(): Found 2 IB devices
[8] MPI startup(): Found 2 IB devices
[7] MPI startup(): Found 2 IB devices
[11] MPI startup(): Found 2 IB devices
[0] MPI startup(): Found 2 IB devices
[1] MPI startup(): Found 2 IB devices
[3] MPI startup(): Found 2 IB devices
[2] MPI startup(): Found 2 IB devices
[4] MPI startup(): Found 2 IB devices
[5] MPI startup(): Found 2 IB devices
[10] MPI startup(): Open 0 IB device: mlx4_0
[6] MPI startup(): Open 0 IB device: mlx4_0
[9] MPI startup(): Open 0 IB device: mlx4_0
[8] MPI startup(): Open 0 IB device: mlx4_0
[5] MPI startup(): Open 0 IB device: mlx4_0
[7] MPI startup(): Open 0 IB device: mlx4_0
[3] MPI startup(): Open 0 IB device: mlx4_0
[1] MPI startup(): Open 0 IB device: mlx4_0
[4] MPI startup(): Open 0 IB device: mlx4_0
[0] MPI startup(): Open 0 IB device: mlx4_0
[11] MPI startup(): Open 0 IB device: mlx4_0
[42] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[2] MPI startup(): Open 0 IB device: mlx4_0
[36] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[31] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[37] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[37] MPI startup(): Found 0 IB devices
[31] MPI startup(): Found 0 IB devices
[38] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[38] MPI startup(): Found 0 IB devices
[40] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[40] MPI startup(): Found 0 IB devices
[20] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[33] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[33] MPI startup(): Found 0 IB devices
[20] MPI startup(): Found 0 IB devices
[17] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[43] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[43] MPI startup(): Found 0 IB devices
[13] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[25] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[25] MPI startup(): Found 0 IB devices
[27] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[30] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[30] MPI startup(): Found 0 IB devices
[17] MPI startup(): Found 0 IB devices
[23] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[23] MPI startup(): Found 0 IB devices
[27] MPI startup(): Found 0 IB devices
[12] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[12] MPI startup(): Found 0 IB devices
[13] MPI startup(): Found 0 IB devices
[29] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[29] MPI startup(): Found 0 IB devices
[15] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[15] MPI startup(): Found 0 IB devices
[35] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[35] MPI startup(): Found 0 IB devices
[36] MPI startup(): Found 0 IB devices
[39] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[39] MPI startup(): Found 0 IB devices
[22] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[22] MPI startup(): Found 0 IB devices
[41] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[41] MPI startup(): Found 0 IB devices
[42] MPI startup(): Found 0 IB devices
[24] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[24] MPI startup(): Found 0 IB devices
[26] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[26] MPI startup(): Found 0 IB devices
[14] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[14] MPI startup(): Found 0 IB devices
[16] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[16] MPI startup(): Found 0 IB devices
[18] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[18] MPI startup(): Found 0 IB devices
[28] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[28] MPI startup(): Found 0 IB devices
[32] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[32] MPI startup(): Found 0 IB devices
[19] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[19] MPI startup(): Found 0 IB devices
[36] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[34] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[34] MPI startup(): Found 0 IB devices
[31] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[37] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[38] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[21] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[21] MPI startup(): Found 0 IB devices
[30] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[20] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[39] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[13] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[33] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[25] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[40] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[17] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[28] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[23] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[41] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[29] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[42] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[12] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[24] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[43] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[32] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[27] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[14] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[15] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[34] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[21] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[35] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[16] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[22] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[26] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[18] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[19] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[10] MPI startup(): Start 1 ports per adapter
[11] MPI startup(): Start 1 ports per adapter
[0] MPI startup(): Start 1 ports per adapter
[2] MPI startup(): Start 1 ports per adapter
[5] MPI startup(): Start 1 ports per adapter
[3] MPI startup(): Start 1 ports per adapter
[1] MPI startup(): Start 1 ports per adapter
[7] MPI startup(): Start 1 ports per adapter
[8] MPI startup(): Start 1 ports per adapter
[6] MPI startup(): Start 1 ports per adapter
[9] MPI startup(): Start 1 ports per adapter
[4] MPI startup(): Start 1 ports per adapter
- While installing MPSS and starting the openibd service, I noticed that the "Setting up InfiniBand network interfaces" step does not report OK:
[root@tbx-node07 MLNX_OFED_LINUX-2.3-1.0.1-rhel6.5-x86_64]# service openibd start
Loading HCA driver and Access Layer: [ OK ]
Setting up InfiniBand network interfaces:
No configuration found for ib0
Setting up service network . . . [ done ]
[root@node07 ~]# ibv_devinfo [-On host]
Failed to query device props
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.32.5100
node_guid: f452:1403:006a:9050
sys_image_guid: f452:1403:006a:9053
vendor_id: 0x02c9
vendor_part_id: 4099
hw_ver: 0x1
board_id: MT_1100120019
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 3
port_lmc: 0x00
link_layer: InfiniBand
[On one of the MICs]
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.32.5100
node_guid: f452:1403:006a:9050
sys_image_guid: f452:1403:006a:9053
vendor_id: 0x02c9
vendor_part_id: 4099
hw_ver: 0x1
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 3
port_lmc: 0x00
link_layer: InfiniBand
hca_id: scif0
transport: SCIF (2)
fw_ver: 0.0.1
node_guid: 4c79:baff:fe57:02a8
sys_image_guid: 4c79:baff:fe57:02a8
vendor_id: 0x8086
vendor_part_id: 0
hw_ver: 0x1
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 1001
port_lmc: 0x00
link_layer: SCI
- On host
[root@node07 ~]# ls /sys/class/infiniband
mlx4_0 scif0
- On MIC
[root@node07 ~]# ssh mic0 ls /sys/class/infiniband
mlx4_0
scif0
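To compare what the host and each coprocessor expose, the `ibv_devinfo` listings above can be condensed to one line per port. A minimal sketch, assuming the output layout shown in this post; `summarize_ports` is a hypothetical helper name, and `mic0` is the card hostname used above.

```shell
#!/bin/sh
# Condense ibv_devinfo output (read from stdin) to "hca port state link_layer".
summarize_ports() {
    awk '
        $1 == "hca_id:"     { hca = $2 }
        $1 == "port:"       { port = $2 }
        $1 == "state:"      { state = $2 }
        $1 == "link_layer:" { print hca, port, state, $2 }
    '
}

# Usage on a live system (commented out so the sketch runs anywhere):
#   ibv_devinfo | summarize_ports
#   ssh mic0 ibv_devinfo | summarize_ports

# Demo on lines copied from the listing above:
summarize_ports <<'EOF'
hca_id: mlx4_0
transport: InfiniBand (0)
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
link_layer: InfiniBand
EOF
```

Running the same summary on the host and on every card makes it easier to spot a device or port that is visible on one side but not the other.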
- I_MPI_ROOT is set to /opt/intel/impi/5.0.1.035.
My setup has 4 MIC cards in a server with 2 processors. Could you please help me get OFA and DAPL working with the Intel MIC cards?
Please let me know if you need any additional information.