Hi,
I tried to run WRF on a Xeon host and a Xeon Phi coprocessor using Intel MPI.
I was following this article: https://software.intel.com/en-us/articles/wrf-conus12km-on-intel-xeon-phi-coprocessors-and-intel-xeon-processors
But I get the following output:
OMP: Warning #234: KMP_AFFINITY: granularity=fine will be used.
[0] MPI startup(): Multi-threaded optimized library
[0] MPI startup(): RDMA, shared memory, and socket data transfer modes
[1] MPI startup(): RDMA, shared memory, and socket data transfer modes
[0] MPI startup(): can not open dynamic library libdat2.so.2
[0] MPI startup(): can not open dynamic library libdat2.so
[0] MPI startup(): can not open dynamic library libdat.so.1
[0] MPI startup(): can not open dynamic library libdat.so
[0] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
[1] DAPL startup(): trying to open first DAPL provider from I_MPI_DAPL_PROVIDER_LIST: ofa-v2-mlx4_0-1u
phi-test-mic2:UCM:2c8d:dd0ceb40: 205 us(205 us): open_hca: ibv_get_device_list() failed
[1] DAPL startup(): failed to open DAPL provider ofa-v2-mlx4_0-1u
[1] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
I would appreciate your help.
Thank you.
Hello,
Check that the OFED software is installed and that the ofed-mic service is running on the host (refer to the Intel MPSS User's Guide for details).
By the way, which version of the Intel MPI Library are you using?
Hi Artem R.
After installing OFED, the results were as follows:
[mpiexec@phi-test] hostlist_fn (ui/mpich/utils.c:372): duplicate host file or host list setting
[mpiexec@phi-test] match_arg (utils/args/args.c:152): match handler returned error
[mpiexec@phi-test] HYDU_parse_array (utils/args/args.c:174): argument matching returned error
[mpiexec@phi-test] parse_args (ui/mpich/utils.c:1596): error parsing input array
[mpiexec@phi-test] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1648): unable to parse user arguments
[mpiexec@phi-test] main (ui/mpich/mpiexec.c:153): error parsing parameters
And this is my MPI version:
Intel(R) MPI Library for Linux* OS, Version 5.0 Update 3 Build 20150128 (build id: 11250)
Copyright (C) 2003-2015, Intel Corporation. All rights reserved.
[mpiexec@phi-test] which mpirun
/opt/intel/impi/5.0.3.048/intel64/bin/mpirun
Thank you
Hi Choi,
Could you please provide your MPI settings (I_MPI_* environment variables, command line options)?
Hi Artem R.
I'm sorry, I do not remember the exact command line.
These are the variables I exported:
export MIC_ULIMIT_STACKSIZE=365536
export I_MPI_DEVICE=rdssm
export I_MPI_MIC=1
export DAPL_DBG_TYPE=0
export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0
export I_MPI_PIN_MODE=pm
export I_MPI_PIN_DOMAIN=auto
Now the following error occurs:
[test1@phi-test CONUS12_rundir]$ ./MIC.sh
OMP: Warning #234: KMP_AFFINITY: granularity=fine will be used.
[0] MPI startup(): Multi-threaded optimized library
[0] MPI startup(): RDMA, shared memory, and socket data transfer modes
[1] MPI startup(): RDMA, shared memory, and socket data transfer modes
[0] DAPL startup(): trying to open first DAPL provider from I_MPI_DAPL_PROVIDER_LIST: ofa-v2-mlx4_0-1u
[0] DAPL startup(): failed to open DAPL provider ofa-v2-mlx4_0-1u
[0] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
[1] DAPL startup(): trying to open first DAPL provider from I_MPI_DAPL_PROVIDER_LIST: ofa-v2-mlx4_0-1u
[1] DAPL startup(): failed to open DAPL provider ofa-v2-mlx4_0-1u
[1] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
Thank you.
The specified environment variables look correct.
Given the error message you mentioned, I suspect the command line options. Could you please provide your mpiexec.hydra/mpirun command line?
Hi Artem R.
<MIC.sh>
-----------------------------------------------------------------------------------------------------
#source /opt/intel/impi/5.0.3.048/mic/bin/mpivars.sh
source /opt/intel/composer_xe_2015.3.187/bin/compilervars.sh intel64
ulimit -s unlimited
#ulimit -l 1
export MIC_ULIMIT_STACKSIZE=365536
export I_MPI_DEVICE=rdssm
export I_MPI_MIC=1
export DAPL_DBG_TYPE=0
export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0
#export I_MPI_OFA_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0
export I_MPI_PIN_MODE=pm
export I_MPI_PIN_DOMAIN=auto
#export I_MPI_FABRICS=shm:dspl
#export I_MPI_DAPL_UD=enable
#export I_MPI_DAPL_UD_PROVIDER=ofa-v2-mlx4_0-1u
./run.symmetric
-----------------------------------------------------------------------------------------------------
mpiexec.hydra command line:
mpiexec.hydra -host phi-test -n 1 -env WRF_NUM_TILES 20 -env KMP_AFFINITY scatter -env OMP_NUM_THREADS 2 -env KMP_LIBRARY=turnaround -env OMP_SCHEDULE=static -env KMP_STACKSIZE=190M -env I_MPI_DEBUG 5 /home/test1/WRF_0715/WRF-XEON/WRFV3/CONUS12_rundir/x86/wrf.exe : -host phi-test-mic1 -n 1 -env KMP_AFFINITY balanced -env OMP_NUM_THREADS 30 -env KMP_LIBRARY=turnaround -env OMP_SCHEDULE=static -env KMP_STACKSIZE=190M -env I_MPI_DEBUG 5 /home/test1/WRF_0715/WRF-XEON/WRFV3/CONUS12_rundir/mic/wrf.sh
Thank you.
Regarding the following error message:
[test1@phi-test CONUS12_rundir]$ ./MIC.sh
OMP: Warning #234: KMP_AFFINITY: granularity=fine will be used.
[0] MPI startup(): Multi-threaded optimized library
[0] MPI startup(): RDMA, shared memory, and socket data transfer modes
[1] MPI startup(): RDMA, shared memory, and socket data transfer modes
[0] DAPL startup(): trying to open first DAPL provider from I_MPI_DAPL_PROVIDER_LIST: ofa-v2-mlx4_0-1u
[0] DAPL startup(): failed to open DAPL provider ofa-v2-mlx4_0-1u
[0] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
[1] DAPL startup(): trying to open first DAPL provider from I_MPI_DAPL_PROVIDER_LIST: ofa-v2-mlx4_0-1u
[1] DAPL startup(): failed to open DAPL provider ofa-v2-mlx4_0-1u
[1] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
Do you have any InfiniBand* adapters (IBA) in your system? The ofa-v2-mlx4_0-1u DAPL provider is for IBA only.
If you don't have any IBA in the system, you can use the following Intel MPI settings (DAPL fabric over SCIF):
export I_MPI_FABRICS=shm:dapl (use this instead of the obsolete 'export I_MPI_DEVICE=rdssm')
export I_MPI_DAPL_PROVIDER=ofa-v2-scif0 (use this instead of 'export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u,ofa-v2-scif0')
Or simply try the default Intel MPI fabric settings (without I_MPI_DEVICE/I_MPI_FABRICS/I_MPI_DAPL_PROVIDER/I_MPI_DAPL_PROVIDER_LIST). In that case the Intel MPI Library should detect the fabric settings automatically (you can check the result in the debug output).
As another option, you can try I_MPI_FABRICS=shm:tcp. It doesn't require OFED at all, but the performance may be worse than with DAPL+SCIF.
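For reference, the three options above could be wrapped in a small helper in a launch script such as MIC.sh. This is only a sketch: the function name set_fabric and its scif/tcp/auto arguments are made up for illustration, while the variable names and values are the ones quoted in this reply.

```shell
#!/bin/sh
# set_fabric (hypothetical helper): apply one of the three fabric options
# described above, clearing obsolete or conflicting settings first.
set_fabric() {
    case "$1" in
        scif|tcp|auto) ;;
        *)
            echo "usage: set_fabric scif|tcp|auto" >&2
            return 1
            ;;
    esac

    # Drop the obsolete I_MPI_DEVICE and any earlier fabric choice.
    unset I_MPI_DEVICE I_MPI_FABRICS I_MPI_DAPL_PROVIDER I_MPI_DAPL_PROVIDER_LIST

    case "$1" in
        scif)   # DAPL fabric over SCIF
            export I_MPI_FABRICS=shm:dapl
            export I_MPI_DAPL_PROVIDER=ofa-v2-scif0
            ;;
        tcp)    # does not require OFED, but may be slower than DAPL+SCIF
            export I_MPI_FABRICS=shm:tcp
            ;;
        auto)   # leave everything unset so Intel MPI detects the fabric
            ;;
    esac
}

set_fabric tcp
export I_MPI_DEBUG=5   # the startup messages then report the selected fabric
echo "I_MPI_FABRICS=${I_MPI_FABRICS:-<auto>}"
```

Calling set_fabric before the mpiexec.hydra line keeps the choice in one place, so switching between DAPL+SCIF and tcp for a comparison is a one-word change.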
Hi Artem R.
Excellent! Thank you. MPI works now.
First, I tried:
export I_MPI_FABRICS=shm:dapl
export I_MPI_DAPL_PROVIDER=ofa-v2-scif0
But I got an error:
[test1@phi-test CONUS12_rundir]$ ./MIC.sh
[0] MPI startup(): Multi-threaded optimized library
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-scif0
[0] DAPL startup(): failed to open DAPL provider ofa-v2-scif0
[0] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-scif0
[1] DAPL startup(): failed to open DAPL provider ofa-v2-scif0
[1] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
So, second, I changed it to:
export I_MPI_FABRICS=shm:tcp
[test1@phi-test CONUS12_rundir]$ ./MIC.sh
[0] MPI startup(): Multi-threaded optimized library
[0] MPI startup(): shm and tcp data transfer modes
[1] MPI startup(): shm and tcp data transfer modes
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 48134 phi-test {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47}
[0] MPI startup(): 1 12476 phi-test-mic1 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227}
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_FABRICS=shm:tcp
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_DIST=10,21,21,10
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_MAP=mlx4_0:0,mic0:0,mic1:0,mic2:0,mic3:0,mic4:0,mic5:0,mic6:0,mic7:0
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2
[0] MPI startup(): I_MPI_MIC=1
[0] MPI startup(): I_MPI_PIN_MAPPING=1:0 0
starting wrf task 0 of 2
starting wrf task 1 of 2
real 2m26.205s
user 1m6.966s
sys 1m18.145s
real 2m25.750s
user 71m35.660s
sys 1m12.200s
It runs successfully, but it is slow.
I expected symmetric mode to be faster.
Is the problem that I am using tcp?
Is something wrong with my InfiniBand setup?
Or are my options wrong?
Thanks for your effort and kindness.
Hi Choi,
Regarding the DAPL+SCIF issue, could you please specify your OS, MPSS, and OFED versions?
Also, please provide the output of the 'ibv_devices' and 'ibv_devinfo' utilities and the contents of /etc/dat.conf.
And please check that the ofed-mic service is running.
Hi Artem R.
System Info
HOST OS : Linux
OS Version : 2.6.32-358.el6.x86_64
Driver Version : 3.5.1-1
MPSS Version : 3.5.1
OFED : 1.5.3
[root@phi-test /]# ./usr/bin/ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.32.5100
node_guid: 0cc4:7aff:ff5f:2228
sys_image_guid: 0cc4:7aff:ff5f:222b
vendor_id: 0x02c9
vendor_part_id: 4099
hw_ver: 0x0
board_id: SM_2301000001000
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: InfiniBand
I am also attaching these files:
ofed_info, dat.conf
Thank you!
Hi Choi,
According to the ibv_devinfo output you provided, the 'scif0' device is missing.
Please check that the openibd and ofed-mic services are running.
You mentioned OFED 1.5.3, but according to ofed_info, OFED 3.5-2-MIC is installed. Please make sure that OFED was installed according to the Intel MPSS User's Guide.
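A quick way to repeat this check after changing the setup might look like the sketch below. The helper names list_hcas and has_scif0 are hypothetical; the script only assumes that ibv_devinfo prints one "hca_id:" line per device, as in the output posted earlier in this thread.

```shell
#!/bin/sh
# list_hcas (hypothetical helper): print each hca_id found in
# ibv_devinfo-style output read from stdin.
list_hcas() {
    awk '$1 == "hca_id:" { print $2 }'
}

# has_scif0 (hypothetical helper): succeed only if scif0 is one of the devices.
has_scif0() {
    list_hcas | grep -qx 'scif0'
}

# Only query the system when the OFED utilities are actually installed.
if command -v ibv_devinfo >/dev/null 2>&1; then
    if ibv_devinfo 2>/dev/null | has_scif0; then
        echo "scif0 found"
    else
        echo "scif0 missing: check the openibd and ofed-mic services"
    fi
fi
```

With the ibv_devinfo output posted above, list_hcas would print only mlx4_0, which is why the ofa-v2-scif0 provider could not be opened.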
Hi Artem R.
Oops, I'm sorry.
I checked again, and OFED version 3.5.2 is correct.
I'm sorry, but I do not know whether ofed-mic is running.
How can I check?
I believe I installed OFED 3.5.2-MIC properly.
I'm sorry to keep asking questions.
Thank you.
You can start the openibd and ofed-mic services with the 'service <service_name> start' command (root or sudo permissions are required).
To check the status of a service, use 'service <service_name> status'.
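As a rough sketch, both services can be checked in one go. The helper name service_state is hypothetical, and the script assumes the init scripts return exit status 0 from 'status' only when the service is running, which may not hold for every script.

```shell
#!/bin/sh
# service_state (hypothetical helper): print a one-word summary based on the
# exit status of 'service <name> status'. Assumes exit status 0 means running.
service_state() {
    if service "$1" status >/dev/null 2>&1; then
        echo "running"
    else
        echo "not running or unknown"
    fi
}

# Run as root (or via sudo), as noted above.
for svc in openibd ofed-mic; do
    echo "$svc: $(service_state "$svc")"
done
```

If a service is reported as not running, 'service <service_name> start' is the next step; the full 'status' output is still worth reading by hand when something looks odd.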
Hi Artem R.
I tried the service command.
[root@phi-test test1]# service ofed-mic status
Status of OFED Stack:
host [ OK ]
mic0 Password:
[ OK ]
mic1 Password:
[ OK ]
mic2 Password:
[ OK ]
mic3 Password:
[ OK ]
mic4 Password:
[ OK ]
mic5 Password:
[ OK ]
mic6 Password:
[ OK ]
mic7 Password:
[ OK ]
[root@phi-test test1]# service mpss status
mpss is running
[root@phi-test test1]# service mpss start
Starting Intel(R) MPSS:
mic0: online (mode: linux image: /usr/share/mpss/boot/bzImage-knightscorner)
mic1: online (mode: linux image: /usr/share/mpss/boot/bzImage-knightscorner)
mic2: online (mode: linux image: /usr/share/mpss/boot/bzImage-knightscorner)
mic3: online (mode: linux image: /usr/share/mpss/boot/bzImage-knightscorner)
mic4: online (mode: linux image: /usr/share/mpss/boot/bzImage-knightscorner)
mic5: online (mode: linux image: /usr/share/mpss/boot/bzImage-knightscorner)
mic6: online (mode: linux image: /usr/share/mpss/boot/bzImage-knightscorner)
mic7: online (mode: linux image: /usr/share/mpss/boot/bzImage-knightscorner)
What should I do now to get the best performance?
Thank you~!
Hi Choi,
Could you please provide output of the following commands (some commands may require root permissions):
service openibd status
rpm -qa | grep scif
lsmod | grep scif
Also, there are suspicious lines ("Password:") in the 'service ofed-mic status' output. I'm not sure how critical this is, but could you please check that passwordless ssh between the host and the MIC cards is configured for the root user (try connecting to the MIC cards from the host via ssh).
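One possible way to test the ssh part without typing anything is sketched below. The helper name can_ssh_without_password and the MIC_CARDS variable are made up for this example. BatchMode=yes tells ssh to fail instead of prompting, so an unknown host key is also reported as a failure.

```shell
#!/bin/sh
# can_ssh_without_password (hypothetical helper): succeed only if ssh can log
# in as root on the given host without prompting for a password.
can_ssh_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true >/dev/null 2>&1
}

# MIC_CARDS is an assumed variable holding the card host names to test,
# e.g. MIC_CARDS="mic0 mic1". Nothing is contacted if it is empty.
for card in ${MIC_CARDS:-}; do
    if can_ssh_without_password "$card"; then
        echo "$card: passwordless ssh OK"
    else
        echo "$card: passwordless ssh NOT working"
    fi
done
```

Run it as root on the host, for example with MIC_CARDS="mic0 mic1" set in the environment; any card reported as not working would explain a "Password:" prompt in the status output.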
Hi Artem R.
[root@phi-test x86]# service openibd status
HCA driver loaded
Configured IPoIB devices:
ib0
Currently active IPoIB devices:
ib0
The following OFED modules are loaded:
rdma_ucm
rdma_cm
ib_addr
ib_ipoib
mlx4_core
mlx4_ib
mlx4_en
mlx5_core
mlx5_ib
ib_mthca
ib_uverbs
ib_umad
ib_sa
ib_cm
ib_mad
ib_core
iw_cxgb3
iw_cxgb4
iw_nes
ib_qib
[root@phi-test x86]# rpm -qa | grep scif
mpss-sciftutorials-3.5.1-1.glibc2.12.2.x86_64
intel-mic-ofed-libibscif-1.0.0-0.x86_64
libscif-doc-3.5.1-1.glibc2.12.2.x86_64
libscif0-3.5.1-1.glibc2.12.2.x86_64
libscif-dev-3.5.1-1.glibc2.12.2.x86_64
intel-mic-ofed-libibscif-devel-1.0.0-0.x86_64
mpss-sciftutorials-doc-3.5.1-1.glibc2.12.2.x86_64
[root@phi-test x86]# lsmod | grep scif
ibscif 84477 0
ib_core 73628 18 ibscif,ibp_server,rdma_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_nes,iw_cxgb4,iw_cxgb3,ib_qib,mlx5_ib,mlx4_ib,ib_mthca,ib_mad
mic 588847 49 ibscif,ibp_sa_server,ibp_cm_server,ibp_server,ib_qib
compat 16629 27 ibscif,ibp_sa_server,ibp_cm_server,ibp_server,rdma_ucm,rdma_cm,iw_cm,ib_addr,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_nes,iw_cxgb4,cxgb4,iw_cxgb3,cxgb3,ib_qib,mlx5_ib,mlx5_core,mlx4_en,mlx4_ib,ib_mthca,ib_mad,ib_core,mlx4_core
Thank you!!
Hi Choi,
What about passwordless ssh between the host and the MIC cards for the root account?