Hello,
I am running the Intel mp_linpack benchmark (xhpl_em64t) with Intel MPI.
Steps:
1. I sourced the mpivars.sh from /opt/intel/impi/bin64/mpivars.sh
2. I did "mpdboot -f hostfile"
$ cat hostfile
node 1
node 2
3. I did "mpirun -f hostfile -ppn 1 -np 2 ./xhpl_em64t"
After step 3, errors occurred. Below is the error output with I_MPI_DEBUG=50:
[0] I_MPI_dlopen_dat(): trying to dlopen default -ldat: libdat.so
[0] my_dlopen(): trying to dlopen: libdat.so
[0] MPI startup(): cannot open dynamic library libdat.so
[0] my_dlopen(): Look for library libdat.so in /opt/intel/impi/4.0.1.007/intel64/lib,/apps/GNU/GCC/4.7.0/lib64,/apps/GNU/GCC/4.7.0/lib,/apps/GNU/MPC/1.0.1/lib,/apps/GNU/GMP/5.1.2/lib,/apps/GNU/MPFR/3.1.2/lib,include ld.so.conf.d/*.conf,,/lib,/usr/lib
[0] my_dlopen(): dlopen failed: libdat.so: cannot open shared object file: No such file or directory
[0] I_MPI_dlopen_dat(): could not open -ldat
[cli_0]: got unexpected response to put :cmd=unparseable_msg rc=-1 :
[0] MPI startup(): Intel(R) MPI Library, Version 3.1 Build 20080331
[0] MPI startup(): Copyright (C) 2003-2008 Intel Corporation. All rights reserved.
[cli_0]: aborting job:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(264): Initialization failed
MPIDD_Init(98).......: channel initialization failed
MPIDI_CH3_Init(183)..: generic failure with errno = 336068751
(unknown)(): Other MPI error
[1] I_MPI_dlopen_dat(): trying to dlopen default -ldat: libdat.so
[1] my_dlopen(): trying to dlopen: libdat.so
[1] MPI startup(): cannot open dynamic library libdat.so
[1] my_dlopen(): Look for library libdat.so in /opt/intel/impi/4.0.1.007/intel64/lib,/apps/GNU/GCC/4.7.0/lib64,/apps/GNU/GCC/4.7.0/lib,/apps/GNU/MPC/1.0.1/lib,/apps/GNU/GMP/5.1.2/lib,/apps/GNU/MPFR/3.1.2/lib,include ld.so.conf.d/*.conf,,/lib,/usr/lib
[1] my_dlopen(): dlopen failed: libdat.so: cannot open shared object file: No such file or directory
[1] I_MPI_dlopen_dat(): could not open -ldat
rank 0 in job 1 fuji382_53442 caused collective abort of all ranks
exit status of rank 0: return code 13
Would anyone be able to help me? Thank you very much in advance!
Thank you,
Kevin
It seems like your InfiniBand* drivers are installed incorrectly. Try reinstalling the drivers and re-running.
Also, you don't need to use mpdboot with mpirun. By default, mpirun uses Hydra, not MPD.
Hi James,
Thank you for the reply. I have reinstalled OFED-3.12 from www.openfabrics.org. However, I still get the same problem.
Some further information on my system:
$ mpirun --version
Intel(R) MPI Library for Linux, 64-bit applications, Version 4.0 Update 1 Build 20100910
Copyright (C) 2003-2010 Intel Corporation. All rights reserved.

$ cat /etc/dat.conf
# DAT v2.0, v1.2 configuration file
#
# Each entry should have the following fields:
#
# <ia_name> <api_version> <threadsafety> <default> <lib_path> \
#           <provider_version> <ia_params> <platform_params>
#
# For uDAPL cma provder, <ia_params> is one of the following:
#       network address, network hostname, or netdev name and 0 for port
#
# For uDAPL scm provider, <ia_params> is device name and port
# For uDAPL ucm provider, <ia_params> is device name and port
# For uDAPL iWARP provider, <ia_params> is netdev device name and 0
# For uDAPL iWARP provider, <ia_params> is netdev device name and 0
# For uDAPL RoCE provider, <ia_params> is device name and 0
#
ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""
ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 1" ""
ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 2" ""
ofa-v2-ehca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ehca0 1" ""
ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-mthca0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-cma-roe-eth2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
ofa-v2-cma-roe-eth3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth3 0" ""
ofa-v2-scm-roe-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-scm-roe-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-mcm-1 u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mcm-2 u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-scif0 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "scif0 1" ""
ofa-v2-scif0-u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "scif0 1" ""
ofa-v2-mic0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "mic0:ib 1" ""
ofa-v2-mlx4_0-1s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-mlx4_1-1s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_1 1" ""
ofa-v2-mlx4_1-2s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_1 2" ""
ofa-v2-mlx4_1-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_1 1" ""
ofa-v2-mlx4_1-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_1 2" ""
ofa-v2-mlx4_0-1m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-mlx4_1-1m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_1 1" ""
ofa-v2-mlx4_1-2m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_1 2" ""
ofa-v2-mlx5_0-1s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx5_0 1" ""
ofa-v2-mlx5_0-2s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx5_0 2" ""
ofa-v2-mlx5_1-1s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx5_1 1" ""
ofa-v2-mlx5_1-2s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx5_1 2" ""
ofa-v2-mlx5_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx5_0 1" ""
ofa-v2-mlx5_0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx5_0 2" ""
ofa-v2-mlx5_1-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx5_1 1" ""
ofa-v2-mlx5_1-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx5_1 2" ""
ofa-v2-mlx5_0-1m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx5_0 1" ""
ofa-v2-mlx5_0-2m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx5_0 2" ""
ofa-v2-mlx5_1-1m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx5_1 1" ""
ofa-v2-mlx5_1-2m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx5_1 2" ""

$ /etc/infiniband/info
prefix=/usr
Kernel=2.6.32-431.29.2.el6.x86_64
Configure options: --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mthca-mod --with-mlx4-mod --with-mlx4_en-mod --with-mlx5-mod --with-cxgb3-mod --with-cxgb4-mod --with-nes-mod --with-qib-mod --with-ocrdma-mod --with-ipoib-mod --with-srp-mod --with-nfsrdma-mod

$ lsmod | grep ib
ib_addr 6285 2 rdma_ucm,rdma_cm
ib_ipoib 80316 0
ib_cm 36986 2 rdma_cm,ib_ipoib
ib_uverbs 36126 5 rdma_ucm
ib_umad 11564 4
libcrc32c 1246 1 iw_nes
ipv6 318183 88 ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6,ib_addr,ib_ipoib,ocrdma,iw_cxgb4,cxgb4
ib_qib 389783 0
mlx5_ib 92954 0
mlx5_core 77814 1 mlx5_ib
mlx4_ib 128242 1
ib_sa 23806 5 rdma_ucm,rdma_cm,ib_ipoib,ib_cm,mlx4_ib
mlx4_core 213339 2 mlx4_en,mlx4_ib
ib_mthca 134119 0
ib_mad 38676 6 ib_cm,ib_umad,ib_qib,mlx4_ib,ib_sa,ib_mthca
ib_core 73994 17 rdma_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_uverbs,ib_umad,ocrdma,iw_nes,iw_cxgb4,iw_cxgb3,ib_qib,mlx5_ib,mlx4_ib,ib_sa,ib_mthca,ib_mad
compat 26078 25 rdma_ucm,rdma_cm,iw_cm,ib_addr,ib_ipoib,ib_cm,ib_uverbs,ib_umad,ocrdma,be2net,iw_nes,iw_cxgb4,cxgb4,iw_cxgb3,cxgb3,ib_qib,mlx5_ib,mlx5_core,mlx4_en,mlx4_ib,ib_sa,mlx4_core,ib_mthca,ib_mad,ib_core
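For anyone reading a dat.conf like the one above, here is a minimal Python sketch that splits each entry into the eight fields named in the file's own header comment and lists which API versions are registered. The names `DatEntry`, `parse_dat_conf` and `api_versions` are made up for this illustration and are not part of any DAPL or Intel MPI tool. It only summarizes the file; it does not diagnose anything by itself.

```python
import shlex
from typing import List, NamedTuple


class DatEntry(NamedTuple):
    # Field names follow the header comment of /etc/dat.conf.
    ia_name: str
    api_version: str
    threadsafety: str
    default: str
    lib_path: str
    provider_version: str
    ia_params: str
    platform_params: str


def parse_dat_conf(text: str) -> List[DatEntry]:
    """Parse dat.conf text, skipping comments, blanks and malformed lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # shlex keeps quoted values such as "mlx4_0 1" together
        # and turns "" into an empty string.
        fields = shlex.split(line)
        if len(fields) != 8:
            continue
        entries.append(DatEntry(*fields))
    return entries


def api_versions(entries: List[DatEntry]) -> List[str]:
    """Return the distinct <api_version> values, sorted."""
    return sorted({e.api_version for e in entries})


# Two entries copied from the dat.conf shown in this thread.
sample = '''
# DAT v2.0, v1.2 configuration file
ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
'''

entries = parse_dat_conf(sample)
print(api_versions(entries))
```

On a real system you would pass in the contents of /etc/dat.conf instead of `sample`. In the file posted here every entry has the api_version u2.0.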
I found the problem! It turns out that my Intel MPI somehow does not work with DAPL v2.0. I added DAPL v1.2 compatibility by running yum remove on the existing dapl package and then yum install for the following packages:
dapl-2.0.34-1.el6.x86_64
compat-dapl-1.2.19-2.el6.x86_64
Now it works. Thank you for the help, James.
Regards,
Kevin
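A quick way to check for the same symptom on another node is to test whether the dynamic loader can find the DAT library at all. The sketch below uses Python's ctypes for this. `libdat.so` is the name from the debug output earlier in this thread; `libdat2.so` is an assumption about what the DAPL v2.0 package installs and may differ on your distribution. The helper name `can_dlopen` is made up for this example.

```python
import ctypes


def can_dlopen(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Same failure class as "cannot open shared object file" in the log.
        return False
    return True


# "libdat.so" comes from the I_MPI_DEBUG output in this thread.
# "libdat2.so" is an assumed name for the DAPL v2.0 library.
for lib in ("libdat.so", "libdat2.so"):
    status = "found" if can_dlopen(lib) else "NOT found"
    print(lib, status)
```

If `libdat.so` is reported as not found, this matches the dlopen failure in the original error output. The check uses the loader's normal search path, so run it in the same environment (after sourcing mpivars.sh) as the MPI job.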
