69 Views

Intel Cluster Studio problem with infiniband

Hi,

On a cluster at my university we have Intel Cluster Studio (2011, I think; I'm not the admin). The distribution is Red Hat, and the OFED drivers are installed.

We have 2 problems:
- The first one is with I_MPI_FABRICS=shm:ofa; we get:
[bash][-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=f95030
[0] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so: cannot open shared object file: No such file or directory
[0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled[/bash]
- The second one is with I_MPI_FABRICS=shm:dapl. It seems to work with IMB-MPI1:
[bash][-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=11ac030
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=1787030
[0] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[1] DAPL startup(): trying to open default DAPL provider from dat registry: ofa-v2-mlx4_0-1
[0] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[0] MPI startup(): dapl data transfer mode
[1] MPI startup(): DAPL provider ofa-v2-mlx4_0-1
[1] MPI startup(): dapl data transfer mode
[0] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[0] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[1] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[1] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000[/bash]
Then the benchmark runs.
But with our own programs we get:
[0] dapl fabric is not available and fallback fabric is not enabled


What can I do to track down the problem?

Thx a lot,
best regards
Guillaume

13 Replies
James_T_Intel
Moderator

Hi Guillaume,

The version of Intel Cluster Studio is less important than the versions of the individual components. Please send me the output from the following commands:

[bash]mpirun -V
icc -V
env | grep I_MPI[/bash]

For the first problem, check that libibverbs.so is available and correctly linked on each of the nodes. It should be a symlink to libibverbs.so.1.0.0; if it is not, you should reinstall OFED.
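A minimal sketch of that check, assuming bash and that the libraries live in /usr/lib64 as on this RHEL system. `check_verbs_symlink` is a hypothetical helper name, not part of Intel MPI or OFED; you would run it on every node (for example over ssh).

```shell
#!/bin/bash
# Hypothetical helper: report whether the unversioned name libibverbs.so
# (the name the ofa fabric failed to open in the error above) resolves
# inside a library directory.
# Assumption: default directory is /usr/lib64 (RHEL 6, x86_64).
check_verbs_symlink() {
    local libdir=${1:-/usr/lib64}
    local link="$libdir/libibverbs.so"

    if [ -L "$link" ] && [ -e "$link" ]; then
        # Symlink whose target exists.
        echo "OK: $link -> $(readlink "$link")"
    elif [ -L "$link" ]; then
        # Symlink whose target is missing (dangling link).
        echo "BROKEN: $link -> $(readlink "$link") (target missing)"
        return 1
    elif [ -e "$link" ]; then
        # A regular file is also loadable, just unusual.
        echo "OK: $link (regular file)"
    else
        echo "MISSING: $link"
        return 1
    fi
}
```

A nonzero return status marks a node that needs attention, e.g. `check_verbs_symlink || echo "fix libibverbs on $(hostname)"`.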

For the second problem, I'll need some more detail. Is IMB-MPI1 the only program that works with DAPL? If you recompile the benchmark, does the newly compiled version run? What are the contents of your /etc/dat.conf file?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Studio

Hi,

[bash][13:05:28] denayer@frontend ~ $ mpirun -V
Intel MPI Library for Linux Version 4.0 Update 2 Build 20110330
Platform Intel 64 64-bit applications
Copyright (C) 2003-2011 Intel Corporation. All rights reserved[/bash]
[bash][13:06:17] denayer@frontend ~ $ icc -V
Intel C Intel 64 Compiler XE for applications running on Intel 64, Version 12.1.0.233 Build 20110811
Copyright (C) 1985-2011 Intel Corporation. All rights reserved.[/bash]
[bash][13:06:19] denayer@frontend ~ $ env | grep I_MPI
I_MPI_PIN=0
I_MPI_F77=ifort
I_MPI_FABRICS=shm:dapl
I_MPI_PATH=/appl/intel/impi/4.0.3.008
I_MPI_TUNER_DATA_DIR=/appl/intel/impi/4.0.3.008/etc64/
I_MPI_F90=ifort
I_MPI_CC=icc
I_MPI_CXX=icpc
I_MPI_MPD_RSH=ssh
I_MPI_FC=ifort
I_MPI_ROOT=/appl/intel/impi/4.0.3.008[/bash]
There is no libibverbs.so, only:
/usr/lib64/libibverbs.so.1
/usr/lib64/libibverbs.so.1.0.0
These files come with the package libibverbs-1.1.4-2.el6.x86_64.

At the moment, IMB-MPI1 is the only program that works. We have 4 in-house programs (which work without problems on other clusters with Intel MPI), and none of these 4 programs work on this cluster.

There is no /etc/dat.conf... I found one under /etc/rdma/dat.conf:
ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""
ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 1" ""
ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 2" ""
ofa-v2-ehca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ehca0 1" ""
ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-mthca0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-cma-roe-eth2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
ofa-v2-cma-roe-eth3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth3 0" ""
ofa-v2-scm-roe-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-scm-roe-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
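To see which library a provider name maps to, and whether the dynamic linker knows about it, here is a small sketch (bash plus awk). It assumes the whitespace-separated format shown above, where field 1 is the provider name and field 5 the library; `dat_provider_lib` and `lib_in_ld_cache` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helper: print the library (field 5) registered for a DAPL
# provider name (field 1) in a dat.conf-style file; comment lines are skipped.
dat_provider_lib() {
    local conf=$1 provider=$2
    awk -v p="$provider" '!/^[ \t]*#/ && $1 == p { print $5; exit }' "$conf"
}

# Hypothetical helper: succeed if the ld.so cache lists the library.
# Only a hint: LD_LIBRARY_PATH can also make a library loadable.
lib_in_ld_cache() {
    /sbin/ldconfig -p | grep -F "$1 " >/dev/null
}

# Example, using the provider IMB-MPI1 picked by default:
#   lib=$(dat_provider_lib /etc/rdma/dat.conf ofa-v2-mlx4_0-1)
#   lib_in_ld_cache "$lib" && echo "$lib found" || echo "$lib not in ld cache"
```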


Thx a lot,

best regards,
Guillaume

Hi,

I created the link "libibverbs.so -> libibverbs.so.1.0.0" by hand, and the problem with ofa has disappeared. Strange... the official Red Hat packages do not create this link.
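The manual fix just described can be scripted so it is safe to rerun on every node. This is a sketch assuming root access and the /usr/lib64 layout above; `make_verbs_symlink` is a hypothetical name. On RHEL, unversioned .so links normally ship in the matching -devel package (here libibverbs-devel), so installing that package is the cleaner alternative where possible.

```shell
#!/bin/bash
# Hypothetical helper: create libdir/libibverbs.so -> target if it is absent.
# Uses a relative link target, like the symlinks packages normally ship.
make_verbs_symlink() {
    local libdir=${1:-/usr/lib64}
    local target=${2:-libibverbs.so.1.0.0}
    local link="$libdir/libibverbs.so"

    if [ ! -e "$libdir/$target" ]; then
        echo "not found: $libdir/$target" >&2
        return 1
    fi
    if [ -e "$link" ] || [ -L "$link" ]; then
        # Leave any existing file or link alone.
        echo "already present: $link"
        return 0
    fi
    ln -s "$target" "$link" && echo "created: $link -> $target"
}
```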

Perhaps it is the same problem with DAPL. Which DAPL library does Intel MPI look for?

Should we install the compat-dapl package (DAPL 1.2 interface)?

Thx for your tip.
Best regards
Guillaume
James_T_Intel
Moderator

Hi Guillaume,

First, there is an odd discrepancy in your MPI versions. I_MPI_ROOT shows that you should be running 4.0 Update 3, but mpirun reports 4.0 Update 2. That shouldn't be the cause of any of these problems, but let's get it straightened out. What do you get from running

[bash]which mpirun
which icc[/bash]
My guess is that you're getting mpirun from a different location than I_MPI_ROOT. To correct this, make sure /appl/intel/impi/4.0.3.008/bin64/mpivars.sh is sourced after any other scripts that would add an MPI implementation to your path. You might want to log out and log in again, just to clear out any environment variables that could be causing a problem.
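One way to spot this kind of mismatch is to check whether the mpirun found on PATH lives under I_MPI_ROOT. A sketch assuming bash; `mpirun_matches_root` is a hypothetical helper, and a symlinked install could defeat the simple prefix test.

```shell
#!/bin/bash
# Hypothetical helper: succeed if an mpirun path lies under the given
# I_MPI_ROOT, i.e. the launcher and the MPI environment agree.
mpirun_matches_root() {
    local mpirun_path=$1
    local root=${2%/}   # drop one trailing slash, if any

    [ -n "$mpirun_path" ] && [ -n "$root" ] || return 1
    case "$mpirun_path" in
        "$root"/*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Example: compare the current shell's mpirun with I_MPI_ROOT.
if mpirun_matches_root "$(command -v mpirun)" "${I_MPI_ROOT:-}"; then
    echo "mpirun matches I_MPI_ROOT"
else
    echo "mpirun ($(command -v mpirun)) is not under I_MPI_ROOT (${I_MPI_ROOT:-unset})" >&2
fi
```

With the paths reported in this thread, mpirun from /appl/intel/composer_xe_2011_sp1.6.233/mpirt/... would fail the check against I_MPI_ROOT=/appl/intel/impi/4.0.3.008.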

It appears that you've solved the problem with the OFA fabric. As long as the missing symlink was the only problem, you should be all set there. If other problems arise, I would recommend reinstalling OFED.

Now, for the DAPL fabric. Please try compiling and running one of the test programs in /appl/intel/impi/4.0.3.008/test/ with I_MPI_DEBUG=5. Also try running with a different provider (for example, I_MPI_DAPL_PROVIDER=ofa-v2-ib0). It is possible (though unlikely) that the dat.conf file is not being found by programs other than the benchmark. Try setting DAT_OVERRIDE=/etc/rdma/dat.conf and see if that helps, or alternatively create a symlink /etc/dat.conf -> /etc/rdma/dat.conf.
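These experiments can be bundled into one loop. A sketch assuming bash, the Intel MPI wrapper mpiicc on PATH, and a test.c in $I_MPI_ROOT/test; `try_dapl_providers` and the MPIRUN override variable are hypothetical conveniences. Provider names come from the dat.conf posted above. Make sure the two ranks land on different nodes (for example through your host or machine file); otherwise shm carries the messages and the DAPL path is not really exercised.

```shell
#!/bin/bash
# Hypothetical helper: run an MPI executable once per DAPL provider with
# verbose startup output, so the logs show which providers work.
# Assumption: MPIRUN may be set to override the launcher (default: mpirun).
try_dapl_providers() {
    local exe=$1
    shift
    local prov
    for prov in "$@"; do
        echo "=== I_MPI_DAPL_PROVIDER=$prov"
        I_MPI_FABRICS=shm:dapl \
        I_MPI_DEBUG=5 \
        I_MPI_DAPL_PROVIDER="$prov" \
        DAT_OVERRIDE=/etc/rdma/dat.conf \
            "${MPIRUN:-mpirun}" -n 2 "$exe"
    done
}

# Example (test.c assumed to exist in the Intel MPI test directory):
#   mpiicc -o ./impi_test "$I_MPI_ROOT/test/test.c"
#   try_dapl_providers ./impi_test ofa-v2-mlx4_0-1 ofa-v2-mlx4_0-1u ofa-v2-ib0
```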

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi James,

[bash][17:00:42] denayer@frontend ~ $ which mpirun
/appl/intel/composer_xe_2011_sp1.6.233/mpirt/bin/intel64/mpirun[/bash]
[bash][17:00:45] denayer@frontend ~ $ which icc
/appl/intel/composer_xe_2011_sp1.6.233/bin/intel64/icc[/bash]
The problem is: I'm not the admin or the person who installed Intel MPI. I'm the one who wants to use Intel MPI :) So I do not know exactly what the admin did...

Should I install compat-dapl ?

Thx for your help.

Guillaume
James_T_Intel
Moderator

Hi Guillaume,

You should not need the compat-dapl package. Try changing your .bash_login to have

[bash]. /appl/intel/impi/4.0.3.008/intel64/bin/mpivars.sh[/bash]
after any references to compilervars.sh and that should correct the version mismatch.
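A sketch of what that part of .bash_login could look like, using the paths reported in this thread. The existence guards are an added convenience so a node without one of the scripts does not print an error; mpivars.sh comes last so nothing afterwards can put a different mpirun in front of it.

```shell
# ~/.bash_login fragment (paths as reported in this thread)

# Compiler environment first.
if [ -r /appl/intel/bin/compilervars.sh ]; then
    . /appl/intel/bin/compilervars.sh intel64
fi

# Intel MPI 4.0 Update 3 environment last, so its mpirun and I_MPI_ROOT win.
if [ -r /appl/intel/impi/4.0.3.008/intel64/bin/mpivars.sh ]; then
    . /appl/intel/impi/4.0.3.008/intel64/bin/mpivars.sh
fi
```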

Have you tried the test programs?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Hi,

I have tried sourcing /appl/intel/impi/4.0.3.008/intel64/bin/mpivars.sh, but the output of env | grep I_MPI is the same.

What is the mismatch between /appl/intel/impi/4.0.3.008/ and my version of mpirun?

I haven't tried the test programs... not yet :)

Thx for your support!
Guillaume

Grrr! I have figured out part of the problem: the mpirun version mismatch came from my .bashrc... sorry. Now I get:
[17:40:19] denayer@frontend ~ $ mpirun -V
Intel MPI Library for Linux* OS, Version 4.0 Update 3 Build 20110824
Copyright (C) 2003-2011, Intel Corporation. All rights reserved.

James_T_Intel
Moderator

Hi Guillaume,

That should avoid any issues with the versions being different. Let me know once you've tried the test programs and we'll go from there.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

It was a little more complicated: our administrator had created a script /etc/profile.d/intel.sh containing:
. /appl/intel/bin/compilervars.sh intel64

Why is this line not correct? Is it deprecated?

Thx a lot,
Best regards,

James_T_Intel
Moderator

Hi Guillaume,

That line should work just fine. It sets up the paths for the compilers and their libraries. However, it does not set up the correct paths for MPI development: it uses a slightly older MPI version (4.0.2 instead of 4.0.3). The mpivars.sh script sets up the paths for the current MPI version (assuming you use the one for that version, which you are doing) and includes all of the development libraries, rather than just the runtime libraries. You will want to run both of these scripts, but make sure the mpivars.sh script runs after the compilervars.sh script.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools
James_T_Intel
Moderator

Hi Guillaume,

I need to correct my last statement. Between these two, it should not matter which is run first: the compilervars.sh script checks for I_MPI_ROOT, and if this variable is set (mpivars.sh sets it), it uses it to set the paths. If not, it uses the runtime version by default. So as long as you run the mpivars.sh script and nothing later overwrites its settings, there should be no problem at all.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

OK, thanks a lot. Both the DAPL and OFA problems seem to be solved!