tjahns
Beginner

Intel MPI fails to load libmpi_lustre.so

Hello,

Under which circumstances might we see the following error:

[3] ERROR - ADIO_Init(): Can't load libmpi_lustre.so library: libmpi_lustre.so: cannot open shared object file: No such file or directory
[2] ERROR - ADIO_Init(): Can't load libmpi_lustre.so library: libmpi_lustre.so: cannot open shared object file: No such file or directory

This is from a reduced test case where only ranks 2 and 3 out of 0-3 open a file with MPI_File_open. At this point the above message is printed and the job is aborted. We run on RHEL6 x86_64.

When tracing the executable with strace, I can see that it tries to load libmpi_lustre.so from various directories, but not the one that Intel MPI is installed to, which is also part of the executable's RPATH:

$  ../libtool --mode=execute objdump -x pio_write | grep RPATH
  RPATH                /sw/rhel6-x64/intel/intel-14.0.3/lib/intel64:/sw/rhel6-x64/netcdf/netcdf_c-4.3.2-parallel-impi-intel14/lib:/sw/rhel6-x64/hdf5/hdf5-1.8.14-parallel-impi-intel14/lib:/sw/rhel6-x64/netcdf/parallel_netcdf-1.6.0-impi-intel14/lib:/sw/rhel6-x64/grib_api/grib_api-1.13.0-intel14/lib:/sw/rhel6-x64/sys/libaec-0.3.2-intel14/lib:/home/dkrz/k202069/opt/cdi-x64-linux-intel14-impi/lib:/home/dkrz/k202069/opt/ppm-x64-linux-intel14-impi/lib:/home/dkrz/k202069/opt/yaxt-x64-linux-intel14-impi/lib:/sw/rhel6-x64/hdf4/hdf-4.2.10-intel14/lib:/sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib:/opt/intel/mpi-rt/4.1

strace excerpt:

open("/home/dkrz/k202069/Documents/work/dkrz/build/cdi-x64-linux-intel-impi/src/.libs/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/dkrz/k202069/opt/ppm-x64-linux-intel14-impi/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/dkrz/k202069/opt/yaxt-x64-linux-intel14-impi/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/grib_api/grib_api-1.13.0-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/netcdf/netcdf_c-4.3.2-parallel-impi-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/hdf4/hdf-4.2.10-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/hdf5/hdf5-1.8.14-parallel-impi-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/sys/libaec-0.3.2-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=54243, ...}) = 0
mmap(NULL, 54243, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f392e366000
close(3)                                = 0
open("/lib64/tls/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64/tls/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory)
open("/lib64/tls/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64/tls", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
open("/lib64/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64/x86_64", 0x7fffb78e3e10)   = -1 ENOENT (No such file or directory)
open("/lib64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64", {st_mode=S_IFDIR|0555, st_size=12288, ...}) = 0
open("/usr/lib64/tls/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/tls/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory)
open("/usr/lib64/tls/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/tls", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
open("/usr/lib64/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory)
open("/usr/lib64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64", {st_mode=S_IFDIR|0755, st_size=36864, ...}) = 0
munmap(0x7f392e366000, 54243)           = 0
write(2, "[3] ERROR - ADIO_Init(): ", 25) = 25

So one can see that

/sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib

is not in the list of directories tried, but everything seems to be in place there:

$ ls -l /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so*
lrwxrwxrwx 1 someuser somegroup    20 2014-08-29 13:43:04 /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so -> libmpi_lustre.so.4.1
lrwxrwxrwx 1 someuser somegroup    20 2014-08-29 13:43:04 /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so.4.0 -> libmpi_lustre.so.4.1
-rwxrwxr-x 1 someuser somegroup 52279 2014-03-03 09:51:58 /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so.4.1
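
The comparison between the directories that were tried and the directory that holds the library can be scripted. The following is only a sketch: the helper names tried_dirs and dir_was_tried are made up for illustration, and the sed pattern assumes strace lines shaped like the ones in the excerpt above (a quoted absolute path followed by an ENOENT result).

```shell
# Hypothetical helper: read strace output on stdin and print every directory
# in which the library named by $1 was looked up and not found.
tried_dirs() {
    lib=$1
    sed -n "s|.*\"\(/.*\)/$lib\", .*ENOENT.*|\1|p"
}

# Hypothetical helper: succeed if directory $2 appears among those directories.
dir_was_tried() {
    tried_dirs "$1" | grep -Fqx -- "$2"
}

# Possible usage, assuming the trace was saved to a file called trace.log:
#   dir_was_tried libmpi_lustre.so \
#       /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib < trace.log \
#       || echo "Intel MPI lib directory was never searched"
```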

So my question is: how can I make Intel MPI try the correct place to load libmpi_lustre.so from?

Regards, Thomas

2 Replies
Gergana_S_Intel
Employee

Hi Thomas,

It's very likely that your LD_LIBRARY_PATH settings omit the Intel MPI library directory, which is why you're seeing this issue.  There are a couple of things you can do, but our recommendation is to simply source the provided mpivars.sh script in the /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/bin directory.  Alternatively, you can manually add /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib to your LD_LIBRARY_PATH environment variable.
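
A minimal sketch of the manual alternative, assuming a POSIX shell. The helper name prepend_ld_library_path is made up for illustration and is not part of Intel MPI; it adds the directory only if it is not already listed.

```shell
# Hypothetical helper: put a directory at the front of LD_LIBRARY_PATH,
# unless it is already somewhere in the list.
prepend_ld_library_path() {
    pllp_dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$pllp_dir:"*) ;;  # already present, leave the variable unchanged
        *) LD_LIBRARY_PATH="$pllp_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Directory taken from this thread; adjust it for your installation.
prepend_ld_library_path /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib

# The recommended route instead of the helper would be:
#   . /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/bin/mpivars.sh
```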

Let me know if you try one of these and how it works out.

Regards,
~Gergana

tjahns
Beginner

Hello,

Thanks for pointing out LD_LIBRARY_PATH. We don't regularly set it because it's a big no-no for production software at our site. But libtool did set LD_LIBRARY_PATH for me, and it seems dlopen only searches the paths in LD_LIBRARY_PATH and ignores DT_RUNPATH. This seems to be a documented bug:

http://www.spinics.net/lists/linux-man/msg02291.html
https://www.sourceware.org/ml/libc-hacker/2002-10/msg00048.html
https://sourceware.org/ml/libc-hacker/2002-11/msg00011.html

The libc maintainers don't want to do anything about it, apparently because they think users of dlopen should rather discover the right path themselves.
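
Since the search behaviour depends on whether an object carries DT_RPATH or DT_RUNPATH, it may be worth checking which tag is actually present. The following is a sketch only: the helper name search_path_tags is made up, and the awk filter assumes the usual GNU binutils `readelf -d` layout with the tag in the second column and a search path without spaces in the last.

```shell
# Hypothetical helper: read `readelf -d` output on stdin and print each
# RPATH or RUNPATH entry as "<tag> [<path list>]".
search_path_tags() {
    awk '$2 == "(RPATH)" || $2 == "(RUNPATH)" {
        tag = $2
        gsub(/[()]/, "", tag)   # strip the parentheses around the tag name
        print tag, $NF
    }'
}

# Possible usage on the executable and on the MPI library itself
# (the libmpi.so path below is an assumption):
#   readelf -d pio_write | search_path_tags
#   readelf -d /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi.so | search_path_tags
```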

Based on this, I developed a workaround: I add -lmpi_lustre to LIBS, which is okay for our site since we have no other parallel file system.
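
Roughly, the workaround could look like the sketch below. The exact build invocation is not shown in this thread, so the configure line and the extra LDFLAGS are assumptions for an autoconf-style package; only the -lmpi_lustre addition to LIBS is the workaround itself.

```shell
# Install location as shown in the RPATH earlier in this thread.
IMPI_LIB=/sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib

# Put -lmpi_lustre in front of whatever LIBS already holds.
LIBS="-lmpi_lustre${LIBS:+ $LIBS}"

# Assumption: the linker also needs -L and an rpath entry for that directory
# (an MPI compiler wrapper may already supply both).
LDFLAGS="${LDFLAGS:+$LDFLAGS }-L$IMPI_LIB -Wl,-rpath,$IMPI_LIB"
export LIBS LDFLAGS

# Hypothetical next steps:
#   ./configure CC=mpiicc && make
#   readelf -d pio_write | grep NEEDED   # libmpi_lustre should now be listed
```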

Regards,
Thomas
