Hello,
under which circumstances might we see the following error:
```
[3] ERROR - ADIO_Init(): Can't load libmpi_lustre.so library: libmpi_lustre.so: cannot open shared object file: No such file or directory
[2] ERROR - ADIO_Init(): Can't load libmpi_lustre.so library: libmpi_lustre.so: cannot open shared object file: No such file or directory
```
This is from a reduced test case where only ranks 2 and 3 out of 0-3 open a file with MPI_File_open. At this point the above message is printed and the job is aborted. We run on RHEL6 x86_64.
When tracing the executable with strace, I can see that it tries to load libmpi_lustre.so from various directories, but not the one that Intel MPI is installed to, which is also part of the executable's RPATH:
```
$ ../libtool --mode=execute objdump -x pio_write | grep RPATH
  RPATH  /sw/rhel6-x64/intel/intel-14.0.3/lib/intel64:/sw/rhel6-x64/netcdf/netcdf_c-4.3.2-parallel-impi-intel14/lib:/sw/rhel6-x64/hdf5/hdf5-1.8.14-parallel-impi-intel14/lib:/sw/rhel6-x64/netcdf/parallel_netcdf-1.6.0-impi-intel14/lib:/sw/rhel6-x64/grib_api/grib_api-1.13.0-intel14/lib:/sw/rhel6-x64/sys/libaec-0.3.2-intel14/lib:/home/dkrz/k202069/opt/cdi-x64-linux-intel14-impi/lib:/home/dkrz/k202069/opt/ppm-x64-linux-intel14-impi/lib:/home/dkrz/k202069/opt/yaxt-x64-linux-intel14-impi/lib:/sw/rhel6-x64/hdf4/hdf-4.2.10-intel14/lib:/sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib:/opt/intel/mpi-rt/4.1
```
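For reference, the trace below was collected with something along these lines (a sketch; the exact invocation may have differed):

```
# Run each rank under strace and log the file-open attempts per process
# (-ff -o writes one pio_write.trace.<pid> file per rank)
mpirun -n 4 strace -e trace=open,stat -ff -o pio_write.trace ./pio_write
```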
strace excerpt:
open("/home/dkrz/k202069/Documents/work/dkrz/build/cdi-x64-linux-intel-impi/src/.libs/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) open("/home/dkrz/k202069/opt/ppm-x64-linux-intel14-impi/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) open("/home/dkrz/k202069/opt/yaxt-x64-linux-intel14-impi/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) open("/sw/rhel6-x64/grib_api/grib_api-1.13.0-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) open("/sw/rhel6-x64/netcdf/netcdf_c-4.3.2-parallel-impi-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) open("/sw/rhel6-x64/hdf4/hdf-4.2.10-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) open("/sw/rhel6-x64/hdf5/hdf5-1.8.14-parallel-impi-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) open("/sw/rhel6-x64/sys/libaec-0.3.2-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=54243, ...}) = 0 mmap(NULL, 54243, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f392e366000 close(3) = 0 open("/lib64/tls/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) stat("/lib64/tls/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory) open("/lib64/tls/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) stat("/lib64/tls", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0 open("/lib64/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) stat("/lib64/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory) open("/lib64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) stat("/lib64", {st_mode=S_IFDIR|0555, st_size=12288, ...}) = 0 open("/usr/lib64/tls/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) stat("/usr/lib64/tls/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory) open("/usr/lib64/tls/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) stat("/usr/lib64/tls", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0 open("/usr/lib64/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) stat("/usr/lib64/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory) open("/usr/lib64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory) stat("/usr/lib64", {st_mode=S_IFDIR|0755, st_size=36864, ...}) = 0 munmap(0x7f392e366000, 54243) = 0 write(2, "[3] ERROR - ADIO_Init(): ", 25) = 25
So one can see that `/sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib` is not among the directories tried, even though everything seems to be in place there:
```
$ ls -l /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so*
lrwxrwxrwx 1 someuser somegroup    20 2014-08-29 13:43:04 /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so -> libmpi_lustre.so.4.1
lrwxrwxrwx 1 someuser somegroup    20 2014-08-29 13:43:04 /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so.4.0 -> libmpi_lustre.so.4.1
-rwxrwxr-x 1 someuser somegroup 52279 2014-03-03 09:51:58 /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so.4.1
```
So my question is: how can I make Intel MPI try the correct place to load libmpi_lustre.so from?
Regards, Thomas
Hi Thomas,
It's very likely your LD_LIBRARY_PATH settings omit the full Intel MPI library directory, which is why you're seeing this issue. There are a couple of things you can do, but our recommendation is to simply source the provided mpivars script.
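Here's a sketch of both options, using the installation path from your post and assuming the usual `<installdir>/intel64/bin` location of mpivars.sh (adjust to your site):

```
# Option 1: source the Intel MPI environment script, which prepends
# the intel64/lib directory to LD_LIBRARY_PATH
source /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/bin/mpivars.sh

# Option 2: extend LD_LIBRARY_PATH by hand
export LD_LIBRARY_PATH=/sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib:$LD_LIBRARY_PATH
```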
Let me know if you try one of these and how it works out.
Regards,
~Gergana
Hello,
thanks for pointing out LD_LIBRARY_PATH. We don't regularly set it because it's a big no-no for production software at our site. But libtool did set LD_LIBRARY_PATH for me, and it seems dlopen only searches the paths in LD_LIBRARY_PATH and ignores the executable's DT_RUNPATH. This appears to be a documented bug:
http://www.spinics.net/lists/linux-man/msg02291.html
https://www.sourceware.org/ml/libc-hacker/2002-10/msg00048.html
https://sourceware.org/ml/libc-hacker/2002-11/msg00011.html
which the libc maintainers don't want to fix, because they seem to think callers of dlopen should discover the right path themselves.
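For anyone hitting the same thing, one can check which dynamic tag the linker actually emitted (a diagnostic sketch; `pio_write` is our test binary):

```
# DT_RPATH entries are searched by dlopen() on behalf of the whole
# process, while DT_RUNPATH is only consulted for the object that calls
# dlopen() itself, so the executable's entry is skipped.
readelf -d pio_write | grep -E 'R(UN)?PATH'
```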
Based on this I developed a workaround: I add -lmpi_lustre to LIBS, so the library is linked in directly instead of being dlopen'd at runtime. That is okay for our site since we have no other parallel file system.
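In configure terms the workaround looks roughly like this (a sketch; flags and paths are specific to our site, and the -L may be unnecessary when linking through the mpiicc/mpiifort wrappers):

```
# Linking libmpi_lustre directly makes it a DT_NEEDED dependency, which
# the dynamic linker resolves at startup (honoring RPATH/RUNPATH), so
# the library is already mapped before ADIO_Init() goes looking for it.
./configure LIBS=-lmpi_lustre \
            LDFLAGS=-L/sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib
```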
Regards,
Thomas