Hi,
I'm having problems with linking GSL libraries to my code when running in parallel. It seems to me that the LD_LIBRARY_PATH is set up correctly, but there is still a run-time error:
./a.out : error while loading shared libraries: libgsl.so.0: cannot open shared object file: No such file or directory
I run the code using qsub, with the following command in the script
mpirun -machinefile $PBS_NODEFILE -np $NPROCS ./a.out > logfile
where NPROCS=`wc -l < $PBS_NODEFILE` .
The LD_LIBRARY_PATH is:
% echo $LD_LIBRARY_PATH
/opt/intel/fce/10.1.022/lib/:/usr/local/lib
and the libraries seem to be linked
% ldd ./a.out
libgsl.so.0 => /usr/local/lib/libgsl.so.0 (0x00002b891a383000)
libgslcblas.so.0 => /usr/local/lib/libgslcblas.so.0 (0x00002b891a66a000)
libmpichf90.so.2 => /usr/lib64/libmpichf90.so.2 (0x00002b891a7d0000)
libmpich.so.2 => /usr/lib64/libmpich.so.2 (0x00002b891a936000)
libm.so.6 => /lib64/libm.so.6 (0x00002b891aae4000)
libc.so.6 => /lib64/libc.so.6 (0x00002b891ac39000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b891ae69000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b891af77000)
libmpi_infinipath.so.1 => /usr/lib64/libmpi_infinipath.so.1 (0x00002b891b07b000)
/lib64/ld-linux-x86-64.so.2 (0x00002b891a266000)
librt.so.1 => /lib64/librt.so.1 (0x00002b891b18d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b891b297000)
libinfinipath.so.4 => /usr/lib64/libinfinipath.so.4 (0x00002b891b3ad000)
libpsm_infinipath.so.1 => /usr/lib64/libpsm_infinipath.so.1 (0x00002b891b4b6000)
If I run similar (but serial) codes which also call gsl libraries, they work as expected.
Any idea why this happens with the parallel code? Do I need to include some additional libraries for parallel linking?
Thanks,
Grgur
Please check whether you really have the 64-bit version of libgsl installed, using the command "file /usr/local/lib/libgsl.so.0". If it is a 32-bit version, that explains your problem. I assume your main program is 64-bit.
regards
Michael
Hi Tim,
Thanks so much for the reply. Indeed, it seems that shared libraries are not set up on the nodes. LD_LIBRARY_PATH is undefined
node12 > echo $LD_LIBRARY_PATH
LD_LIBRARY_PATH: Undefined variable.
and ldd doesn't find the libraries
node12 > ldd ./a.out
libgsl.so.0 => not found
libgslcblas.so.0 => not found
libmpichf90.so.2 => /usr/lib64/libmpichf90.so.2 (0x00002b63a9da1000)
libmpich.so.2 => /usr/lib64/libmpich.so.2 (0x00002b63a9f08000)
libm.so.6 => /lib64/libm.so.6 (0x00002b63aa0b5000)
libc.so.6 => /lib64/libc.so.6 (0x00002b63aa20a000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b63aa43b000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b63aa548000)
libmpi_infinipath.so.1 => /usr/lib64/libmpi_infinipath.so.1 (0x00002b63aa64c000)
/lib64/ld-linux-x86-64.so.2 (0x00002b63a9c84000)
librt.so.1 => /lib64/librt.so.1 (0x00002b63aa75f000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b63aa868000)
libinfinipath.so.4 => /usr/lib64/libinfinipath.so.4 (0x00002b63aa97e000)
libpsm_infinipath.so.1 => /usr/lib64/libpsm_infinipath.so.1 (0x00002b63aaa88000)
Do you know how to set it properly on the nodes as well?
Best,
Grgur
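The check above can be repeated on any node with a small filter over the ldd output. This is only a sketch: the function name missing_libs is made up for illustration, and it assumes ldd marks unresolved libraries with "=> not found", as in the listing above.

```sh
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on standard input.
# (missing_libs is a hypothetical helper name, not part of any tool.)
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Typical use on a compute node:
#   ldd ./a.out | missing_libs
```

An empty result on a node means the loader can resolve every library there; any names printed are the ones that still need to be made visible on that node.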
Hi Michael,
Thanks for the reply. It seems that the library is 64-bit. file /usr/local/lib/libgsl.so.0.14.0 gives:
/usr/local/lib/libgsl.so.0.14.0: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
I actually use the same compiler for both the serial and the parallel code; the only difference is that the serial code doesn't use the MPI libraries. Any ideas?
Best,
Greg
Greg,
I don't fully understand. Does libgsl exist in /usr/local/lib on all the cluster nodes?
If not, compile with -static.
If so, then your LD_LIBRARY_PATH is not being propagated to the child processes after an mpirun command. To fix this, add /usr/local/lib to your LD_LIBRARY_PATH in your .bashrc - assuming that your home dir is NFS mounted to all the cluster nodes as it should be.
ron
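A minimal sketch of the .bashrc addition, assuming the login shell on the nodes is bash or another sh-compatible shell. The function name prepend_ld_path is invented for illustration. If the nodes use csh or tcsh, the equivalent setting would instead go in .cshrc using setenv.

```sh
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
# (prepend_ld_path is a hypothetical helper name.)
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;   # already present, leave the path unchanged
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

prepend_ld_path /usr/local/lib
```

The guard against duplicates matters because .bashrc can be sourced more than once in nested shells. This only helps if the directory actually contains the libraries on every node.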
One possibility is to install the shared libraries on an NFS mount visible on each node. Otherwise, it may be necessary to copy them to the file system of each node.
Supposing that you have set up MPI to log into each node by ssh, your .login or .profile on each node may be made to add the necessary LD_LIBRARY_PATH settings.
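If copying turns out to be necessary, something like the following could do it. It is a sketch under several assumptions: passwordless scp to each node works, the destination directory (a hypothetical argument here) exists and is writable on every node, and the node list comes from a PBS node file, which repeats a node name once per slot. The function names are invented for illustration.

```sh
# List each node in a PBS node file once (the file repeats a node per slot).
unique_nodes() {
    sort -u "$1"
}

# Copy the GSL shared libraries to a directory on every node.
# $1 = node file, $2 = destination directory on the nodes (hypothetical).
copy_gsl_to_nodes() {
    _nodefile=$1
    _dest=$2
    for _node in $(unique_nodes "$_nodefile"); do
        scp /usr/local/lib/libgsl.so* /usr/local/lib/libgslcblas.so* \
            "$_node:$_dest/" || return 1
    done
}

# Example, from inside a job script:
#   copy_gsl_to_nodes "$PBS_NODEFILE" "$HOME/lib"
```

The destination directory would then need to be on LD_LIBRARY_PATH on the nodes.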
Hi Ronald, Tim,
You are right: /usr/local/lib on the master is not visible from the nodes, and the local /usr/local/lib on each node is empty. I have compiled the serial version of the code with the -static flag, and in that case the code runs both from the master and from the nodes.
However, when I try to link the parallel code with the -static flag, I get the following error:
ld: cannot find -lmpichf90
I have tried adding -L/usr/lib64/ when linking, but I get the same error (the required libraries and their locations are listed in my first post).
Any ideas how to fix this?
Thanks for all the help,
Greg
I don't remember whether Ron posted advice on building mpich2 with ifort, or whether there were options when building mpich2 to choose between static and dynamic libraries. If you built mpich2 correctly with static libraries enabled, linking with the mpif77 or mpif90 wrapper would find them.
It's preferable to pass a specific --prefix to configure when you build MPI yourself, as various difficulties may ensue when you install MPI on one of the standard default paths. For example, configure --prefix=/usr/local/mpich2ifort/ gives you a scheme for keeping multiple MPI versions separate.
It's entirely normal to link your MPI (and your Linux system libraries) dynamically, since you can't run anyway without the MPI installation visible on all nodes, while linking statically the libraries you provide yourself or those which come with ifort. You would link as you did for the dynamic library build, but specify -static-intel if you want the ifort static libraries, and name the .a files you are providing by their full paths.
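As a rough illustration of that mixed link, the sketch below only prints the command so it can be inspected first. It assumes the mpif90 wrapper invokes ifort, that static archives libgsl.a and libgslcblas.a exist in /usr/local/lib, and that main.o stands in for your own object files. The function and variable names are made up for this example.

```sh
# Directory assumed to hold the static GSL archives.
GSL_LIBDIR=${GSL_LIBDIR:-/usr/local/lib}

# Print a link command that leaves MPI dynamic but names the GSL
# archives by full path, so they are linked statically.
# $1 = output name, remaining arguments = object files.
static_gsl_link_cmd() {
    _out=$1
    shift
    echo "mpif90 -static-intel -o $_out $* $GSL_LIBDIR/libgsl.a $GSL_LIBDIR/libgslcblas.a"
}

static_gsl_link_cmd a.out main.o
```

Naming an archive by full path sidesteps the linker's usual preference for a shared object when both libgsl.so and libgsl.a sit in the same directory.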
Hi Tim,
I got another good piece of advice, tried it, and it worked. I recompiled the GSL library with shared libraries disabled. That way only static libraries are present in /usr/local/lib, and I can compile my code without the -static flag. The MPI libraries are still linked dynamically, and the parallel code executes normally. I think it's quite a nice trick.
Greg
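For anyone repeating this, the rebuild might look roughly like the sketch below. It assumes an unpacked GSL source tree with the usual autotools configure script, where --disable-shared is the standard option for skipping shared libraries. The function names are invented for illustration, and shared objects left over from an earlier install may need to be removed by hand.

```sh
# Configure, build and install GSL with static libraries only.
# Run from the top of the GSL source tree; $1 is the install prefix.
build_static_gsl() {
    ./configure --disable-shared --prefix="${1:-/usr/local}" && make && make install
}

# Succeed if ldd output on standard input still lists a shared libgsl.
links_gsl_dynamically() {
    grep -q 'libgsl\.so'
}

# After relinking the program:
#   ldd ./a.out | links_gsl_dynamically && echo "still using shared GSL"
```

If the check stays silent, the executable no longer depends on libgsl.so at run time, so the nodes only need the MPI and system libraries they already have.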