Intel® Fortran Compiler

gsl with mpi ifort

gtokic

Hi,

I'm having problems with linking GSL libraries to my code when running in parallel. It seems to me that the LD_LIBRARY_PATH is set up correctly, but there is still a run-time error:

./a.out : error while loading shared libraries: libgsl.so.0: cannot open shared object file: No such file or directory

I run the code using qsub, with the following command in the script

mpirun -machinefile $PBS_NODEFILE -np $NPROCS ./a.out > logfile

where NPROCS=`wc -l < $PBS_NODEFILE`.

The LD_LIBRARY_PATH is:

% echo $LD_LIBRARY_PATH
/opt/intel/fce/10.1.022/lib/:/usr/local/lib

and the libraries seem to be linked

% ldd ./a.out
libgsl.so.0 => /usr/local/lib/libgsl.so.0 (0x00002b891a383000)
libgslcblas.so.0 => /usr/local/lib/libgslcblas.so.0 (0x00002b891a66a000)
libmpichf90.so.2 => /usr/lib64/libmpichf90.so.2 (0x00002b891a7d0000)
libmpich.so.2 => /usr/lib64/libmpich.so.2 (0x00002b891a936000)
libm.so.6 => /lib64/libm.so.6 (0x00002b891aae4000)
libc.so.6 => /lib64/libc.so.6 (0x00002b891ac39000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b891ae69000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b891af77000)
libmpi_infinipath.so.1 => /usr/lib64/libmpi_infinipath.so.1 (0x00002b891b07b000)
/lib64/ld-linux-x86-64.so.2 (0x00002b891a266000)
librt.so.1 => /lib64/librt.so.1 (0x00002b891b18d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b891b297000)
libinfinipath.so.4 => /usr/lib64/libinfinipath.so.4 (0x00002b891b3ad000)
libpsm_infinipath.so.1 => /usr/lib64/libpsm_infinipath.so.1 (0x00002b891b4b6000)

If I run similar (but serial) codes which also call gsl libraries, they work as expected.

Any idea why this happens with the parallel code? Do I need to include some additional libraries for parallel linking?

Thanks,

Grgur

9 Replies

TimP

The shared libraries must be found on LD_LIBRARY_PATH on each node of the cluster. If you're using ldd for your checking, you must do it on each node, with the environment set the way you do it for your cluster run. I think you've demonstrated it's not a compiler question.
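
A minimal sketch of that per-node check, assuming passwordless ssh to the compute nodes and that the executable sits at the same path on every node. The helper names `missing_libs` and `check_nodes` are made up for illustration; note that the environment ssh gives you may still differ from the one your mpirun launcher sets up.

```shell
#!/bin/sh
# Print the names of libraries that ldd reports as "not found".
# Reads ldd output on stdin.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Run ldd for one binary on every unique node listed in a PBS node file.
# Assumption: passwordless ssh works and the binary path is valid on each node.
check_nodes() {
    nodefile=$1
    binary=$2
    sort -u "$nodefile" | while read -r node; do
        # </dev/null keeps ssh from swallowing the rest of the node list
        missing=$(ssh "$node" ldd "$binary" </dev/null | missing_libs | tr '\n' ' ')
        echo "$node: ${missing:-all libraries resolved}"
    done
}
```

Inside a job script this could be called as `check_nodes "$PBS_NODEFILE" "$PWD/a.out"` just before the mpirun line.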

mriedman

Please check whether you really have the 64-bit version of libgsl installed, using the command "file /usr/local/lib/libgsl.so.0". If this is a 32-bit version, that explains your problem. I assume your main program is 64-bit.
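
The check could be scripted roughly as follows. This is only a sketch: `elf_class` and `check_arch` are hypothetical helper names, and the paths in the usage line are the ones from the ldd listing in the first post.

```shell
#!/bin/sh
# Classify one line of `file` output as 64, 32, or unknown.
elf_class() {
    case "$1" in
        *"ELF 64-bit"*) echo 64 ;;
        *"ELF 32-bit"*) echo 32 ;;
        *) echo unknown ;;
    esac
}

# Report the ELF class of each file given as an argument.
# -L makes file follow symlinks such as libgsl.so.0 -> libgsl.so.0.14.0.
check_arch() {
    for f in "$@"; do
        echo "$f: $(elf_class "$(file -L "$f")")"
    done
}
```

For example, `check_arch ./a.out /usr/local/lib/libgsl.so.0 /usr/local/lib/libgslcblas.so.0` should report the same class for the executable and both libraries.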

regards

Michael

gtokic

Hi Tim,

Thanks so much for the reply. Indeed, it seems that shared libraries are not set up on the nodes. LD_LIBRARY_PATH is undefined

node12 > echo $LD_LIBRARY_PATH
LD_LIBRARY_PATH: Undefined variable.

and ldd doesn't find the libraries

node12 > ldd ./a.out
libgsl.so.0 => not found
libgslcblas.so.0 => not found
libmpichf90.so.2 => /usr/lib64/libmpichf90.so.2 (0x00002b63a9da1000)
libmpich.so.2 => /usr/lib64/libmpich.so.2 (0x00002b63a9f08000)
libm.so.6 => /lib64/libm.so.6 (0x00002b63aa0b5000)
libc.so.6 => /lib64/libc.so.6 (0x00002b63aa20a000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b63aa43b000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b63aa548000)
libmpi_infinipath.so.1 => /usr/lib64/libmpi_infinipath.so.1 (0x00002b63aa64c000)
/lib64/ld-linux-x86-64.so.2 (0x00002b63a9c84000)
librt.so.1 => /lib64/librt.so.1 (0x00002b63aa75f000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b63aa868000)
libinfinipath.so.4 => /usr/lib64/libinfinipath.so.4 (0x00002b63aa97e000)
libpsm_infinipath.so.1 => /usr/lib64/libpsm_infinipath.so.1 (0x00002b63aaa88000)

Do you know how to set it properly on the nodes as well?

Best,

Grgur

gtokic

Hi Michael,

Thanks for the reply. It seems that the library is 64-bit. file /usr/local/lib/libgsl.so.0.14.0 gives:
/usr/local/lib/libgsl.so.0.14.0: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped

I actually use the same compiler for both the serial and the parallel code; the only difference is that the serial code doesn't use the MPI libraries. Any ideas?

Best,

Greg

Ron_Green

Greg,

I don't fully understand. Does libgsl exist in /usr/local/lib on all the cluster nodes?

If not, compile with -static

If so, then your LD_LIBRARY_PATH is not being propagated to the child processes after an mpirun command. To fix this, add /usr/local/lib to your LD_LIBRARY_PATH in your .bashrc - assuming that your home dir is NFS mounted to all the cluster nodes as it should be.
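
A sketch of what the .bashrc addition might look like, assuming bash (or another sh-compatible shell) is the shell that actually starts on the nodes. `add_lib_dir` is a made-up helper name; it only avoids listing the same directory twice. The "Undefined variable" message earlier in the thread looks like csh/tcsh output, in which case the equivalent would be a `setenv LD_LIBRARY_PATH ...` line in .cshrc instead.

```shell
# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
add_lib_dir() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;    # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# The GSL location from this thread; it only helps if the directory
# really contains libgsl on every node.
add_lib_dir /usr/local/lib
```

Sourcing the file twice leaves the variable unchanged, so it is safe to call from several startup files.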

ron

TimP

A possibility is to install the shared libraries on an nfs mount visible on each node. Otherwise, it may be necessary to copy them to the file system of each node.

Supposing that you have set up MPI to log into each node by ssh, your .login or .profile on each node may be made to add the necessary LD_LIBRARY_PATH settings.

gtokic

Hi Ronald, Tim,

You are right, /usr/local/lib is not visible from the nodes and the local /usr/local/lib is empty on the nodes. I have compiled the serial version of the code with the -static flag and in that case the code runs both from the master and from the nodes.

However, when I try to link the parallel code with the -static flag, I get the following error:

ld: cannot find -lmpichf90

I have tried adding -L/usr/lib64/ when linking, but I get the same error (required libraries and locations are listed in my first post).

Any ideas how to fix this?

Thanks for all the help,

Greg

TimP

I don't remember whether Ron posted advice on building mpich2 with ifort, or whether there were options when building mpich2 to choose between static and dynamic libraries. If you built the mpich2 correctly with static libraries enabled, using the mpif77 or mpif90 wrapper for the link would find them.

When you build MPI yourself, it's preferable to configure with a specific --prefix, as various difficulties may ensue when you install MPI on one of the standard default paths. For example, configure --prefix=/usr/local/mpich2ifort/ gives you a scheme for keeping multiple MPI versions separated.

It's entirely normal to link your MPI (and your Linux system libraries) dynamically, since you can't run anyway without the MPI installation visible on all nodes, while still linking statically the libraries you provide or those which come with ifort. You would link as you did for the dynamic library build, but specify -static-intel if you want the ifort static libraries, and specify the .a files you are providing by full path name.
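
As an illustration only, such a link line might be put together like this. It assumes the mpif90 wrapper is built around ifort (otherwise -static-intel will not be understood), that static archives libgsl.a and libgslcblas.a exist in the given directory, and that the source file is called main.f90; `link_cmd` is a hypothetical helper that just prints the command.

```shell
#!/bin/sh
# Print a link command in which MPI and the system libraries stay dynamic,
# the Intel runtime is linked statically, and GSL comes from static
# archives named by full path.
# $1 = directory holding libgsl.a and libgslcblas.a; the rest = usual arguments.
link_cmd() {
    gsl_dir=$1
    shift
    # libgsl.a comes before libgslcblas.a because GSL calls into the CBLAS library.
    echo mpif90 "$@" -static-intel "$gsl_dir/libgsl.a" "$gsl_dir/libgslcblas.a"
}

link_cmd /usr/local/lib main.f90 -o a.out
```

Once the printed command looks right, it can be run directly; `ldd ./a.out` should then no longer list libgsl or libgslcblas.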

gtokic

Hi Tim,

I got another good piece of advice, tried it, and it worked. I recompiled the GSL library with shared libraries disabled. That way only static libraries are present in /usr/local/lib and I can compile my code without the -static flag. The MPI libraries are linked dynamically and the parallel code executes normally. I think it's quite a nice trick.
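
For anyone repeating this, a rough sketch of the rebuild, assuming an unpacked GSL source tree with the usual configure script. `build_static_gsl` is a made-up name, the default prefix is my guess at the setup in this thread, and any shared libgsl files left over from an earlier install would still need to be removed by hand.

```shell
#!/bin/sh
# Build and install GSL as static archives only.
# Run from the top of the GSL source tree.
# $1 = install prefix (assumed default: /usr/local).
build_static_gsl() {
    prefix=${1:-/usr/local}
    # --disable-shared is the standard libtool switch for skipping .so files.
    ./configure --prefix="$prefix" --disable-shared &&
    make &&
    make install    # may need root for a system prefix
}
```

After relinking against the new install, `ldd ./a.out` should show no libgsl entries at all.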

Greg