Mohammad_Umair
Beginner

Parallel HDF5 Fortran


I am developing an application that requires a parallel HDF5 library built with the Intel compiler and Fortran support. I successfully built the HDF5 library in my account and was able to compile my application against it, but when I run the application I get the error: "error while loading shared libraries: libhdf5_fortran.so.102". I have already added the library and bin paths in my .bashrc file:

# zlib
export PATH="/home/uxxxxx/libraries/zlib/zlib-1.2.11/bin:$PATH"
export LD_LIBRARY_PATH="/home/uxxxxx/zlib/zlib-1.2.11/lib:$LD_LIBRARY_PATH"

# HDF5
export PATH="/home/uxxxxx/libraries/hdf5/hdf5-1.10.5/bin:$PATH"
export LD_LIBRARY_PATH="/home/uxxxxx/hdf5/hdf5-1.10.5/lib:$LD_LIBRARY_PATH"

I don't know why it's not able to locate the necessary files.

I'm stuck at this point. Kindly help! 
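A quick way to narrow down a loading failure like this is to confirm that every directory listed on LD_LIBRARY_PATH actually exists on disk; a missing path component is easy to overlook. A minimal sketch (the helper name `check_ld_path` is illustrative, not part of the original post):

```shell
# Sketch: print each directory on LD_LIBRARY_PATH and flag entries that
# do not exist on disk.
check_ld_path() {
  echo "$LD_LIBRARY_PATH" | tr ':' '\n' | while read -r d; do
    [ -n "$d" ] || continue
    if [ -d "$d" ]; then echo "ok      $d"; else echo "missing $d"; fi
  done
}
check_ld_path
```

Any entry flagged "missing" should be compared character by character against the actual install prefix.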


10 Replies
AbhishekD_Intel
Moderator

Hi,

Could you please let us know the compiler version you are using to build the project?

As you mentioned, you were able to build the library and successfully generate the executable. This means the linker found the lib file, but the loader is unable to locate the .so file at run time, even though you have provided a path to the lib directory.

So, in this case, try using the following command:

             ldd <executable_name>

It will show whether your executable's shared libraries are all resolved. If some are not, keep exporting the proper path until the ldd output reports the location of your .so file.
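As an illustration of reading ldd output (the helper name `check_libs` is an editor's sketch; `run.x` is the executable name used later in this thread, and exact output varies by system):

```shell
# List the shared libraries the loader resolves for an executable.
# An unresolved library is reported as "not found", e.g.:
#   libhdf5_fortran.so.102 => not found
# while a resolved one shows its full path instead.
check_libs() {
  ldd "$1" | grep 'hdf5\|not found\|=>'
}
# Usage:
#   check_libs ./run.x
```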

If you still face the same issue, reach out to us and we will help you.

 

Warm Regards,

Abhishek

Mohammad_Umair
Beginner

Hi Abhishek,

Thank you very much for your kind response. As I said earlier, I was able to build the HDF5 library successfully. Then I thought there might be an issue with the build, so I deleted the library and installed it again. This time, when I compile my application, it does not generate the executable. Right at the end of the compilation it says: /home/uxxxxx/libraries/hdf5/hdf5-1.10.5/lib/libhdf5_fortran.so: undefined reference to `__intel_avx_memmove'.

The compiler version I used is "ifort (IFORT) 19.0.3.199 20190206", which is available on the DevCloud. I built the HDF5 library with the following flags:

export CC="/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mpi/intel64/bin/mpiicc"
export CFLAGS="-O3 -xHost -ip -mcmodel=medium"
export CPP="mpiicc -E"
export FC="/glob/development-tools/versions/intel-parallel-studio/compilers_and_libraries_2019.3.199/linux/mpi/intel64/bin/mpiifort"
export FCFLAGS="-O3 -xHost -ip -mcmodel=medium"
export LIBS="-lz"
export LDFLAGS="-L/home/uxxxx/libraries/zlib/zlib-1.2.11/lib"
export CPPFLAGS="-I/home/uxxxx/libraries/zlib/zlib-1.2.11/include/"

./configure --enable-parallel --enable-fortran --with-zlib=/home/uxxxxx/libraries/zlib/zlib-1.2.11 --prefix=/home/uxxxxx/libraries/hdf5/hdf5-1.10.5/

make

make install

I skipped "make check" as it takes a long time to finish. I did run it once and every test was successful.

After the installation, I added the following lines to my .bashrc file:

# HDF5
export PATH="/home/uxxxxx/libraries/hdf5/hdf5-1.10.5/bin:$PATH"
export LD_LIBRARY_PATH="/home/uxxxxx/hdf5/hdf5-1.10.5/lib:$LD_LIBRARY_PATH"

When I type which h5pfc, I get: "/home/uxxxx/libraries/hdf5/hdf5-1.10.5/bin/h5pfc"

So everything seems correct up to this point; I have installed the HDF5 library on many other clusters and it works every time.

My makefile is something like this:

###############################################
#Intel compiler
###############################################
FC = mpiifort

SWITCH = -O3
FFLAGS = -qopenmp $(SWITCH) -mcmodel medium -shared-intel -qoverride-limits
DBGFLAGS = -traceback -fpe0 -check all -check bounds

###############################################
#Parallel HDF5
###############################################
HDF5_HOME = /home/uxxxxx/libraries/hdf5/hdf5-1.10.5
LIBHDF5 = -L$(HDF5_HOME)/lib -lhdf5_fortran -lhdf5 -lz -lm
INCLUDES = -I$(HDF5_HOME)/include
 

###############################################

$(EXECTBL): $(OBJS)

        cd $(OBJD); $(FC) $(DBGFLAGS) $(FFLAGS) -o ../$(EXECTBL)  \
                $(OBJS) $(LIBHDF5) $(INCLUDES)
 

%.o $(OBJD)/%.o: %.f90

        cd $(OBJD); $(FC) -c $(DBGFLAGS) $(FFLAGS)  $(SRCD)/$(*).f90 \
                $(LIBHDF5) $(INCLUDES)

################################################

I have only included the important lines, but now the makefile is unable to generate the executable.

The error I get is: /home/uxxxxx/libraries/hdf5/hdf5-1.10.5/lib/libhdf5_fortran.so: undefined reference to `__intel_avx_memmove'.

Sorry for the long message; I thought it was important to give every bit of necessary information so that you can see what I'm missing.

Thank you very much.

Kind Regards,

Umair

AbhishekD_Intel
Moderator

Hi Umair,

Thanks a lot for the detailed information. It seems you have followed the correct procedure, included the necessary flags, and given the proper paths to your shared libraries.

We are quite surprised by this error, though. Could you send us a small reproducer so that we can reproduce the same error and find a suitable solution for it?

 

Warm Regards,

Abhishek

Mohammad_Umair
Beginner

Hi Abhishek,

Thank you very much for your prompt response. I have created a small test code that requires the HDF5 library. It is a serial code, but that shouldn't matter, as I can easily compile and run it on my own laptop. Please find the attached tar file.

I am really stuck at this point and can't get any further until this issue is resolved. I only have access to the DevCloud for 120 days, and I have already spent 2-3 days just resolving this issue.

One more thing I would like to ask: as mentioned on the website, each compute node has 24 processors, right? I was trying to test one of my applications on the cluster with 96 MPI processes. My run.sh goes as follows:

#!/bin/bash

echo "########## Executing the run"
mpiexec -np 96 ./exec.x < input.in
echo "########## Done with the run"
 

And I submitted the job using: qsub -l nodes=4:batch:ppn=24,walltime=24:00:00 -d . run.sh

It failed and I got the following message:

Compute nodes in this cabinet are configured with 2 slots per node,
but you included the argument "-l nodes=4:batch:ppn=24", which results in 24 slots per node.
With this argument, your application will not have any nodes to run on.
Please run with "-l nodes=4:batch:ppn=2,walltime=24:00:00" instead.

 

Kindly help me resolve these issues at the earliest. I'm running out of time!

Kind Regards,

Umair

AbhishekD_Intel
Moderator

Hi Umair,

Thank you for the details and reproducer.

Firstly, I installed zlib-1.2.11 with ICC (export CC=icc); refer to the screenshots.

Second, I installed hdf5-1.10.5 with the same environment you provided in the comments; the screenshots have more details.

Then I set the path of the installed location of hdf5 in your makefile (ARCH/make.arch.umair) and compiled your code successfully without any errors (see the screenshots). On executing run.x, it threw the error:

" ./run.x: error while loading shared libraries: libhdf5_fortran.so.102: cannot open shared object file: No such file or directory "

Then I exported LD_LIBRARY_PATH with the installed location of the libs and ran run.x again, and now it runs; refer to the screenshots for more details. If you follow the same steps, you should be able to solve this problem.
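The fix described above can be sketched as follows; the install prefixes are the ones used earlier in this thread, so adjust them to your own layout:

```shell
# Point the dynamic loader at the actual install locations of the
# HDF5 and zlib libraries, then rerun the executable.
export LD_LIBRARY_PATH="/home/uxxxxx/libraries/hdf5/hdf5-1.10.5/lib:/home/uxxxxx/libraries/zlib/zlib-1.2.11/lib:$LD_LIBRARY_PATH"
# Verify that libhdf5_fortran.so.102 now resolves before running:
#   ldd ./run.x | grep hdf5_fortran
#   ./run.x
```

Adding the export to .bashrc makes it persistent across sessions.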

I have also attached the output screenshot; you can check it out as well.

 

In your qsub command "qsub -l nodes=4:batch:ppn=24,walltime=24:00:00 -d . run.sh", ppn should be set to 2.

I hope this will solve your problem.

Warm Regards,

Abhishek

     


Mohammad_Umair
Beginner

Dear Abhishek,

The solution provided by you really worked.

Thank you so much for your help and prompt response. 

Can you please also tell me how I can run my application on multiple nodes? For example, if I want to run my application with 96 MPI processes, and each node on DevCloud has 24 processors, then I will need 4 nodes, right? My run.sh script should be: mpiexec -np 96 ./exec.x

and it should be submitted using: qsub -l nodes=4:batch:ppn=2 -d . run.sh, right? I still don't understand why ppn should be kept at 2. Shouldn't it be 24?

 

Thanks and Regards,

Umair

AbhishekD_Intel
Moderator

Hi Umair,

It is good to know that the resolution worked for you.

You can run your application on 4 nodes with the command qsub -I -l nodes=4:ppn=2. Here, ppn is not the same as the process count used with mpirun: this ppn is configured according to the scheduler, so it must be 2, and it will still give you access to all the processors of each node you have selected.
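For completeness, a sketch of the corrected job, assuming the DevCloud configuration described in this thread (2 scheduler slots per node, 24 processors per node); the script and file names are the ones used earlier:

```shell
#!/bin/bash
# run.sh: launch 96 MPI ranks. The node count comes from the scheduler
# request, not from this script. Submit with:
#   qsub -l nodes=4:batch:ppn=2,walltime=24:00:00 -d . run.sh
echo "########## Executing the run"
mpiexec -np 96 ./exec.x < input.in
echo "########## Done with the run"
```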

 

Warm Regards,

Abhishek

 

 

AbhishekD_Intel
Moderator

Hi,

Please update us if your issue got resolved.

You can always post a new thread if you have any problem.

 

Thank You

Mohammad_Umair
Beginner

Hi Abhishek,

Yes, the solution provided by you worked. Thank you so much.

Regards,

Umair

AbhishekD_Intel
Moderator

Hi Umair,

Thank you for your response; we are closing this thread.

You can post a new thread if you have any problems.

Warm Regards,

Abhishek

 
