afylot1
Beginner
480 Views

segmentation fault only with mkl on intel cluster, MPI parallelization

Hi,

I am trying to run an MPI application on an Intel Xeon Linux cluster. It has been compiled with the Intel Fortran and C compilers (version 10.1) and linked against MKL (version 10.0).

I don't get any error at compile time but I do get the following run-time error:

------------------------------------------------------------------------------------------
Parallel environment (un)loaded (OpenMPI+Intel)

[n017:03307] *** Process received signal ***
[n017:03307] Signal: Segmentation fault (11)
[n017:03307] Signal code: Address not mapped (1)
[n017:03307] Failing at address: 0x1
[n017:03307] [ 0] /lib64/libpthread.so.0 [0x2b2d247fc7c0]
[n017:03307] [ 1] /opt/intel/mkl/10.0.5.025/lib/em64t/libmkl_lapack.so(mkl_lapack_dlarre+0xc3) [0x2aaab40d0247]
[n017:03307] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3307 on node n017 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The error doesn't show up immediately, but only after some computations have been performed successfully(?).

It looks strange that the error report refers to libmkl_lapack.so, because I link only with
-lmkl_intel_lp64 -lmkl_sequential -lmkl_core
In fact, I don't need LAPACK in my application, only BLAS. I am linking to MKL for the em64t architecture, and I am using the sequential MKL because the application is parallelized with MPI and I don't want the threaded MKL to interfere with it. I have also tried linking with
-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
with the variable OMP_NUM_THREADS=1 set in the makefile, but that doesn't work either.
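
A side note on the OMP_NUM_THREADS attempt: the variable is read from the process environment when the program runs, so a value set only in the makefile does not necessarily reach the job. A minimal sketch of passing it at launch time follows; the wrapper name run_single_threaded is hypothetical, and the sh -c command merely stands in for the real application.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of MKL or Open MPI): run any command
# with OMP_NUM_THREADS=1 placed in that command's environment.
run_single_threaded() {
    env OMP_NUM_THREADS=1 "$@"
}

# Stand-in for the real application: print what the child process sees.
run_single_threaded sh -c 'echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"'
# prints: OMP_NUM_THREADS=1
```

With Open MPI, mpirun's -x option (for example, mpirun -x OMP_NUM_THREADS ...) exports an environment variable to the launched ranks, which is one way to get the same effect on remote nodes.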

The full linking line I am using in my application is

-lgsl -lgslcblas -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lguide -lpthread
plus some user defined libraries.

By the way, the same application compiled with the Intel compilers and the ACML (non-threaded) library runs on the same cluster without a segmentation fault.

I don't understand what I could have done wrong. Does anybody have any hint? Any help is greatly appreciated.

Accepted Solutions
Gennady_F_Intel
Moderator
480 Views
Hi Afylot,

You are running an MPI-based application with MKL, but I don't see any MKL libraries that support MPI on your link line?
If you don't need MPI-based functionality from MKL (no CFFT or ScaLAPACK), then link with
-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lguide -lpthread
If the problem persists, could you update MKL? The version you are using is very old, and many problems have been fixed since then (in 10.1, 10.2, and the latest 10.3 beta, released one month ago).
--Gennady
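
As a quick sanity check on link lines like the ones in this thread, the sketch below counts how many MKL threading layers (-lmkl_sequential or -lmkl_intel_thread) appear on a link line. The layered link lines quoted here each carry exactly one. The helper name count_mkl_threading_layers and the LINK_LINE variable are made up for illustration and are not part of MKL.

```shell
#!/bin/sh
# Hypothetical helper (not part of MKL): count the MKL threading-layer
# libraries named on a link line. Only the two layers mentioned in this
# thread are recognized.
count_mkl_threading_layers() {
    count=0
    # Intentionally unquoted so the link line splits into separate flags.
    for flag in $1; do
        case "$flag" in
            -lmkl_sequential|-lmkl_intel_thread)
                count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}

# The full link line from the original question.
LINK_LINE="-lgsl -lgslcblas -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lguide -lpthread"
count_mkl_threading_layers "$LINK_LINE"
# prints: 1
```

A count other than 1 would suggest the link line mixes or omits threading layers and is worth rechecking.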

5 Replies
Aubrey_W_
New Contributor I
480 Views

Hello,

I'm moving this to the Intel Clusters and HPC Technology forum so that the Intel Software Development Products team can help.

==
Aubrey W.
Intel Software Network Support

Andres_M_Intel4
Employee
480 Views
Also, I've found the following link useful. It shows how the linking should be done in different scenarios. I would give it a try.
http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/
-- Andres
480 Views
I see this is an MKL issue, so I've moved it again.

==
Aubrey W.
Intel Software Network Support
afylot1
Beginner
480 Views
I changed to MKL v10.2 with compiler v10.1, and I don't have this problem anymore.

@Andres
The link looks very useful. I think I am going to use it a lot.

Thanks