Intel® oneAPI Math Kernel Library

Intel MKL PARDISO seg. faults within MPI codes

Alberto_F__M_
Beginner

Dear forum users,

My MPI codes are linked with Intel MKL and make intensive use of the PARDISO sparse direct solver. These MPI codes are actually MPMD in the sense that the last MPI task in the communicator does not execute the same code as the rest. This last task is devoted to a quite time/memory-hungry computation using Intel MKL PARDISO (in multi-threaded mode). As a consequence, our MPI codes need a rather particular placement of MPI processes/threads onto nodes/cores on the underlying distributed-memory computer (with 16 cores per node). In particular, assuming 16*N + 1 MPI tasks are spawned, the first 16*N tasks have to be mapped to N nodes with 16 tasks per node and one thread per process, and the last remaining task to one dedicated node with one MPI task per node and 16 threads per process, so that the last task has access to all the memory/cores within a node. This particular mapping is achieved with the help of an OpenMPI hostfile+rankfile, and a wrapper launch script that sets MKL_NUM_THREADS depending on the MPI task identifier.
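
For illustration, here is a minimal sketch of the intended per-rank thread setup expressed directly in the code (this is not our actual wrapper script, which relies on MKL_NUM_THREADS; it just uses MKL's mkl_set_num_threads service routine to the same effect):

program mpmd_thread_setup
  ! Sketch only: the last MPI task gets 16 MKL threads, every other task one.
  use mpi
  implicit none
  integer :: ierr, rank, nprocs
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  if (rank == nprocs - 1) then
     call mkl_set_num_threads(16)   ! dedicated node: multi-threaded PARDISO
  else
     call mkl_set_num_threads(1)    ! 16 tasks per node: one thread each
  end if
  ! ... rest of the MPMD application (PARDISO runs on the last task) ...
  call MPI_Finalize(ierr)
end program mpmd_thread_setup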

When executing the parallel codes with 16001 MPI tasks under the aforementioned mapping, a segmentation fault was produced within MPI task id 16000:

[1,16000]<stderr>:forrtl: severe (174): SIGSEGV, segmentation fault occurred
[1,16000]<stderr>:Image PC Routine Line Source
[1,16000]<stderr>:libmpi.so.1 00002B547BF5CA8A Unknown Unknown Unknown
[1,16000]<stderr>:libmpi.so.1 00002B547BF5D917 Unknown Unknown Unknown
[1,16000]<stderr>:libmkl_core.so 00002B547AAA1FD1 Unknown Unknown Unknown
[1,16000]<stderr>:libmkl_core.so 00002B547B520484 Unknown Unknown Unknown
[1,16000]<stderr>:libmkl_core.so 00002B547B53A0A7 Unknown Unknown Unknown
[1,16000]<stderr>:libmkl_core.so 00002B547B40E144 Unknown Unknown Unknown
[1,16000]<stderr>:libmkl_core.so 00002B547B40E27E Unknown Unknown Unknown
[1,16000]<stderr>:libmkl_intel_lp64 00002B5479531596 Unknown Unknown Unknown
[1,16000]<stderr>:par_test_dd_metho 00000000006FD95A Unknown Unknown Unknown
[1,16000]<stderr>:par_test_dd_metho 00000000006FBD4A Unknown Unknown Unknown
[1,16000]<stderr>:par_test_dd_metho 00000000006E02EE Unknown Unknown Unknown
[1,16000]<stderr>:par_test_dd_metho 00000000005C514B Unknown Unknown Unknown
[1,16000]<stderr>:par_test_dd_metho 0000000000575777 Unknown Unknown Unknown
[1,16000]<stderr>:par_test_dd_metho 000000000043E61C Unknown Unknown Unknown
[1,16000]<stderr>:par_test_dd_metho 000000000043B1FC Unknown Unknown Unknown
[1,16000]<stderr>:libc.so.6 0000003B77E1ECDD Unknown Unknown Unknown
[1,16000]<stderr>:par_test_dd_metho 000000000043B0F9 Unknown Unknown Unknown

MPI task id 16000 was allocated one entire node (16 cores + 64 GBytes). I have tried thoroughly to find the cause of this segmentation fault, without success so far. I know that it is produced during the sparse direct factorization of a matrix within Intel MKL (PARDISO). I extracted the matrix from the parallel program into a file and factorized it in isolation with a sequential program linked against Intel MKL (PARDISO); the program consumed 5.1 GBytes and no segmentation fault was produced, so a bug in the Intel MKL (PARDISO) code can be discarded. It should be something related to the parallel environment. I guess that some kind of limit is being exceeded (e.g. stack size?), but I cannot confirm it. The stack size is unlimited (i.e., ulimit -s unlimited). I have also tried OMP_STACKSIZE=32M and MKL_DYNAMIC=FALSE without success.

Do you have any idea of what could be the cause of this SIGSEGV? I could reproduce it on two different machines. As additional info, this seg fault does not arise at all when the dimension of the matrix that task id 16000 has to factorize is smaller.

Thanks in advance.

Best regards,

 Alberto.

Zhang_Z_Intel
Employee

It is difficult to guess the root cause just by looking at your description. Your MPI execution setup seems quite sophisticated. I'd suggest you try to narrow down the scope as much as you can, and then provide a minimal test case that reproduces the problem.

Which version of MKL are you using? How do you link with the MKL and MPI libraries? Are you using Intel MPI or some other MPI implementation? What's your OS?

Alberto_F__M_
Beginner

These MPI codes are part of a large software package. They involve a lot of code modules and subroutines, so it is quite difficult to narrow down the scope. Besides, the segmentation fault is produced for a large-scale test case that involves 1001 compute nodes of a cluster. As additional info, I know that the symbolic factorization and reordering of the sparse matrix completes successfully, but the SIGSEGV is produced right before the numerical factorization starts. (I can provide the output of PARDISO in verbose mode if required.)

As I mentioned in my original post, I could extract the matrix that is being factorized when the seg fault is produced to a matrix market file, and I could factorize it successfully with a multi-threaded driver executed with 16 threads. I could provide the matrix if required, but I do not know whether this would help.

Are compiled Intel MKL libraries available in debug mode? I guess that a parallel debugging session might help to find the root cause.

The two platforms where I found this SIGSEGV are:

**********
PLATFORM 1
**********

==
OS
==

Linux version 2.6.32-220.23.1.bl6.Bull.28.8.x86_64
(efix@atlas.frec.bull.fr)
(gcc version 4.4.5 20110214 (Bull 4.4.5-6) (GCC) )
#1 SMP Thu Jul 5 16:46:35 CEST 2012

===
MPI
===

bullxmpi/1.1.16.5

========
Compiler
========

ifort (IFORT) 12.1.6 20120928

===
MKL
===

/opt/intel/composer_xe_2011_sp1.13.367/mkl

=========
Link line
=========

mpif90 -o INTEL/par_test_dd_methods_3D_structured_ela_overlap.O INTEL/Objects_O/par_test_dd_methods_3D_structured_ela_overlap.o
INTEL/Objects_O/parprob.o INTEL/Objects_O/femprob.o -LINTEL -L/csc/workdir3/sbadia/metis -L/csc/workdir3/sbadia/gidpost/unix
-L/opt/intel/composer_xe_2011_sp1.13.367/mkl/lib/intel64 -L/csc/workdir3/sbadia/Trilinos-Interfaces/Sources/CXX_TrilinosFlatInterfaces
-L/csc/workdir3/sbadia/Trilinos-Interfaces/Sources/For_TrilinosShadowInterfaces/Executables/INTEL -lprob_O -lpar_O -lfem_O -lmetis
-lgidpost_O -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -ltrilinosshadowinterfaces_O

**********
PLATFORM 2
**********

==
OS
==

Linux version 2.6.32-220.23.1.bl6.Bull.28.8.x86_64
(efix@atlas.frec.bull.fr)
(gcc version 4.4.5 20110214 (Bull 4.4.5-6) (GCC) )
#1 SMP Thu Jul 5 16:46:35 CEST 2012


===
MPI
===

bullxmpi/1.1.16.5

========
Compiler
========

ifort (IFORT) 12.1.0 20111011

===
MKL
===

/usr/local/Intel_compilers/c/composer_xe_2011_sp1.7.256/mkl

=========
Link line
=========

mpif90 -o INTEL/par_test_dd_methods_3D_structured_ela_overlap.O INTEL/Objects_O/par_test_dd_methods_3D_structured_ela_overlap.o
INTEL/Objects_O/parprob.o INTEL/Objects_O/femprob.o -LINTEL -L/ccc/cont005/home/pa0909/martinha/metis
-L/ccc/cont005/home/pa0909/martinha/gidpost/unix -L/usr/local/Intel_compilers/c/composer_xe_2011_sp1.7.256/mkl/lib/intel64
-L/ccc/cont005/home/pa0909/martinha/Trilinos-Interfaces/Sources/CXX_TrilinosFlatInterfaces
-L/ccc/cont005/home/pa0909/martinha/Trilinos-Interfaces/Sources/For_TrilinosShadowInterfaces/Executables/INTEL
-lprob_O -lpar_O -lfem_O -lmetis -lgidpost_O -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -ltrilinosshadowinterfaces_O

Alberto_F__M_
Beginner

Hi, 

I have now managed to narrow down the problem significantly.

I have a small serial program that reads a sparse matrix from a file and factorizes it using the sparse direct solver included in MKL PARDISO. When I execute the program with the matrix of interest, PARDISO factorizes it successfully; no SIGSEGV is produced. However, if I transform this small sequential program into a message-passing program (i.e., wrapping its code in MPI_INIT/MPI_FINALIZE and executing it using mpirun), then a SIGSEGV appears. Therefore, the problem seems to be related to the combination of MKL PARDISO and the message-passing environment (OpenMPI).
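
Schematically, the driver looks like the following (a simplified sketch, not the exact code in the tarball; a small stand-in matrix is built where the real driver reads the matrix market file, and the mtype value must of course match the actual matrix type):

program test_sketch
  ! Simplified sketch of the reproducer: build/read a CSR matrix and run the
  ! PARDISO analysis and numerical factorization phases. The MPI preprocessor
  ! macro turns the serial driver into a message-passing one, as described above.
#ifdef MPI
  use mpi
#endif
  implicit none
  integer*8        :: pt(64)                 ! PARDISO internal handle
  integer          :: iparm(64), idum(1)
  integer          :: maxfct, mnum, mtype, phase, n, nrhs, msglvl, error
  integer          :: i, k, ierr
  integer,          allocatable :: ia(:), ja(:)
  double precision, allocatable :: a(:)
  double precision :: ddum(1)

#ifdef MPI
  call MPI_Init(ierr)
#endif

  ! In the real driver, n, ia, ja, a are read from the matrix market file
  ! (A_bddc_c); a tiny tridiagonal SPD matrix stands in here so the sketch
  ! is self-contained (upper triangle stored in CSR format).
  n = 8
  allocate(ia(n+1), ja(2*n-1), a(2*n-1))
  k = 1
  do i = 1, n
     ia(i) = k
     ja(k) = i;      a(k) =  2.0d0;  k = k + 1     ! diagonal
     if (i < n) then
        ja(k) = i+1;  a(k) = -1.0d0;  k = k + 1    ! upper off-diagonal
     end if
  end do
  ia(n+1) = k

  mtype = 2                                   ! assumed: real symmetric positive definite
  call pardisoinit(pt, mtype, iparm)          ! default iparm, zeroed handle
  maxfct = 1; mnum = 1; nrhs = 1; msglvl = 1

  phase = 11                                  ! reordering + symbolic factorization
  call pardiso(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, idum, &
               nrhs, iparm, msglvl, ddum, ddum, error)

  phase = 22                                  ! numerical factorization (where the SIGSEGV hits)
  call pardiso(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, idum, &
               nrhs, iparm, msglvl, ddum, ddum, error)
  if (error /= 0) print *, 'PARDISO returned error ', error

  phase = -1                                  ! release internal memory
  call pardiso(pt, maxfct, mnum, mtype, phase, n, ddum, idum, idum, idum, &
               nrhs, iparm, msglvl, ddum, ddum, error)

#ifdef MPI
  call MPI_Finalize(ierr)
#endif
end program test_sketch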

I could provide the sparse matrix in matrix market format if required.

Best regards,

 Alberto.

Zhang_Z_Intel
Employee

Alberto, this is great. I really appreciate the effort you took to narrow down the scope of the problem. Yes, please do send the test matrix, as well as code snippets showing how you call PARDISO and the MPI functions.


Alberto_F__M_
Beginner

Dear Zhang,

please find attached reproduce_sigsev_pardiso_mkl.tgz. The tarball contains a minimal subset of the f90
code that we extracted from our software package, together with a
driver program (test.f90). The package also contains a GNU make
makefile to drive the compilation process under Linux. To compile the code
in debug mode, type the following in the Linux shell:

make debug

or

make release

to compile the code in release mode (i.e., with optimization flags enabled).
Before the actual compilation takes place, you have to set several
variables in the header of the makefile. In particular, you have to provide the
MKL root directory (MKL_LIB_DIR), select whether you want the serial (TYPE=SERIAL) or
message-passing (TYPE=PARALLEL) wrappers, and choose whether to link against the serial (THREADED=NO) or
multi-threaded (THREADED=YES) MKL libraries. Besides, test.f90 can be "converted" into a
message-passing driver program by uncommenting the definition of the MPI cpp macro at the
header of test.f90.

I have uploaded the matrix to be read by the program on the following link:

http://dl.dropbox.com/u/24745418/A_bddc_c.mtx.tgz

Once you have downloaded this file, untar and uncompress it with:

tar xvzf A_bddc_c.mtx.tgz

Finally, the serial program is executed as:

test.O PATH_TO_FILE A_bddc_c

(or test.g for the debug build), and the message-passing one as:

mpirun -np 1 test.O PATH_TO_FILE A_bddc_c

(again with test.g in place of test.O for the debug build).

The SIGSEGV is produced with:

 -test.O or test.g
 -MPI cpp macro defined in test.f90
 -TYPE=PARALLEL
 -THREADED=NO or THREADED=YES

The SIGSEGV is NOT produced with:

 -test.O or test.g
 -MPI cpp macro undefined in test.f90
 -TYPE=PARALLEL or TYPE=SERIAL
 -THREADED=NO or THREADED=YES


I have always used OpenMPI or some modification of it (e.g., BullxMPI).

As additional info, I was able to reproduce the problem
on the following three machines (all SandyBridge-based):

Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz (HELIOS)
Xeon E5-2680 8C 2.700GHz (CURIE thin nodes)
2x E5-2670 SandyBridge-EP 2.6GHz cache 20MB 8-core (MN3)

However, I could not reproduce it on (non-Sandy Bridge):

Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz (my Desktop)
Intel(R) Xeon(R) CPU           X7560  @ 2.27GHz (login node of CURIE)

I do not know whether there is a cause-effect relationship here.

Best regards,
 Alberto.

TimP
Honored Contributor III

If you are running under OpenMPI, do you need to link specifically against the OpenMPI shared objects?  I don't know which releases of OpenMPI are tested for each release of MKL (probably not any Bull version of OpenMPI?), nor did I easily see which version you are using.  Could the OpenMPI on your successful machines be closer to the one under which MKL was tested?  To an American English reader, your comment could be taken to imply you have various OpenMPI versions.  I don't even know whether a Bull MPI would be available to test on a non-Bull platform.

Alberto_F__M_
Beginner

Dear TimP,

OpenMPI provides a compiler wrapper, mpif90, that automatically selects the proper shared objects during linking. Therefore, I do not need to link specifically against OpenMPI's shared objects. I could also reproduce the problem on a machine with a "standard" OpenMPI (i.e., not a Bull version), version 1.5.4. This third platform has MKL/13.0.1.

Best regards, Alberto.

TimP
Honored Contributor III

MKL version numbers are confusing, made worse by the partial marketing override of the compiler version numbering scheme.

The current MKL is numbered 11.0.2 and was bundled with the compilers variously identified as XE2013 Update 2.146, 13.0.2, or 13.1.0.146.

Are you saying that your problem may be connected with MKL version?

Alberto_F__M_
Beginner

Dear TimP,

In order to clarify, I would like to remark that I was able to reproduce the problem on the compute nodes of three different clusters. I am not the administrator of these clusters, only a user. Two of these clusters were installed by Bull (so they have a Bull-based OpenMPI), and a set of environment specifications for them is provided above (is that information sufficient to determine which MKL version is installed there?). The other cluster was not installed by Bull, and it has OpenMPI version 1.5.4. On this cluster, I do not know which MKL version is installed. The only info that I could retrieve comes from the module system of the cluster (see below). Besides, MKL and the Intel compilers are installed under the composer_xe_2013.1.117 folder.

Is there any means of knowing which version of MKL is installed here?

I do not know whether my problem is connected with the MKL version or not.

upc26229@login1:~> module show MKL
-------------------------------------------------------------------
/apps/modules/modulefiles/libraries/MKL/13.0.1:

module-whatis     loads the MKL 13.0.1
module-verbosity on
conflict     mkl
prereq     intel
display MKL/13.0.1 (LD_LIBRARY_PATH)
prepend-path     LD_LIBRARY_PATH /apps/INTEL/composerxe//mkl/lib/intel64/
-------------------------------------------------------------------
upc26229@login1:~> module show intel
-------------------------------------------------------------------
/apps/modules/modulefiles/compilers/intel/13.0.1:

module-whatis     loads the INTEL 13.0.1 compilers
module-verbosity on
prepend-path     PATH /gpfs/apps/MN3/INTEL//bin
prepend-path     MANPATH /gpfs/apps/MN3/INTEL//man/en_US
prepend-path     LD_LIBRARY_PATH /gpfs/apps/MN3/INTEL//lib/intel64
setenv         INTEL_HOME /gpfs/apps/MN3/INTEL/
setenv         INTEL_VERSION 13.0.1
setenv         INTEL_INC /gpfs/apps/MN3/INTEL//include
setenv         I_MPI_CXX icpc
setenv         I_MPI_CC icc
setenv         I_MPI_F77 ifort
setenv         I_MPI_F90 ifort
setenv         OMPI_CC icc
setenv         OMPI_FC ifort
setenv         OMPI_F90 ifort
setenv         OMPI_F77 ifort
setenv         OMPI_CXX icpc
setenv         MPICH_CC icc
setenv         MPICH_FC ifort
setenv         MPICH_CXX icpc
setenv         MPICH_F77 ifort
setenv         INTEL_LICENSE_FILE 28518@10.2.254.100
-------------------------------------------------------------------

TimP
Honored Contributor III

I should know the specific inquiry for the MKL version, but MKL 11.0.1 was distributed with the .117 compilers, so there has been just one subsequent update.

OpenMPI 1.5.4 certainly is a widely used version, but I don't know which OpenMPI was used for MKL qualification testing.

Alberto_F__M_
Beginner

You are right. I found and compiled the versionqueryc example driver. Here is the result:

upc26229@login1:~/versionqueryc/_results/intel_lp64_parallel_intel64_lib> ./getversion.out

Intel(R) Math Kernel Library Version 11.0.1 Product Build 20121009 for Intel(R) 64 architecture applications

Major version:          11
Minor version:          0
Update version:         1
Product status:         Product
Build:                  20121009
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) Enabled Processor
================================================================
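
For completeness, the same information can be obtained directly from a Fortran code; a minimal sketch, assuming MKL's mkl_get_version_string service routine:

program mkl_version_query
  ! Sketch only: print the MKL version string from Fortran.
  implicit none
  character(len=198) :: buf
  call mkl_get_version_string(buf)
  write(*,'(a)') trim(buf)
end program mkl_version_query
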
Let me stress that the sparse matrix that I provided comes from our own implementation of a two-level domain decomposition method; in particular, it is related to the so-called coarse-grid correction. The larger the number of subdomains, the larger the coarse-grid matrix. The matrix that I provided corresponds to the case of 16000 subdomains. For smaller numbers of subdomains, the issue did not appear; for larger numbers of subdomains, it did. So it seems that some sort of memory threshold is being reached. I discarded a 32-bit integer issue, as the sequential program (without MPI_INIT/MPI_FINALIZE) can factorize the matrix successfully.

Zhang_Z_Intel
Employee

Alberto,

Thank you very much for the test case. I'll try to reproduce it and let you know soon.

Alberto_F__M_
Beginner

Dear Zhang Z,

in collaboration with CURIE's support staff, we have been able to isolate the issue. I have a shell script that loads some environment variables before actually executing the program (e.g., it properly sets LD_LIBRARY_PATH). This script also defines a pair of variables related to malloc behaviour and performance, as follows:

export MALLOC_TRIM_THRESHOLD_=-1
export MALLOC_MMAP_MAX_=0

If I do NOT define these variables, the SIGSEGV disappears.

VERY IMPORTANT: I realized that the tests on non-Sandy Bridge platforms were performed without these two env. variables defined. If I define the variables on a non-Sandy Bridge platform, then the SIGSEGV is triggered. So it seems to be a CPU-independent issue.
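
As a sanity check (a small sketch using the standard get_environment_variable intrinsic, not part of our package), one can print at run time which of these malloc tunables a given process actually sees:

program check_malloc_env
  ! Sketch only: report whether the glibc malloc tunables are set for this process.
  implicit none
  character(len=32) :: val
  integer           :: stat
  call get_environment_variable('MALLOC_TRIM_THRESHOLD_', val, status=stat)
  if (stat == 0) then
     print *, 'MALLOC_TRIM_THRESHOLD_ = ', trim(val)
  else
     print *, 'MALLOC_TRIM_THRESHOLD_ not set'
  end if
  call get_environment_variable('MALLOC_MMAP_MAX_', val, status=stat)
  if (stat == 0) then
     print *, 'MALLOC_MMAP_MAX_ = ', trim(val)
  else
     print *, 'MALLOC_MMAP_MAX_ not set'
  end if
end program check_malloc_env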

Zhang_Z_Intel
Employee

I feel relieved to hear this news, because I was still unable to reproduce the problem. So this is all due to those two environment variable settings? Can we close this issue now?

Thanks!


Alberto_F__M_
Beginner

That is all; it does not actually seem to be a bug in MKL and/or OpenMPI. It is still unclear, though, why the software does not catch the exception and provide some kind of error report. Is there any recommended setting for these two env. variables when using MKL? I think the issue can be closed (at least I can continue with my regular work). Thanks!!!
