Segmentation fault in mkl_pds_lp64_assemble_csr_full

asd__asdqwe · 08-15-2016

Hello,

I'm trying to use the following file with 3 MPI processes but I end up with the following trace:

(gdb) bt
#0  0x00007ffff52ac38f in mkl_pds_lp64_assemble_csr_full () from /opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_thread.so
#1  0x00007ffff685bec1 in mkl_pds_lp64_cluster_sparse_solver () from /opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_core.so
#2  0x0000000000401abc in main () at source/cl_solver_unsym_distr_c.c:197

I compile the file using the MKL examples with the following line:

$ make sointel64 mpi=mpich2 compiler=intel
$ mpirun.mpich -np 3 _results/intel_mpich2_lp64_intel64_so/cl_solver_unsym_distr_c.exe.bac

Thank you for looking.
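Judging only by its symbol name, the crashing frame `mkl_pds_lp64_assemble_csr_full` appears to run while the per-rank CSR slices are being assembled. That is an inference, not something confirmed in this thread. One way to rule out malformed input before suspecting the library is to validate each rank's slice first. The sketch below is a hypothetical helper, not an MKL routine. It assumes 1-based indexing, row pointers that are local to the rank and start at 1, and strictly increasing column indices within each row. Ranks that own no rows are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MKL): checks one rank's slice of a
 * distributed CSR matrix, using 1-based indices.
 *   n          global number of equations
 *   first_row  first global row owned by this rank (inclusive)
 *   last_row   last global row owned by this rank (inclusive)
 *   ia         local row pointers, length (last_row - first_row + 2)
 *   ja         column indices, length ia[rows] - 1
 * Returns 0 if the slice looks consistent, a negative code otherwise. */
int check_local_csr(int n, int first_row, int last_row,
                    const int *ia, const int *ja)
{
    if (n <= 0 || first_row < 1 || last_row > n || first_row > last_row)
        return -1;                      /* bad domain bounds */
    if (ia == NULL || ja == NULL)
        return -2;                      /* missing arrays */

    int rows = last_row - first_row + 1;
    assert(rows >= 1);                  /* follows from the bounds check */

    if (ia[0] != 1)
        return -3;                      /* assumed: local pointers start at 1 */

    for (int i = 0; i < rows; ++i) {
        if (ia[i + 1] < ia[i])
            return -4;                  /* row pointers must not decrease */
        for (int k = ia[i] - 1; k < ia[i + 1] - 1; ++k) {
            if (ja[k] < 1 || ja[k] > n)
                return -5;              /* column index outside [1, n] */
            if (k > ia[i] - 1 && ja[k] <= ja[k - 1])
                return -6;              /* columns not strictly increasing */
        }
    }
    return 0;
}
```

Each rank would call this with its own domain bounds and arrays just before invoking the solver, and report any non-zero code together with its rank number.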

Alexander_K_Intel2 · 08-15-2016

Hi,

You are correct: the issue exists for small dense matrices when the number of processes is greater than 1. The same situation occurs with the BSR format, as reported in a neighboring forum post. We will provide further details after investigation.

Thanks,

Alex
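To make "small dense" concrete: the solver statistics quoted later in this thread report 5 equations and 13 non-zeros, which the log prints as 52% fill. The sketch below shows that arithmetic. The function name `fill_percent` is made up for illustration.

```c
#include <assert.h>

/* Fill of an n-by-n matrix as a percentage: 100 * nnz / n^2.
 * Illustrative helper only; it is not an MKL routine. */
double fill_percent(long n, long nnz)
{
    assert(n > 0 && nnz >= 0);
    return 100.0 * (double)nnz / ((double)n * (double)n);
}
```

For the matrix in this thread, `fill_percent(5, 13)` gives 52.0, matching the "number of non-zeros in A (%)" line in the solver output.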

asd__asdqwe · 08-16-2016

Hello,

Thank you for looking. Please keep me up to date. Thank you.

asd__asdqwe · 09-19-2016

This is still segfaulting with MKL from m_ccompxe_2017.0.036.dmg.

Do you have a fix? Thank you.

Gennady_F_Intel · 09-20-2016

The fix is targeted for release in the next update (MKL 2017 Update 1). We will keep you updated on this topic.

asd__asdqwe · 01-07-2017

This is still not fixed, could you give me an update on the situation please?

Gennady_F_Intel · 01-08-2017

I don't see the problem on my side in this case. I checked with the same example (the only change was adding a call to mkl_get_version() to show the MKL version) and with this command line:

$ make sointel64 mpi=mpich2 compiler=intel
$ mpirun -np 3 _results/intel_mpich2_lp64_intel64_so/cl_solver_unsym_distr_c.exe

Here is the output (for brevity, I skipped the intermediate results):

Major version: 2017
Minor version: 0
Update version: 1
Product status: Product
Build: 20161005
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors
================================================================

=== CPARDISO: solving a real nonsymmetric system ===
Distributed Matrix Input Format is used for CPARDISO (iparm(40) = 2)
1-based array indexing is turned ON
CPARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000007 s
Time spent in reordering of the initial matrix (reorder) : 0.000128 s
Time spent in symbolic factorization (symbfct) : 0.000382 s
Time spent in data preparations for factorization (parlist) : 0.000004 s
Time spent in allocation of internal data structures (malloc) : 0.000734 s
Time spent in additional calculations : 0.000021 s
Total time spent : 0.001276 s

Statistics:
===========
Parallel Direct Factorization is running on 3 MPI and 6 OpenMP per MPI process

< Linear system Ax = b >
number of equations: 5
number of non-zeros in A: 13
number of non-zeros in A (%): 52.000000

number of right-hand sides: 1

..............

.............

Solving system...

=== CPARDISO: solving a real nonsymmetric system ===

Summary: ( solution phase )
================

Statistics:
===========
Parallel Direct Factorization is running on 3 MPI and 6 OpenMP per MPI process

< Linear system Ax = b >
number of equations: 5
number of non-zeros in A: 13
number of non-zeros in A (%): 52.000000

number of right-hand sides: 1

< Factors L and U >
number of columns for each panel: 72
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 2
size of largest supernode: 4
number of non-zeros in L: 19
number of non-zeros in U: 2
number of non-zeros in L+U: 21
gflop for the numerical factorization: 0.000000
gflop/s for the numerical factorization: 0.000009

The solution of the system is:
on zero process x [0] = 0.263109
on zero process x [1] = 0.305243
on zero process x [2] = -0.347378

The solution of the system is:
on first process x [0] = -0.347378
on first process x [1] = 0.205993
on first process x [2] = 0.288390

TEST PASSED
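The version banner at the top of the output above comes from the mkl_get_version() call mentioned in the post. A minimal sketch of such a call follows. The wrapper name `print_mkl_version`, the `SKETCH_HAVE_MKL` macro, and the exact label spacing are illustrative assumptions. The `__has_include` guard (a GCC/Clang feature) is only there so the sketch still compiles on a machine without MKL.

```c
#include <stdio.h>

/* Use the real header when it is on the include path; otherwise build a
 * stub so the sketch compiles without MKL installed. */
#if defined(__has_include)
#  if __has_include(<mkl.h>)
#    include <mkl.h>
#    define SKETCH_HAVE_MKL 1
#  endif
#endif

/* Prints the MKL version banner.
 * Returns 0 on success, -1 if the program was built without mkl.h. */
int print_mkl_version(void)
{
#ifdef SKETCH_HAVE_MKL
    MKLVersion v;
    mkl_get_version(&v);
    printf("Major version: %d\n", v.MajorVersion);
    printf("Minor version: %d\n", v.MinorVersion);
    printf("Update version: %d\n", v.UpdateVersion);
    printf("Product status: %s\n", v.ProductStatus);
    printf("Build: %s\n", v.Build);
    printf("Platform: %s\n", v.Platform);
    printf("Processor optimization: %s\n", v.Processor);
    return 0;
#else
    fputs("built without mkl.h; MKL version unavailable\n", stderr);
    return -1;
#endif
}
```

Calling this at the start of main() on every rank makes it easy to confirm which MKL build each process actually loaded, which matters when several MKL installations are present on the machine.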

asd__asdqwe · 01-08-2017

On macOS, here is what I get:

$ mpic++ -cxx=icpc cl_solver_unsym_distr_c.c -I/opt/intel/mkl/include -L/opt/intel/mkl/lib -lmkl_intel_lp64 -lmkl_core -lmkl_scalapack_lp64 -lmkl_blacs_mpich_lp64 -lmkl_rt -L/opt/intel/lib -liomp5 -lmkl_intel_thread
$ mpirun -np 3 ./a.out
Major version: 2017
Minor version: 0
Update version: 1
Product status: Product
Build: 20161005
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors
================================================================

Major version: 2017
Minor version: 0
Update version: 1

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 11905 RUNNING AT XXX.local
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault: 11 (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

asd__asdqwe · 01-08-2017

Exact same error on Linux:

$ mpicxx.mpich -cxx=icpc cl_solver_unsym_distr_c.c -I/opt/intel/mkl/include -L/opt/intel/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_rt -L/opt/intel/lib/intel64 -liomp5 -lmkl_intel_thread
$ mpirun.mpich -np 3 ./a.out

Major version: 2017
Minor version: 0
Update version: 1
Product status: Product
Build: 20161005
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors
================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 11287 RUNNING AT XXX
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

Gennady_F_Intel · 01-09-2017

I still don't see any problem with this case on such a CPU (SSE4.2) either. I removed mkl_rt and added -lm -ldl (see the MKL Link Line Advisor).

mpiicc my_cl_solver_unsym_distr_c.c -I/opt/intel/compilers_and_libraries_2017/mkl/include \
-L/opt/intel/compilers_and_libraries_2017/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 \
-L/opt/intel/compilers_and_libraries_2017/linux/compiler/lib/intel64 -liomp5 -lmkl_intel_thread -lm -ldl

mpirun -np 3 ./a.out

[gfedorov@iris u675380]$ mpirun -np 3 ./a.out
Major version: 2017
Minor version: 0
Update version: 1
Product status: Product
Build: 20161005
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors
================================================================
Major version: 2017
Minor version: 0
Update version: 1
Product status: Product
Build: 20161005
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors
================================================================
Major version: 2017
Minor version: 0
Update version: 1
Product status: Product
Build: 20161005
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors
================================================================

.........................................

The solution of the system is:
on zero process x [0] = 0.263109
on zero process x [1] = 0.305243
on zero process x [2] = -0.347378

The solution of the system is:
on first process x [0] = -0.347378
on first process x [1] = 0.205993
on first process x [2] = 0.288390

TEST PASSED

asd__asdqwe · 01-10-2017

On macOS, this should be:

mpicc cl_solver_unsym_distr_c.c -I/opt/intel/compilers_and_libraries_2017/mac/mkl/include \
-L/opt/intel/compilers_and_libraries_2017/mac/mkl/lib/ -lmkl_intel_lp64 -lmkl_core -lmkl_scalapack_lp64 -lmkl_blacs_mpich_lp64 \
-L/opt/intel/compilers_and_libraries_2017/mac/lib/ -liomp5 -lmkl_intel_thread -lm -ldl

And yes, it is still segfault'ing...

asd__asdqwe · 01-16-2017

Can you reproduce this error on macOS?

asd__asdqwe · 01-23-2017

Anyone?

Alexander_K_Intel2 · 01-23-2017

Hi,

Give me a couple of days to play with your reproducer. I will be back with any news.

Thanks,

Alex

Alexander_K_Intel2 · 01-24-2017

Hi,

On macOS it passes correctly on my side:

mpicc -cc=icc -Wall -I../../include -c -o _results/intel_mpich_lp64_intel64_dylib/cl_solver_unsym_c.o source/cl_solver_unsym_c.c
mpicc -cc=icc _results/intel_mpich_lp64_intel64_dylib/cl_solver_unsym_c.o -o _results/intel_mpich_lp64_intel64_dylib/cl_solver_unsym_c.exe -L "../../lib" -lmkl_blacs_mpich_lp64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -L "../../../compiler/lib" -liomp5 -lm
mpiexec -n 3 /usr/bin/env DYLD_BIND_AT_LAUNCH=1 DYLD_LIBRARY_PATH="../../lib":"../../../compiler/lib": OMP_NUM_THREADS=2 _results/intel_mpich_lp64_intel64_dylib/cl_solver_unsym_c.exe > _results/intel_mpich_lp64_intel64_dylib/cl_solver_unsym_c.res

The .res file is attached.