Beginner

Segmentation fault in mkl_pds_lp64_assemble_csr_full

Hello,

I'm trying to use the following file with 3 MPI processes but I end up with the following trace:

(gdb) bt
#0  0x00007ffff52ac38f in mkl_pds_lp64_assemble_csr_full () from /opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_thread.so
#1  0x00007ffff685bec1 in mkl_pds_lp64_cluster_sparse_solver () from /opt/intel/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_core.so
#2  0x0000000000401abc in main () at source/cl_solver_unsym_distr_c.c:197

I build and run the file inside the MKL examples directory with the following commands:

$ make sointel64 mpi=mpich2 compiler=intel
$ mpirun.mpich -np 3 _results/intel_mpich2_lp64_intel64_so/cl_solver_unsym_distr_c.exe.bac

Thank you for looking.
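
Since the crash is inside a CSR assembly routine, one thing that can be ruled out on the application side is malformed input on one of the ranks. The sketch below is a hypothetical helper (`check_local_csr` is not an MKL function) that checks the one-based CSR arrays a single rank would pass for its own row range. It assumes each rank's row pointer array starts at 1 and that column indices increase within a row; adjust if your setup differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of MKL: validates the one-based CSR arrays
   that one MPI rank would pass for rows [first_row, last_row] of an
   n x n matrix. Returns 0 if consistent, otherwise a positive error code. */
int check_local_csr(int n, int first_row, int last_row,
                    const int *ia, const int *ja)
{
    assert(ia != NULL && ja != NULL);
    if (first_row < 1 || last_row > n || first_row > last_row)
        return 1;                       /* row range outside the matrix */
    int rows = last_row - first_row + 1;
    if (ia[0] != 1)
        return 2;                       /* assumed: local row pointer starts at 1 */
    for (int i = 0; i < rows; ++i) {
        if (ia[i + 1] < ia[i])
            return 3;                   /* row pointers must not decrease */
        for (int k = ia[i] - 1; k < ia[i + 1] - 1; ++k) {
            if (ja[k] < 1 || ja[k] > n)
                return 4;               /* column index outside 1..n */
            if (k > ia[i] - 1 && ja[k] <= ja[k - 1])
                return 5;               /* assumed: columns increase within a row */
        }
    }
    return 0;
}
```

Calling it on every rank just before the solver call, and printing the rank together with any nonzero return code, would show whether the input or the library is at fault.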

14 Replies

Hi,

You are correct: the issue exists for small dense matrices when the number of processes is greater than 1. The same situation occurs for the BSR format, as reported in a neighboring forum post. We will provide further details after investigating.

Thanks,

Alex 

Beginner

Hello,

Thank you for looking. Please keep me up to date. Thank you.

Beginner

This is still segfaulting with MKL from m_ccompxe_2017.0.036.dmg.

Do you have a fix? Thank you.

Moderator

The fix is targeted for the next update, MKL 2017 Update 1. We will keep you updated in this thread.

Beginner

This is still not fixed, could you give me an update on the situation please?

Moderator

I don't see the problem on my side in this case. I checked with the same example (the only change is an added mkl_get_version() call to show the MKL version) and with these commands:

$ make sointel64 mpi=mpich2 compiler=intel
$ mpirun -np 3 _results/intel_mpich2_lp64_intel64_so/cl_solver_unsym_distr_c.exe

Here is the output; for brevity, I skipped the intermediate results.

Major version:           2017
Minor version:           0
Update version:          1

Product status:          Product
Build:                   20161005
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors
================================================================

=== CPARDISO: solving a real nonsymmetric system ===
Distributed Matrix Input Format is used for CPARDISO (iparm(40) = 2)
1-based array indexing is turned ON
CPARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON

Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000007 s
Time spent in reordering of the initial matrix (reorder)         : 0.000128 s
Time spent in symbolic factorization (symbfct)                   : 0.000382 s
Time spent in data preparations for factorization (parlist)      : 0.000004 s
Time spent in allocation of internal data structures (malloc)    : 0.000734 s
Time spent in additional calculations                            : 0.000021 s
Total time spent                                                 : 0.001276 s

Statistics:
===========
Parallel Direct Factorization is running on 3 MPI and 6 OpenMP per MPI process

< Linear system Ax = b >
             number of equations:           5
             number of non-zeros in A:      13
             number of non-zeros in A (%): 52.000000

             number of right-hand sides:    1

..............

.............

Solving system...

=== CPARDISO: solving a real nonsymmetric system ===

Summary: ( solution phase )
================

Statistics:

===========
Parallel Direct Factorization is running on 3 MPI and 6 OpenMP per MPI process

< Linear system Ax = b >
             number of equations:           5
             number of non-zeros in A:      13
             number of non-zeros in A (%): 52.000000

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    2
             size of largest supernode:               4
             number of non-zeros in L:                19
             number of non-zeros in U:                2
             number of non-zeros in L+U:              21
             gflop   for the numerical factorization: 0.000000

             gflop/s for the numerical factorization: 0.000009


The solution of the system is: 
 on zero process x [0] =  0.263109
 on zero process x [1] =  0.305243
 on zero process x [2] = -0.347378

The solution of the system is: 
 on first process x [0] = -0.347378
 on first process x [1] =  0.205993
 on first process x [2] =  0.288390

 TEST PASSED
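
The log reports the distributed input format on 3 MPI processes, yet solution pieces are printed for only two of them, and the last value on the zero process equals the first value on the first process. That suggests (an inference from the output, not something stated in the log) row ranges of roughly 1 to 3 and 3 to 5, with the third process holding no rows. A minimal sketch for checking such a layout is shown here; `domains_cover_rows` is a hypothetical helper, not an MKL function, and treating `first > last` as "no rows" is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of MKL. first[r] and last[r] are the
   one-based, inclusive row bounds that rank r would declare. A rank with
   first[r] > last[r] is treated as holding no rows (sketch assumption).
   Returns 1 if every row 1..n is held by at least one rank, else 0. */
int domains_cover_rows(int n, int nranks, const int *first, const int *last)
{
    assert(n > 0 && nranks > 0 && first != NULL && last != NULL);
    for (int row = 1; row <= n; ++row) {
        int held = 0;
        for (int r = 0; r < nranks && !held; ++r) {
            if (first[r] <= row && row <= last[r])
                held = 1;               /* overlapping ranges are allowed */
        }
        if (!held)
            return 0;                   /* this row belongs to no rank */
    }
    return 1;
}
```

With the inferred layout (ranges 1 to 3, 3 to 5, and an empty third rank) the check passes; it fails as soon as a row is left out, which is an easy mistake to make when the process count changes.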

Beginner

On macOS, here is what I get:

$ mpic++ -cxx=icpc cl_solver_unsym_distr_c.c -I/opt/intel/mkl/include -L/opt/intel/mkl/lib -lmkl_intel_lp64 -lmkl_core -lmkl_scalapack_lp64 -lmkl_blacs_mpich_lp64 -lmkl_rt -L/opt/intel/lib -liomp5 -lmkl_intel_thread
$ mpirun -np 3 ./a.out
Major version:           2017
Minor version:           0
Update version:          1
Product status:          Product
Build:                   20161005
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors
================================================================

Major version:           2017
Minor version:           0
Update version:          1

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 11905 RUNNING AT XXX.local
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault: 11 (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

Beginner

Exact same error on Linux:

$ mpicxx.mpich -cxx=icpc cl_solver_unsym_distr_c.c -I/opt/intel/mkl/include -L/opt/intel/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_rt -L/opt/intel/lib/intel64 -liomp5 -lmkl_intel_thread
$ mpirun.mpich -np 3 ./a.out

Major version:           2017
Minor version:           0
Update version:          1
Product status:          Product
Build:                   20161005
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors
================================================================


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 11287 RUNNING AT XXX
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

Moderator

I still don't see any problem with this case, including on an SSE4.2 CPU. I removed mkl_rt and added -lm -ldl (see the MKL Link Line Advisor).

mpiicc my_cl_solver_unsym_distr_c.c -I/opt/intel/compilers_and_libraries_2017/mkl/include \
 -L/opt/intel/compilers_and_libraries_2017/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 \
 -L/opt/intel/compilers_and_libraries_2017/linux/compiler/lib/intel64 -liomp5 -lmkl_intel_thread -lm -ldl

mpirun -np 3 ./a.out

[gfedorov@iris u675380]$ mpirun -np 3 ./a.out
Major version:           2017
Minor version:           0
Update version:          1
Product status:          Product
Build:                   20161005
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors
================================================================
Major version:           2017
Minor version:           0
Update version:          1
Product status:          Product
Build:                   20161005
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors
================================================================
Major version:           2017
Minor version:           0
Update version:          1
Product status:          Product
Build:                   20161005
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors
================================================================

.........................................

.........................................

The solution of the system is:
 on zero process x [0] =  0.263109
 on zero process x [1] =  0.305243
 on zero process x [2] = -0.347378

The solution of the system is:
 on first process x [0] = -0.347378
 on first process x [1] =  0.205993
 on first process x [2] =  0.288390

 TEST PASSED

Beginner

On macOS, the equivalent command is:

mpicc cl_solver_unsym_distr_c.c -I/opt/intel/compilers_and_libraries_2017/mac/mkl/include \
 -L/opt/intel/compilers_and_libraries_2017/mac/mkl/lib/ -lmkl_intel_lp64 -lmkl_core -lmkl_scalapack_lp64 -lmkl_blacs_mpich_lp64 \
 -L/opt/intel/compilers_and_libraries_2017/mac/lib/ -liomp5 -lmkl_intel_thread -lm -ldl

And yes, it is still segfault'ing...

Beginner

Can you reproduce this error on macOS?

Beginner

Anyone?


Hi,

Give me a couple of days to play with your reproducer. I will be back with any news.

Thanks,

Alex


Hi,

On macOS, it passed correctly on my side:

mpicc -cc=icc -Wall  -I../../include -c -o _results/intel_mpich_lp64_intel64_dylib/cl_solver_unsym_c.o source/cl_solver_unsym_c.c
mpicc -cc=icc _results/intel_mpich_lp64_intel64_dylib/cl_solver_unsym_c.o -o _results/intel_mpich_lp64_intel64_dylib/cl_solver_unsym_c.exe   -L "../../lib" -lmkl_blacs_mpich_lp64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -L "../../../compiler/lib" -liomp5  -lm
mpiexec -n 3 /usr/bin/env DYLD_BIND_AT_LAUNCH=1 DYLD_LIBRARY_PATH="../../lib":"../../../compiler/lib": OMP_NUM_THREADS=2 _results/intel_mpich_lp64_intel64_dylib/cl_solver_unsym_c.exe > _results/intel_mpich_lp64_intel64_dylib/cl_solver_unsym_c.res

The .res file is attached.