Intel® oneAPI Math Kernel Library

Segmentation fault CPARDISO

asd__asdqwe
Beginner
371 Views

Hello,

CPARDISO is segfaulting on my application when I use version 19.0.5.075, but everything runs fine with version 18.0.3.222. Is this a known regression that is being fixed right now?

The issue comes from:

    #11 0x110e0822b in MPI_Bcast (libmpi.12.dylib:x86_64+0xd22b)
    #12 0x10ef1529e in MKLMPI_Bcast (libmkl_blacs_mpich_lp64.dylib:x86_64+0x2029e)

I can reproduce this behavior on both macOS and Linux. Thanks for your help.

14 Replies
Gennady_F_Intel
Moderator

Hello! This is not a known issue. Could you give us a reproducer and show us how you run the code?

asd__asdqwe
Beginner

Sadly, I don't have the time to create a self-contained reproducer. Could you please try the following and let me know if this is OK for you to work on the issue?

$ git clone https://github.com/hpddm/hpddm.git && cd hpddm

Copy the attached Makefile.inc_.txt into the newly created folder and rename it to Makefile.inc. Don't forget to adjust the variables $MKL_INCS and $MKL_LIBS in the file.

$ make cpp
$ mpirun -n 4 ./bin/schwarz_cpp -hpddm_verbosity -hpddm_schwarz_coarse_correction deflated -hpddm_level_2_p 1 -hpddm_level_2_verbosity 4
$ mpirun -n 4 ./bin/schwarz_cpp -hpddm_verbosity -hpddm_schwarz_coarse_correction deflated -hpddm_level_2_p 2 -hpddm_level_2_verbosity 4

The first example will run; the second will not (unless using version 18.0.3.222).

Gennady_F_Intel
Moderator

Unfortunately, that will not help much.

Gennady_F_Intel
Moderator

If you have active Priority Support, you may submit a confidential support request via the Online Service Center.

asd__asdqwe
Beginner

Why won't it help? The problem comes from PARDISO. Even with 1 process, you can see that there is some unwanted output:

Memory allocated on phase  11     0.0013 MB
Number of non-zeros in L     16
Number of non-zeros in U     1
Memory allocated on phase  22     0.0019 MB

Percentage of computed non-zeros for LL^T factorization
 100 %

The PARDISO calling sequence is here https://github.com/hpddm/hpddm/blob/master/include/HPDDM_MKL_PARDISO.hpp#L136-L157.

Gennady_F_Intel
Moderator

But I don't see the list of iparm values or the types of matrices you are solving there. That is why having a reproducer would significantly reduce the number of unnecessary questions.

asd__asdqwe
Beginner

Here is a first .cpp that you can execute on 4 processes. You'll see some unformatted output:

Memory allocated on phase  11 on Rank # 0	-0.0021 MB
Memory allocated on phase  11 on Rank # 1	-0.0020 MB
Number of non-zeros in L on Rank # 0	4
Number of non-zeros in U on Rank # 0	1
Number of non-zeros in L on Rank # 1	8
Number of non-zeros in U on Rank # 1	1
Memory allocated on phase  22 on Rank # 0	-0.0026 MB
Memory allocated on phase  22 on Rank # 1	-0.0025 MB

Percentage of computed non-zeros for LL^T factorization
 100 %

Instead of something like:

Percentage of computed non-zeros for LL^T factorization
 47 %  100 %

=== CPARDISO: solving a symmetric indefinite system ===
Distributed Matrix Input Format is used for CPARDISO (iparm(40) = 2)
1-based array indexing is turned ON
CPARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Single-level factorization algorithm is turned ON

I'm trying to reproduce the segmentation fault as well.

Gennady_F_Intel
Moderator

Thanks, we will check and update this thread.

asd__asdqwe
Beginner

Here is the segfaulting .cpp. You can again run this on 4 processes. It will run fine with MKL 2018, but not with 2019.

Gennady_F_Intel
Moderator

Yes, we managed to reproduce the issue with version 2019 Update 5 and will investigate the problem. Thanks for the report.

Gennady_F_Intel
Moderator

The fix for this issue is available in MKL 2020 Update 1.

Gennady_F_Intel
Moderator

Output with MKL 2020 Update 1:

$ make run
mpirun -n 4 ./a.out
Major version:           2020
Minor version:           0
Update version:          1
Product status:          Product
Build:                   20200208
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================

Memory allocated on phase  11 on Rank # 0       0.0014 MB
Memory allocated on phase  11 on Rank # 1       0.0014 MB
Number of non-zeros in L on Rank # 0    4
Number of non-zeros in U on Rank # 0    1
Number of non-zeros in L on Rank # 1    8
Number of non-zeros in U on Rank # 1    1
Memory allocated on phase  22 on Rank # 0       0.0019 MB
Memory allocated on phase  22 on Rank # 1       0.0021 MB

Percentage of computed non-zeros for LL^T factorization
 100 %
  ... rank ==0, Passed...
  ... rank == 1, Passed...
 

asd__asdqwe
Beginner

I confirm this fixes the segmentation fault. The output when the message level is greater than 0 is clearly not the standard one, though.

Gennady_F_Intel
Moderator

OK, thanks for keeping us updated on the results.
