asd__asdqwe
Beginner
222 Views

Segmentation fault CPARDISO

Hello,

CPARDISO segfaults in my application with version 19.0.5.075, but everything runs fine with version 18.0.3.222. Is this a known regression that is already being fixed?

The crash originates in these frames of the backtrace:

    #11 0x110e0822b in MPI_Bcast (libmpi.12.dylib:x86_64+0xd22b)
    #12 0x10ef1529e in MKLMPI_Bcast (libmkl_blacs_mpich_lp64.dylib:x86_64+0x2029e)

I can reproduce this behavior on both macOS and Linux. Thanks for your help.

14 Replies
Gennady_F_Intel
Moderator

Hello! This is not a known issue. Could you give us a reproducer and show us how you run the code?

asd__asdqwe
Beginner

Sadly, I don't have the time to create a self-contained reproducer. Could you please try the following and let me know whether that is enough for you to work on the issue?

$ git clone https://github.com/hpddm/hpddm.git && cd hpddm

Copy the attached Makefile.inc_.txt into the newly created folder and rename it to Makefile.inc. Don't forget to adjust the variables $MKL_INCS and $MKL_LIBS in that file.

$ make cpp
$ mpirun -n 4 ./bin/schwarz_cpp -hpddm_verbosity -hpddm_schwarz_coarse_correction deflated -hpddm_level_2_p 1 -hpddm_level_2_verbosity 4
$ mpirun -n 4 ./bin/schwarz_cpp -hpddm_verbosity -hpddm_schwarz_coarse_correction deflated -hpddm_level_2_p 2 -hpddm_level_2_verbosity 4

The first example runs; the second does not (unless version 18.0.3.222 is used).

Gennady_F_Intel
Moderator

Unfortunately, that will not help much.

Gennady_F_Intel
Moderator

If you have active Priority Support, you may submit a confidential support request via the Online Service Center.

asd__asdqwe
Beginner

Why won't it help? The problem comes from PARDISO. Even in the case with 1 process, you can see some unwanted output:

Memory allocated on phase  11     0.0013 MB
Number of non-zeros in L     16
Number of non-zeros in U     1
Memory allocated on phase  22     0.0019 MB

Percentage of computed non-zeros for LL^T factorization
 100 %

The PARDISO calling sequence is here https://github.com/hpddm/hpddm/blob/master/include/HPDDM_MKL_PARDISO.hpp#L136-L157.
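For readers who don't follow the link: the "phase 11" and "phase 22" lines in the output above are PARDISO-family phase codes. The sketch below is a generic illustration of the usual phase order (analysis 11, numerical factorization 22, solve 33, release -1), not HPDDM's actual code. `run_sequence` and the callback are hypothetical; the callback stands in for a wrapper around `cluster_sparse_solver` that passes the phase and returns its `error` output argument, so the sketch runs without MKL or MPI.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <vector>

// PARDISO-family phase codes (11 and 22 appear in the msglvl output above).
enum Phase : int {
    Analysis = 11,       // reordering and symbolic factorization
    Factorization = 22,  // numerical factorization
    Solve = 33,          // forward/backward substitution
    Release = -1         // free all internal solver memory
};

// `call` is a hypothetical stand-in for a cluster_sparse_solver wrapper:
// it receives the phase and returns the solver's error code (0 = success).
int run_sequence(const std::function<int(int)>& call) {
    assert(call && "solver callback must be set");
    for (int phase : {Analysis, Factorization, Solve}) {
        const int error = call(phase);
        if (error != 0) {
            call(Release);  // still free internal memory before reporting the failure
            return error;
        }
    }
    return call(Release);
}
```

Releasing on the error path mirrors the usual advice to call phase -1 even after a failed phase, so a crash in a later run is not confused with leftover solver state.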

Gennady_F_Intel
Moderator

But I don't see the iparm values or the matrix types you are using there. That is why a reproducer would significantly reduce the number of unnecessary questions.

asd__asdqwe
Beginner

Here is a first .cpp that you can run on 4 processes. You'll see some unformatted output:

Memory allocated on phase  11 on Rank # 0	-0.0021 MB
Memory allocated on phase  11 on Rank # 1	-0.0020 MB
Number of non-zeros in L on Rank # 0	4
Number of non-zeros in U on Rank # 0	1
Number of non-zeros in L on Rank # 1	8
Number of non-zeros in U on Rank # 1	1
Memory allocated on phase  22 on Rank # 0	-0.0026 MB
Memory allocated on phase  22 on Rank # 1	-0.0025 MB

Percentage of computed non-zeros for LL^T factorization
 100 %

Instead of something like:

Percentage of computed non-zeros for LL^T factorization
 47 %  100 %

=== CPARDISO: solving a symmetric indefinite system ===
Distributed Matrix Input Format is used for CPARDISO (iparm(40) = 2)
1-based array indexing is turned ON
CPARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Single-level factorization algorithm is turned ON
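As a reading aid for this banner, here is a minimal sketch of an `mtype`/`iparm` setup consistent with it, written with C's 0-based indexing (so iparm(40) is `iparm[39]`). The index meanings follow my reading of the MKL `cluster_sparse_solver` documentation and should be checked against it. This is not the iparm list HPDDM uses; `mkl_int`, `row_domain`, `make_setup`, and the contiguous row split are hypothetical choices for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

using mkl_int = std::int32_t;  // stand-in for MKL_INT with the LP64 interface (assumption)

// Hypothetical helper: split n rows into `size` contiguous blocks and
// return this rank's 1-based, inclusive [first, last] row bounds.
std::pair<mkl_int, mkl_int> row_domain(mkl_int n, int rank, int size) {
    assert(size > 0 && rank >= 0 && rank < size);
    const mkl_int base = n / size, extra = n % size;
    const mkl_int begin = rank * base + std::min<mkl_int>(rank, extra);  // 0-based first row
    const mkl_int count = base + (rank < extra ? 1 : 0);
    return {begin + 1, begin + count};
}

struct CpardisoSetup {
    mkl_int mtype;
    std::array<mkl_int, 64> iparm;
};

CpardisoSetup make_setup(mkl_int n, int rank, int size) {
    CpardisoSetup s{};   // value-initialization zeroes every iparm entry
    s.mtype = -2;        // real symmetric indefinite
    s.iparm[0] = 1;      // iparm(1): do not use default values
    s.iparm[1] = 2;      // iparm(2): METIS nested dissection reordering
    s.iparm[27] = 0;     // iparm(28): double precision
    s.iparm[34] = 0;     // iparm(35): 1-based indexing
    s.iparm[39] = 2;     // iparm(40): distributed matrix, RHS and solution distributed too
    const std::pair<mkl_int, mkl_int> d = row_domain(n, rank, size);
    s.iparm[40] = d.first;   // iparm(41): first row of this rank's domain
    s.iparm[41] = d.second;  // iparm(42): last row of this rank's domain
    return s;
}
```

For example, `make_setup(10, 1, 4)` gives rank 1 the rows 4 to 6 of a 10-row matrix split over 4 processes.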

I'm trying to reproduce the segmentation fault as well.

Gennady_F_Intel
Moderator

Thanks, we will check and update this thread.

asd__asdqwe
Beginner

Here is the segfaulting .cpp. You can again run it on 4 processes. It runs fine with MKL 2018, but not with 2019.

Gennady_F_Intel
Moderator

Yes, we managed to reproduce the issue with version 2019 u5 and will investigate the problem. Thanks for the report.

Gennady_F_Intel
Moderator

The fix for this issue is available in MKL 2020 Update 1.
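Given the data points in this thread (2018 Update 3 works, 2019 Update 5 crashes, fixed in 2020 Update 1), a project that must build against several MKL installations might flag suspect versions at compile time. This sketch assumes versions are encoded as major*10000 + minor*100 + update, which I believe matches the `INTEL_MKL_VERSION` macro in `mkl_version.h`; verify against your headers. Flagging every release in [2019.0.0, 2020.0.1) is an assumption, since only 2019 Update 5 was reported here. `encode_version` and `cpardiso_crash_suspected` are hypothetical names.

```cpp
// Encode major.minor.update as major*10000 + minor*100 + update
// (assumed to mirror INTEL_MKL_VERSION, e.g. 2020.0.1 -> 20200001).
constexpr long encode_version(int major, int minor, int update) {
    return major * 10000L + minor * 100L + update;
}

// Assumption: treat everything from 2019.0.0 up to (not including) 2020.0.1 as suspect.
constexpr bool cpardiso_crash_suspected(long version) {
    return version >= encode_version(2019, 0, 0) &&
           version < encode_version(2020, 0, 1);
}

// The three versions actually discussed in this thread.
static_assert(!cpardiso_crash_suspected(encode_version(2018, 0, 3)), "2018 u3 reported working");
static_assert(cpardiso_crash_suspected(encode_version(2019, 0, 5)), "2019 u5 reported crashing");
static_assert(!cpardiso_crash_suspected(encode_version(2020, 0, 1)), "fixed in 2020 u1");
```

At run time, `mkl_get_version` should provide the same major/minor/update fields that appear in the version printout in the next post.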

Gennady_F_Intel
Moderator

MKL 2020 u1 output:

$ make run
mpirun -n 4 ./a.out
Major version:           2020
Minor version:           0
Update version:          1
Product status:          Product
Build:                   20200208
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================

Memory allocated on phase  11 on Rank # 0       0.0014 MB
Memory allocated on phase  11 on Rank # 1       0.0014 MB
Number of non-zeros in L on Rank # 0    4
Number of non-zeros in U on Rank # 0    1
Number of non-zeros in L on Rank # 1    8
Number of non-zeros in U on Rank # 1    1
Memory allocated on phase  22 on Rank # 0       0.0019 MB
Memory allocated on phase  22 on Rank # 1       0.0021 MB

Percentage of computed non-zeros for LL^T factorization
 100 %
  ... rank ==0, Passed...
  ... rank == 1, Passed...
 

asd__asdqwe
Beginner

I confirm this fixes the segmentation fault. However, the output when the message level is greater than 0 is still clearly not the standard one.

Gennady_F_Intel
Moderator

OK, thanks for keeping us posted on the results.
