
asd__asdqwe · ‎11-17-2019

Hello,

CPARDISO is segfaulting on my application when I use version 19.0.5.075, but everything runs fine with version 18.0.3.222. Is this a known regression that is being fixed right now?

The issue comes from:

#11 0x110e0822b in MPI_Bcast (libmpi.12.dylib:x86_64+0xd22b)
#12 0x10ef1529e in MKLMPI_Bcast (libmkl_blacs_mpich_lp64.dylib:x86_64+0x2029e)

I can reproduce this behavior on both macOS and Linux. Thanks for your help.

Gennady_F_Intel · ‎11-17-2019

Hello! This is an unknown issue. Could you give us a reproducer and show us how you run the code?

asd__asdqwe · ‎11-18-2019

Sadly, I don't have the time to create a self-contained reproducer. Could you please try the following and let me know if this is OK for you to work on the issue?

$ git clone https://github.com/hpddm/hpddm.git && cd hpddm

Copy the attached Makefile.inc_.txt into the newly created folder and rename it to Makefile.inc. Don't forget to adjust the variables $MKL_INCS and $MKL_LIBS in the file.

$ make cpp
$ mpirun -n 4 ./bin/schwarz_cpp -hpddm_verbosity -hpddm_schwarz_coarse_correction deflated -hpddm_level_2_p 1 -hpddm_level_2_verbosity 4
$ mpirun -n 4 ./bin/schwarz_cpp -hpddm_verbosity -hpddm_schwarz_coarse_correction deflated -hpddm_level_2_p 2 -hpddm_level_2_verbosity 4

The first example will run; the second will not (unless using version 18.0.3.222).

Gennady_F_Intel · ‎11-18-2019

Unfortunately, that will not help much.

Gennady_F_Intel · ‎11-18-2019

If you have active Priority Support, you can submit a confidential support request via the Online Service Center.

asd__asdqwe · ‎11-18-2019

Why won't it help? The problem comes from PARDISO. Even in the case with 1 process, you can see some unwanted output:

Memory allocated on phase 11    0.0013 MB
Number of non-zeros in L    16
Number of non-zeros in U    1
Memory allocated on phase 22    0.0019 MB

Percentage of computed non-zeros for LL^T factorization
100 %

The PARDISO calling sequence is here https://github.com/hpddm/hpddm/blob/master/include/HPDDM_MKL_PARDISO.hpp#L136-L157.

Gennady_F_Intel · ‎11-18-2019

But I don't see the list of iparm values or the types of matrices you are using there. That is why a reproducer would significantly reduce the number of unnecessary questions.
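
A reproducer is easier to triage when it prints its own solver configuration. The following is a minimal sketch, assuming the usual C interface in which `iparm` is a 64-entry integer array and `mtype` is the integer matrix type code. The helper name `describe_setup` and the choice to print only non-zero entries are illustrative and not part of MKL.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not an MKL function): formats the matrix type and
// every non-zero iparm entry so they can be pasted into a bug report.
// Indices are printed 1-based, matching the iparm(40) notation that the
// solver uses in its own messages.
std::string describe_setup(int mtype, const std::array<int, 64>& iparm) {
    std::ostringstream out;
    out << "mtype = " << mtype << '\n';
    for (std::size_t i = 0; i < iparm.size(); ++i) {
        if (iparm[i] != 0) {
            out << "iparm(" << (i + 1) << ") = " << iparm[i] << '\n';
        }
    }
    return out.str();
}
```

Called right before the first solver phase with, for example, `mtype = -2` (real symmetric indefinite) and `iparm[39] = 2` (distributed matrix input, i.e. iparm(40) = 2), it would print those two settings along with any other non-zero entries. These example values are assumptions chosen to mirror the log in this thread, not the actual settings used by HPDDM.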

asd__asdqwe · ‎11-18-2019

Here is a first .cpp that you can execute on 4 processes. You'll see some unformatted output:

Memory allocated on phase  11 on Rank # 0	-0.0021 MB
Memory allocated on phase  11 on Rank # 1	-0.0020 MB
Number of non-zeros in L on Rank # 0	4
Number of non-zeros in U on Rank # 0	1
Number of non-zeros in L on Rank # 1	8
Number of non-zeros in U on Rank # 1	1
Memory allocated on phase  22 on Rank # 0	-0.0026 MB
Memory allocated on phase  22 on Rank # 1	-0.0025 MB

Percentage of computed non-zeros for LL^T factorization
 100 %

Instead of something like:

Percentage of computed non-zeros for LL^T factorization
 47 %  100 %

=== CPARDISO: solving a symmetric indefinite system ===
Distributed Matrix Input Format is used for CPARDISO (iparm(40) = 2)
1-based array indexing is turned ON
CPARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Single-level factorization algorithm is turned ON

I'm trying to reproduce the segmentation fault as well.

Gennady_F_Intel · ‎11-18-2019

Thanks, we will check and update this thread.

asd__asdqwe · ‎11-18-2019

Here is the segfaulting .cpp. You can again run this on 4 processes. It will run fine with MKL 2018, but not with 2019.

Gennady_F_Intel · ‎11-18-2019

Yes, we managed to reproduce the issue with version 2019 u5 and will investigate the problem. Thanks for the report.

Gennady_F_Intel · ‎04-02-2020

The fix for this issue is available in MKL 2020 Update 1.

Gennady_F_Intel · ‎04-02-2020

Output with MKL 2020 u1:

$ make run
mpirun -n 4 ./a.out
Major version: 2020
Minor version: 0
Update version: 1
Product status: Product
Build: 20200208
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================

Memory allocated on phase 11 on Rank # 0 0.0014 MB
Memory allocated on phase 11 on Rank # 1 0.0014 MB
Number of non-zeros in L on Rank # 0 4
Number of non-zeros in U on Rank # 0 1
Number of non-zeros in L on Rank # 1 8
Number of non-zeros in U on Rank # 1 1
Memory allocated on phase 22 on Rank # 0 0.0019 MB
Memory allocated on phase 22 on Rank # 1 0.0021 MB

Percentage of computed non-zeros for LL^T factorization
100 %
... rank ==0, Passed...
... rank == 1, Passed...
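
An application that has to run against several MKL installations could guard the multi-process CPARDISO path with a version check. This is a minimal sketch, assuming the three numeric fields printed above (major, minor, update) are available to the caller. The type `MklVersion` and the function `has_reported_fix` are hypothetical names, not MKL API.

```cpp
#include <tuple>

// Hypothetical value type mirroring the "Major version", "Minor version"
// and "Update version" fields in the output above.
struct MklVersion {
    int major_version;
    int minor_version;
    int update_version;
};

// True when v is at least MKL 2020 Update 1 (2020.0.1), the release
// reported in this thread as containing the fix. It makes no claim about
// which older releases are affected.
bool has_reported_fix(const MklVersion& v) {
    return std::tie(v.major_version, v.minor_version, v.update_version) >=
           std::make_tuple(2020, 0, 1);
}
```

With the versions mentioned in this thread, `has_reported_fix({2020, 0, 1})` is true and `has_reported_fix({2019, 0, 5})` is false. The lexicographic tuple comparison keeps the check correct if a later release bumps the major or minor number.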

asd__asdqwe · ‎04-04-2020

I confirm this fixes the segmentation fault. The output when the message level is greater than 0 is clearly not the standard one, though.

Gennady_F_Intel · ‎04-04-2020

OK, thanks for letting us know the results.

Segmentation fault CPARDISO