Hello,
CPARDISO is segfaulting in my application with version 19.0.5.075, but everything runs fine with version 18.0.3.222. Is this a known regression that is currently being fixed?
The issue comes from:
#11 0x110e0822b in MPI_Bcast (libmpi.12.dylib:x86_64+0xd22b)
#12 0x10ef1529e in MKLMPI_Bcast (libmkl_blacs_mpich_lp64.dylib:x86_64+0x2029e)
I can reproduce this behavior on both macOS and Linux. Thanks for your help.
Hello! This is not a known issue. Could you give us a reproducer and show us how you run the code?
Sadly, I don't have the time to create a self-contained reproducer. Could you please try the following and let me know whether it gives you enough to work on the issue?
$ git clone https://github.com/hpddm/hpddm.git && cd hpddm
Copy the attached Makefile.inc_.txt into the newly created folder (renaming it to Makefile.inc), and don't forget to adjust the variables $MKL_INCS and $MKL_LIBS in the file.
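As a starting point, here is a sketch of what those two variables might look like, assuming a standard MKL installation under $MKLROOT and the MPICH BLACS layer from the backtrace above (the exact list of libraries depends on your setup):

MKL_INCS = -I${MKLROOT}/include
MKL_LIBS = -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_mpich_lp64 -lpthread -lm -ldl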
$ make cpp
$ mpirun -n 4 ./bin/schwarz_cpp -hpddm_verbosity -hpddm_schwarz_coarse_correction deflated -hpddm_level_2_p 1 -hpddm_level_2_verbosity 4
$ mpirun -n 4 ./bin/schwarz_cpp -hpddm_verbosity -hpddm_schwarz_coarse_correction deflated -hpddm_level_2_p 2 -hpddm_level_2_verbosity 4
The first example will run; the second will not (unless version 18.0.3.222 is used).
Unfortunately, that will not help much.
In that case, if you have active Priority Support, you may submit a confidential support request via the Online Service Center.
Why won't it help? The problem comes from PARDISO; you can also see, in the single-process case, that there is some unwanted output:
Memory allocated on phase 11 0.0013 MB
Number of non-zeros in L 16
Number of non-zeros in U 1
Memory allocated on phase 22 0.0019 MB
Percentage of computed non-zeros for LL^T factorization
100 %
The PARDISO calling sequence is here https://github.com/hpddm/hpddm/blob/master/include/HPDDM_MKL_PARDISO.hpp#L136-L157.
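For context, the overall calling pattern looks roughly like the sketch below. This is not the HPDDM code itself, just a made-up 1x1 SPD system solved with cluster_sparse_solver (the mtype, iparm settings, and matrix in HPDDM differ), with msglvl = 1 so that statistics like the ones above are printed:

#include <mpi.h>
#include <mkl_cluster_sparse_solver.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    void* pt[64] = {};                    // solver internal memory, must start zeroed
    MKL_INT iparm[64] = {};               // iparm[0] = 0: use CPARDISO defaults (1-based CSR)
    MKL_INT maxfct = 1, mnum = 1, mtype = 2 /* real SPD */;
    MKL_INT n = 1, nrhs = 1, perm = 0, msglvl = 1, error = 0;
    MKL_INT ia[2] = {1, 2}, ja[1] = {1};  // 1-based CSR of the 1x1 matrix [2]
    double a[1] = {2.0}, b[1] = {1.0}, x[1] = {0.0};
    int comm = (int)MPI_Comm_c2f(MPI_COMM_WORLD);  // CPARDISO takes the Fortran handle
    const MKL_INT phases[] = {11, 22, 33, -1};     // analysis, factorization, solve, cleanup
    for (MKL_INT phase : phases) {
        cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                              &perm, &nrhs, iparm, &msglvl, b, x, &comm, &error);
        if (error != 0) std::printf("phase %d failed: error %d\n", (int)phase, (int)error);
    }
    MPI_Finalize();
    return 0;
}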
But I don't see there the list of iparm values or the types of matrices you are running... That is why having a reproducer would significantly reduce the number of unnecessary questions.
Here is a first .cpp that you can run on 4 processes. You'll see some unformatted output:
Memory allocated on phase 11 on Rank # 0 -0.0021 MB
Memory allocated on phase 11 on Rank # 1 -0.0020 MB
Number of non-zeros in L on Rank # 0 4
Number of non-zeros in U on Rank # 0 1
Number of non-zeros in L on Rank # 1 8
Number of non-zeros in U on Rank # 1 1
Memory allocated on phase 22 on Rank # 0 -0.0026 MB
Memory allocated on phase 22 on Rank # 1 -0.0025 MB
Percentage of computed non-zeros for LL^T factorization
100 %
Instead of something like:
Percentage of computed non-zeros for LL^T factorization
47 %
100 %
=== CPARDISO: solving a symmetric indefinite system ===
Distributed Matrix Input Format is used for CPARDISO (iparm(40) = 2)
1-based array indexing is turned ON
CPARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Single-level factorization algorithm is turned ON
I'm trying to reproduce the segmentation fault as well.
Thanks, we will check and update this thread.
Yes, we managed to reproduce the issue with version 2019 u5 and will investigate the problem. Thanks for the report.
The fix for this issue is available in MKL 2020 Update 1.
MKL 2020 u1 output:
$ make run
mpirun -n 4 ./a.out
Major version: 2020
Minor version: 0
Update version: 1
Product status: Product
Build: 20200208
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================
Memory allocated on phase 11 on Rank # 0 0.0014 MB
Memory allocated on phase 11 on Rank # 1 0.0014 MB
Number of non-zeros in L on Rank # 0 4
Number of non-zeros in U on Rank # 0 1
Number of non-zeros in L on Rank # 1 8
Number of non-zeros in U on Rank # 1 1
Memory allocated on phase 22 on Rank # 0 0.0019 MB
Memory allocated on phase 22 on Rank # 1 0.0021 MB
Percentage of computed non-zeros for LL^T factorization
100 %
... rank ==0, Passed...
... rank == 1, Passed...
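For reference, a version banner like the one above can be produced with mkl_get_version(); a minimal sketch:

#include <mkl_service.h>  // MKLVersion and mkl_get_version()
#include <cstdio>

int main() {
    MKLVersion v;
    mkl_get_version(&v);  // fill in the MKL version record
    std::printf("Major version:           %d\n", v.MajorVersion);
    std::printf("Minor version:           %d\n", v.MinorVersion);
    std::printf("Update version:          %d\n", v.UpdateVersion);
    std::printf("Product status:          %s\n", v.ProductStatus);
    std::printf("Build:                   %s\n", v.Build);
    std::printf("Platform:                %s\n", v.Platform);
    std::printf("Processor optimization:  %s\n", v.Processor);
    return 0;
}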
I confirm this fixes the segmentation fault. The output when the message level is greater than 0 is clearly not the standard one, though.
OK, thanks for keeping us posted on the results.