Hello all, I'm new to this forum and I have a question about the MKL DSS interface.
I'm currently working on a finite element program I wrote myself. Every time the program is called, there are two systems of equations to be solved. The system solved first is the same system that was solved in the last iteration increment at the end. So the idea is to store the factorized matrix and not delete it at the
end of the program. This needs a lot of memory, but is much faster. My program already works with one thread, but I would like to perform multithreaded computations. For solving I use the Intel MKL DSS solver. My ifort version is 11.0 (which is very old, I know).
Now the problem is that when the program runs with multiple threads, it produces an illegal-memory-reference error the first time the retained factorized matrix is accessed again. Do you have an idea where this could come from? I thought Intel MKL was thread safe? It should be said that it is possible for one thread to factorize the matrix and then another thread to use the stored matrix (in theory, but it is not working this way). Do you have an idea?
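To make the caching idea concrete, here is a minimal sketch (not my actual program; the subroutine name solve_cached and its argument list are invented for this illustration, and error checks are omitted):

    subroutine solve_cached(n, ia, ja, a, rhs, x)
        ! Factor once, then keep the factorization alive between calls by
        ! skipping dss_delete and saving the handle's internal pointer.
        use mkl_dss
        implicit none
        integer, parameter :: dp = kind(1.0d0)
        integer, intent(in) :: n, ia(n+1), ja(*)
        real(dp), intent(in) :: a(*), rhs(n)
        real(dp), intent(out) :: x(n)
        type(mkl_dss_handle) :: handle
        integer :: error
        integer, save :: saved = 0          ! survives between calls

        if (saved == 0) then
            ! First call: analyze and factor the matrix.
            error = dss_create(handle, MKL_DSS_DEFAULTS)
            error = dss_define_structure(handle, MKL_DSS_SYMMETRIC, ia, n, n, ja, ia(n+1)-1)
            error = dss_reorder(handle, MKL_DSS_DEFAULTS, [0])
            error = dss_factor_real(handle, MKL_DSS_DEFAULTS, a)
            saved = handle%dummy            ! remember the internal pointer
        else
            ! Later calls: restore the stored factorization.
            handle%dummy = saved
        end if
        error = dss_solve_real(handle, MKL_DSS_DEFAULTS, rhs, 1, x)
        ! Deliberately no dss_delete here, so the factors stay in memory.
    end subroutine solve_cached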
best regards
Nils Lange
Hello Nils,
You've described a complicated workflow. Ideally, for a fast investigation, we'd like to get a reproducer that has a similar workflow and (not necessarily, but) maybe a toy matrix (say, from one of our examples) for simplicity.
General comments:
1) I am not sure I got your threading model from the description. Again, an example or at least a pseudocode would help.
2) I do not recommend using the DSS interface; I always suggest using PARDISO. The DSS interface has a limited scope, which can sometimes lead to confusion, and in my opinion it is no simpler than regular PARDISO. Avoid it as much as you can (see the sketch after this list).
3) I would like to assure you that MKL is thread safe in your case, but since you observe an issue, I'd better double-check.
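Here is a minimal sketch of the PARDISO call sequence in Fortran (illustrative only, reusing the toy matrix from our DSS example; error handling omitted):

    include 'mkl_pardiso.f90'
    program pardiso_sketch
        use mkl_pardiso
        implicit none
        integer, parameter :: dp = kind(1.0d0)
        type(mkl_pardiso_handle) :: pt(64)    ! internal solver memory pointer, must start zeroed
        integer :: iparm(64), perm(5)
        integer :: maxfct, mnum, mtype, phase, n, nrhs, msglvl, error, i
        integer :: ia(6), ja(9)
        real(dp) :: a(9), b(5), x(5)

        n = 5; nrhs = 1; maxfct = 1; mnum = 1
        mtype = -2                            ! real symmetric indefinite
        msglvl = 0                            ! no statistics output
        ia = (/ 1, 6, 7, 8, 9, 10 /)
        ja = (/ 1, 2, 3, 4, 5, 2, 3, 4, 5 /)
        a  = (/ 9.0_dp, 1.5_dp, 6.0_dp, 0.75_dp, 3.0_dp, 0.5_dp, 12.0_dp, 0.625_dp, 16.0_dp /)
        b  = (/ 1.0_dp, 2.0_dp, 3.0_dp, 4.0_dp, 5.0_dp /)
        do i = 1, 64
            pt(i)%dummy = 0                   ! must be zeroed before the first call
            iparm(i) = 0                      ! iparm(1)=0: use default parameters
        end do

        phase = 12                            ! analysis + numerical factorization
        call pardiso(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm, msglvl, b, x, error)

        phase = 33                            ! solve using the stored factors
        call pardiso(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm, msglvl, b, x, error)
        write(*,'(A,5F10.3)') ' Solution: ', x

        phase = -1                            ! release all internal memory
        call pardiso(pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm, msglvl, b, x, error)
    end program pardiso_sketch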
Best,
Kirill
Hello Kirill,
thanks for the fast reply. The problem is that my program is already around 2000 lines of code, plus maybe 1500 more lines I took from my institute, maybe 1500 lines of Python preprocessing, and then my program is called from Abaqus, a commercial FE program. I don't think it's possible to boil it down to a small pseudocode, I'm sorry.
But here is what works: I can run Abaqus in multithread mode, and then in each thread (each thread's computation is completely separated) I go through the whole process (dss_create ... dss_factor_real ... DSS_DELETE) and it works. What also works: in a single-thread computation I can save the factorized matrix by NOT calling DSS_DELETE at the end, and when my program is called again from Abaqus I do handle%dummy = pointer (where pointer is the integer number I saved) and it works (say I have 1000 retained matrices and I save all the pointers).
But this method does not work in a multithread computation. I can assure you that the saved, retained factorized matrix is only accessed by one thread at a time. Still I get a memory reference error.
Best regards
Nils Lange
Hello Nils,
I understand that. Based on your description, I imagine the following pseudocode:
#pragma omp parallel num_threads(2)
{
    myid = omp_get_thread_num(); // 0 or 1
    // set up parameters for PARDISO (or DSS)
    if (myid == 0)
        pardiso(phase=12); // reordering and factorization
    #pragma omp barrier
    if (myid == 1)
        pardiso(phase=33); // solve, causes seg fault?
}
Is this correct?
Best,
Kirill
Hello Kirill,
I'm working in Fortran, and your sketch looks like C++, but still I think that's not exactly what I'm doing. My program is executed in multiple threads, completely separated from each other. A pseudocode would maybe look like this:
(executed in multiple threads, but with DIFFERENT IDs)
program pseudocode
    ....
    if (beginning of analysis) then
        error = dss_create(handle, ...)
        pointer_array(ID) = handle%dummy
    end if
    if (middle of analysis) then
        handle%dummy = pointer_array(ID)
    end if
    error = dss_factor_real(handle, ...)
    if (end of program) then
        error = DSS_DELETE(handle)
    end if
    ......
end program pseudocode
I hope you can see my point; it's really a bit complex and not that easy to explain.
Best regards
Nils Lange
I created a working code with a minimal example that simulates my problem. I solve the standard DSS example problem in one thread by factorizing the matrix and all that, then in another thread I use that factorized matrix to only solve the problem. Then the program crashes. I'm using OpenMP. Without OpenMP and with only one thread, the code works and produces no error. Maybe you can help me better now.
INCLUDE 'mkl_dss.f90' ! Include the standard DSS "header file."

program bsp
    use omp_lib
    use mkl_dss
    implicit none
    integer :: tnr
    INTEGER :: i
    INTEGER :: handle_number
    LOGICAL :: solve

    call omp_set_num_threads( 2 )
    solve = .TRUE.
!$omp parallel private( i )
!$omp do
    do i = 1, 4
        tnr = omp_get_thread_num() ! Thread number
        if (tnr==0 .AND. solve) then
            WRITE(*,*) 'Thread:', omp_get_thread_num()
            WRITE(*,*) 'create problem, factor and solve'
            CALL factor_and_solve(handle_number)
            solve = .FALSE.
        end if
    end do
!$omp end do
!$omp end parallel

    call omp_set_num_threads( 2 )
    solve = .TRUE.
!$omp parallel private( i )
!$omp do
    do i = 1, 4
        tnr = omp_get_thread_num() ! Thread number
        if (tnr==1 .AND. solve) then
            solve = .FALSE.
            WRITE(*,*) 'Thread:', omp_get_thread_num()
            WRITE(*,*) 'only solve, do not factor matrix'
            CALL only_solve(handle_number)
        end if
    end do
!$omp end do
!$omp end parallel
end program bsp

SUBROUTINE factor_and_solve(handle_number)
    use mkl_dss
    IMPLICIT NONE
    INTEGER, PARAMETER :: dp = KIND(1.0D0)
    INTEGER :: error
    INTEGER :: j
    INTEGER :: columns(9)
    INTEGER :: nCols
    INTEGER :: nNonZeros
    INTEGER :: nRhs
    INTEGER :: nRows
    REAL(KIND=DP) :: rhs(5)
    INTEGER :: rowIndex(6)
    REAL(KIND=DP) :: solution(5)
    REAL(KIND=DP) :: values(9)
    TYPE(MKL_DSS_HANDLE) :: handle ! Allocate storage for the solver handle.
    INTEGER :: handle_number ! stores handle number

    ! Set the problem to be solved.
    nRows = 5
    nCols = 5
    nNonZeros = 9
    nRhs = 1
    rowIndex = (/ 1, 6, 7, 8, 9, 10 /)
    columns = (/ 1, 2, 3, 4, 5, 2, 3, 4, 5 /)
    values = (/ 9.0_DP, 1.5_DP, 6.0_DP, 0.75_DP, 3.0_DP, 0.5_DP, 12.0_DP, &
        & 0.625_DP, 16.0_DP /)
    rhs = (/ 1.0_DP, 2.0_DP, 3.0_DP, 4.0_DP, 5.0_DP /)

    ! Initialize the solver.
    error = dss_create( handle, MKL_DSS_MSG_LVL_WARNING + MKL_DSS_TERM_LVL_ERROR + MKL_DSS_OOC_STRONG )
    ! Define the non-zero structure of the matrix.
    error = dss_define_structure( handle, MKL_DSS_SYMMETRIC, rowIndex, nRows, &
        & nCols, columns, nNonZeros )
    ! Reorder the matrix.
    error = dss_reorder( handle, MKL_DSS_DEFAULTS, [0] )
    ! Factor the matrix.
    error = dss_factor_real( handle, MKL_DSS_DEFAULTS, values )
    ! Allocate the solution vector and solve the problem.
    error = dss_solve_real( handle, MKL_DSS_DEFAULTS, rhs, nRhs, solution )
    ! Print the solution vector.
    WRITE(*,"('Solution Array: '(5F10.3))") ( solution(j), j = 1, nCols )
    handle_number = handle%dummy
END SUBROUTINE factor_and_solve

SUBROUTINE only_solve(handle_number)
    use mkl_dss
    IMPLICIT NONE
    TYPE(MKL_DSS_HANDLE) :: handle ! Allocate storage for the solver handle.
    INTEGER, PARAMETER :: dp = KIND(1.0D0)
    INTEGER :: error
    INTEGER :: j
    REAL(KIND=DP) :: rhs(5)
    REAL(KIND=DP) :: solution(5)
    INTEGER :: nRhs
    INTEGER :: nCols
    INTEGER :: handle_number ! stores handle pointer

    handle%dummy = handle_number
    nCols = 5
    nRhs = 1
    rhs = (/ 1.0_DP, 2.0_DP, 3.0_DP, 4.0_DP, 5.0_DP /)
    error = dss_solve_real( handle, MKL_DSS_DEFAULTS, rhs, nRhs, solution )
    ! Print the solution vector.
    WRITE(*,"('Solution Array: '(5F10.3))") ( solution(j), j = 1, nCols )
END SUBROUTINE only_solve
Hello again,
Thanks for creating a reproducer! Unfortunately, I cannot reproduce the failure; it all works just fine with the versions of MKL I tried. So, if possible, I suggest trying a newer MKL version (maybe we had a problem long ago that has since been fixed). Also, could you tell me the exact version of MKL you're using and how you link your application with MKL?
Also, you're using the OOC (out-of-core) mode of PARDISO. Is this intended? If not, reconsider the idea of using the DSS API. You enabled OOC by passing the flag MKL_DSS_OOC_STRONG, but this feature only makes sense in the quite rare case when the factors cannot fit into the available RAM, and the price for overcoming this limitation with OOC is performance. See more details here: https://software.intel.com/content/www/us/en/develop/articles/how-to-use-ooc-pardiso.html
So, in a "normal" situation, you don't need OOC; see the one-line change below. Also, with PARDISO (the main API) you can use more advanced features that were never ported to the DSS interface.
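For example (a sketch against your reproducer, assuming everything else stays the same), the in-core initialization would be:

    ! Same dss_create call as in the reproducer, just without MKL_DSS_OOC_STRONG:
    error = dss_create( handle, MKL_DSS_MSG_LVL_WARNING + MKL_DSS_TERM_LVL_ERROR )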
Best,
Kirill
We also tried to reproduce the problem with the current version of MKL, 2020.1: Windows 10, LP64 mode, OpenMP threading.
2>test_pardiso.exe
Thread: 0
create problem, factor and solve
Solution Array: -326.333 983.000 163.417 398.000 61.500
Many thanks for looking at the problem again. I use Intel ifort 11.0 and, I guess, the MKL version that comes with it. But we also have ifort 17, and the strange thing is that my tutor at university got the example I sent you to run without error on our server with ifort 11.0. My compiler options for compiling my program with Abaqus are:
compile_fortran = [fortCmd, '-V', '-c', '-fPIC', '-auto', '-mP2OPT_hpo_vec_divbyzero=F', '-extend_source', '-WB', '-I%I','-w', '-O3', '-openmp', '-w90', '-w95','-I/app/intel/Compiler/11.0/083/mkl/include']
I'm really not 100% sure what Abaqus does under the hood. I have now read that it uses MPI rather than OpenMP by default. Could that be a problem?
And no, OOC was only tested by me in this example, because I was not sure whether the problem might lie there (which is obviously not the case).
Best,
Nils
Nils, thanks for the update. I am not really sure I understand what you mean by Abaqus in this case. Did you link this example against MKL?
Could you show the linking line? We will then check how this case works with the current version.
Okay, maybe I confused everything a little bit. The reproducer I sent was quite far abstracted from my actual code, which works with the commercial FE software Abaqus. And I found out that it seems "Abaqus" calls my program in an MPI environment instead of an OpenMP environment, so the example I sent maybe isn't even representative of my problem. But you asked for the linking/compiling:
ifort -w -I/app/intel/Compiler/11.0/083/mkl/include test.f90 \
    -L"/app/intel/Compiler/11.0/083/mkl/lib/em64t" \
    "/app/intel/Compiler/11.0/083/mkl/lib/em64t"/libmkl_solver_lp64.a \
    "/app/intel/Compiler/11.0/083/mkl/lib/em64t"/libmkl_intel_lp64.a \
    -Wl,--start-group \
    "/app/intel/Compiler/11.0/083/mkl/lib/em64t"/libmkl_intel_thread.a \
    "/app/intel/Compiler/11.0/083/mkl/lib/em64t"/libmkl_core.a \
    -Wl,--end-group \
    -L"/app/intel/Compiler/11.0/083/mkl/lib/em64t" -liomp5 -lpthread -openmp -lm -o test.out
I am the academic supervisor of Nils Lange. I have run the above reproducer on another machine (Intel Xeon Gold 6248 CPU, CentOS Linux release 7.6.1810 with kernel 3.10.0-957.21.3.el7.x86_64) with a more recent version of the compiler (ifort version 19.1.0.166), and I still get the same error. In particular, I compiled the code with
ifort test.f90 -mkl -qopenmp
When I run the resulting executable, I get the following output and error messages:
Thread: 0
create problem, factor and solve
OMP: Info #273: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead.
Solution Array: -326.333 983.000 163.417 398.000 61.500
Thread: 1
only solve, do not factor matrix
OMP: Info #273: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead.
forrtl: severe (154): array index out of bounds
Image PC Routine Line Source
a.out 00000000004065CB Unknown Unknown Unknown
libpthread-2.17.s 00002B88465E75D0 Unknown Unknown Unknown
libmkl_avx512.so 00002B884F1606DE mkl_spblas_lp64_a Unknown Unknown
libmkl_intel_thre 00002B883FD905FC mkl_spblas_lp64_d Unknown Unknown
libmkl_intel_thre 00002B883FB18EE1 mkl_spblas_lp64_m Unknown Unknown
libmkl_intel_thre 00002B8840D47A7C mkl_pds_lp64_amux Unknown Unknown
libmkl_core.so 00002B884385AA6C mkl_pds_lp64_do_a Unknown Unknown
libmkl_core.so 00002B884341F702 mkl_pds_lp64_pard Unknown Unknown
libmkl_core.so 00002B88438701F4 mkl_pds_lp64_pard Unknown Unknown
libmkl_core.so 00002B8843590B88 mkl_pds_lp64_dss_ Unknown Unknown
a.out 0000000000405315 Unknown Unknown Unknown
libiomp5.so 00002B8846008713 __kmp_invoke_micr Unknown Unknown
libiomp5.so 00002B8845F96FFF Unknown Unknown Unknown
libiomp5.so 00002B8845F9605A Unknown Unknown Unknown
libiomp5.so 00002B8846008BD8 Unknown Unknown Unknown
libpthread-2.17.s 00002B88465DFDD5 Unknown Unknown Unknown
libc-2.17.so 00002B88468F202D clone Unknown Unknown
Could it be that the error appears only with the Linux version of the compiler? Remarkably, the program runs successfully on our Linux machines if the code of the routines "factor_and_solve" and "only_solve" is inserted directly into the main program. Apparently, the problem has something to do with the way the subroutines are parallelized on Linux.
We look forward to receiving further ideas on how to localize and overcome the problem.
Kind regards,
Geralf
We don't expect the Linux OS or compiler version to have an impact on this behavior.
Checking the problem with the latest (current) version of MKL:
$ ifort -qopenmp -mkl test_pardiso.f90
$ ./a.out
Thread: 0
create problem, factor and solve
OMP: Info #274: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead. [printed 10 times]
Solution Array: -326.333 983.000 163.417 398.000 61.500
Thread: 1
only solve, do not factor matrix
OMP: Info #274: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead. [printed 3 times]
Solution Array: -979.000 2949.000 490.250 1194.000 184.500
Thanks
I also checked the problem on different machines, including Broadwell and Skylake.
E.g.:
$ lscpu | grep Model
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Compiler version I used:
$ ifort --version
ifort (IFORT) 19.1.1.217 20200306
Copyright (C) 1985-2020 Intel Corporation. All rights reserved.
The version you used shows the problem:
source /opt/intel/compilers_and_libraries_2017/linux/bin/compilervars.sh intel64
ifort --version
ifort (IFORT) 17.0.6 20171215
Copyright (C) 1985-2018 Intel Corporation. All rights reserved.
./a.out
Thread: 0
create problem, factor and solve
Solution Array: -326.333 983.000 163.417 398.000 61.500
Thread: 1
only solve, do not factor matrix
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
a.out 0000000000404204 Unknown Unknown Unknown
libpthread-2.17.s 00002B07BDA8B5F0 Unknown Unknown Unknown
libmkl_avx2.so 00002B07E29347BF mkl_spblas_lp64_a Unknown Unknown
libmkl_intel_thre 00002B07B98B212C mkl_spblas_lp64_d Unknown Unknown
libmkl_intel_thre 00002B07B963A831 mkl_spblas_lp64_m Unknown Unknown
libmkl_intel_thre 00002B07BA5B30EA mkl_pds_lp64_amux Unknown Unknown
libmkl_core.so 00002B07BC68D823 mkl_pds_lp64_do_a Unknown Unknown
libmkl_core.so 00002B07BC49E94D mkl_pds_lp64_pard Unknown Unknown
libmkl_core.so 00002B07BC6A3422 mkl_pds_lp64_pard Unknown Unknown
libmkl_core.so 00002B07BC5F1FE2 mkl_pds_lp64_dss_ Unknown Unknown
a.out 0000000000403BC8 Unknown Unknown Unknown
libiomp5.so 00002B07BD48BD43 __kmp_invoke_micr Unknown Unknown
libiomp5.so 00002B07BD45A317 Unknown Unknown Unknown
libiomp5.so 00002B07BD459995 Unknown Unknown Unknown
libiomp5.so 00002B07BD48C1B4 Unknown Unknown Unknown
libpthread-2.17.s 00002B07BDA83E65 Unknown Unknown Unknown
libc-2.17.so 00002B07BDD9688D clone Unknown Unknown
The next two versions of MKL, 2018 and 2019, also reported the same faults, but the current (latest) one works:
source /opt/intel/compilers_and_libraries_2020/linux/bin/compilervars.sh intel64
ifort --version
ifort (IFORT) 19.1.1.217 20200306
Copyright (C) 1985-2020 Intel Corporation. All rights reserved.
$ ./a.out
Thread: 0
create problem, factor and solve
OMP: Info #274: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead. [printed 10 times]
Solution Array: -326.333 983.000 163.417 398.000 61.500
Thread: 1
only solve, do not factor matrix
OMP: Info #274: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead. [printed 3 times]
Solution Array: -979.000 2949.000 490.250 1194.000 184.500
We tried the very latest version, ifort 19.1.2.254 20200623, but we still receive the same error message as with the previous version, 19.1.0.166. Please find attached our executable, if you want to check any relation to the software environment.
Anyway, your attempt from 06-19-2020 did indeed run without any error messages, but the result is wrong! The result in both threads should be the same, yet for some reason your result in thread 1 amounts to three times the result in thread 0. For us as users, getting a wrong result is even more severe than getting an error message.
Yes, you are right; I see the problem with v2020 u2 on both Windows and Linux. You could try to use MKL PARDISO instead of the DSS API, as Kirill already suggested above.
So, our recommendations are to try the latest version, or to use the PARDISO API instead of DSS, as Kirill recommended above.