Beginner

MKL DSS solver problem in multithread application

Hello to all, I'm new to this forum and I have a question about the MKL DSS solver.

I'm currently working on a finite element program I wrote myself. Every time the program is called, there are two systems of equations to be solved. The first system to be solved is the same as the system solved at the end of the previous iteration increment. So the idea is to store the factorized matrix and not delete it at the end of the program. This needs a lot of memory, but is much faster. My program already works with one thread, but I would like to perform multithreaded computations. For solving I use the Intel MKL DSS solver. My ifort version is 11.0 (which is very old, I know).

Now the problem is that when the program runs with multiple threads, it produces an illegal-memory-reference error the first time the retained (not deleted) factorized matrix is accessed again. Do you have an idea where this could come from? I thought Intel MKL was thread safe? It should be noted that it is possible for one thread to factorize the matrix and for another thread to then use this stored factorization (in theory; in practice it does not work this way). Do you have an idea?
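To make the intended pattern concrete, here is a small, MKL-free sketch in Python: a toy dense LU factorization stands in for the DSS handle, and Python threads stand in for the OpenMP threads. All names are illustrative, not MKL API. The matrix is factored once in one thread, the factorization is kept alive, and a second thread later reuses it to solve.

```python
import threading

def lu_factor(a):
    """Doolittle LU factorization of a small dense matrix (no pivoting)."""
    n = len(a)
    lu = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu

def lu_solve(lu, b):
    """Forward/backward substitution using the stored factors."""
    n = len(lu)
    y = b[:]
    for i in range(n):                 # forward: unit lower triangle
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):       # backward: upper triangle
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x

cache = {}  # plays the role of the retained (not deleted) factorized matrix

def first_call():
    # "beginning of analysis": factor once and keep the factorization
    cache["factor"] = lu_factor([[4.0, 1.0], [1.0, 3.0]])

out = []

def later_call():
    # "middle of analysis", on a different thread: only solve
    out.append(lu_solve(cache["factor"], [1.0, 2.0]))

t1 = threading.Thread(target=first_call)
t1.start(); t1.join()
t2 = threading.Thread(target=later_call)
t2.start(); t2.join()
print(out[0])  # solution of [[4,1],[1,3]] x = [1,2], i.e. [1/11, 7/11]
```

With real MKL this pattern only works if the library keeps the factorization valid when it is accessed from a different thread, which is exactly the behavior under discussion here.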

best regards

Nils Lange

18 Replies
Employee

Hello Nils,

You've described a complicated workflow. Ideally, for a fast investigation, we'd like to get a reproducer with a similar workflow and (not necessarily, but ideally) a toy matrix (say, from one of our examples) for simplicity.

General comments:
1) I am not sure I understood your threading model from the description. Again, an example or at least a pseudocode would help.
2) I do not recommend the DSS interface; I always suggest using PARDISO. The DSS interface has a limited scope, which can sometimes lead to confusion, and in my opinion it is no simpler than regular PARDISO. Avoid it as much as you can.
3) I can assure you that MKL is thread safe in your case, but since you observe an issue, I'd better double-check.
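For reference, the main PARDISO routine is driven by a single "phase" argument that selects the step; the usual sequence is sketched below (schematic pseudocode only, not compilable — the real calls also take the handle, the matrix arrays, iparm, and so on):

```text
phase = 11   ! analysis: fill-in reducing reordering and symbolic factorization
phase = 22   ! numerical factorization (reuses the phase-11 analysis)
phase = 33   ! solve; can be repeated for new right-hand sides
phase = -1   ! release all internal memory held by the handle
! combined phases exist too, e.g. 12 = analysis + factorization,
! 13 = analysis + factorization + solve
```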

Best,
Kirill

Beginner

Hello Kirill,

thanks for the fast reply. The problem is that my program is already around 2000 lines of code, plus maybe 1500 more lines that I took from my institute, plus maybe 1500 lines of Python preprocessing, and then my program is called from Abaqus, a commercial FE program. I don't think it's possible to reduce it to a small pseudocode, I'm sorry.

But here is what works: I can run Abaqus in multithreaded mode, and then in each thread (each thread's computation is completely separate) I go through the whole process (dss_create ... dss_factor_real ... DSS_DELETE) and it works. What also works: in a single-threaded computation I can save the factorized matrix by NOT calling DSS_DELETE at the end, and when my program is called again from Abaqus I do handle%dummy=pointer (where pointer is the integer value I saved) and it works (say I have 1000 retained matrices and I save all the pointers).

But this method does not work in a multithreaded computation. I can assure you that the saved, retained factorized matrix is only accessed by one thread at a time. Still, I get a memory reference error.
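One portable way to make the "one thread at a time" guarantee explicit is to serialize every access to the stored handle with a mutex. A minimal Python sketch of that discipline (the names store/use are made up, and the multiplication stands in for a solve call on the saved handle):

```python
import threading

lock = threading.Lock()
cache = {}   # ID -> saved opaque "handle" (here just a number)

def store(ident, handle):
    with lock:                 # serialize all access to the cache
        cache[ident] = handle

def use(ident, results):
    with lock:                 # only one thread touches a stored handle at a time
        results.append(cache[ident] * 2)   # stand-in for a solve call

store(7, 21)
res = []
threads = [threading.Thread(target=use, args=(7, res)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(res)  # prints [42, 42, 42, 42]
```

If the crash persists even with access serialized like this, the race is not in the caller's usage pattern but inside the library (or in how the handle is saved and restored).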

Best regards

Nils Lange

Employee

Hello Nils,

I understand that. Based on your description, I imagine the following pseudocode:

#pragma omp parallel num_threads(2)
{
    myid = thread_index; // 0 or 1
    // set up parameters for PARDISO (or DSS)
    if (myid == 0)
        pardiso(phase=12); // reordering and factorization
    #pragma omp barrier
    if (myid == 1)
        pardiso(phase=33); // solve -- causes the seg fault?
}

Is this correct?

Best,
Kirill

Beginner

Hello Kirill,

I'm working in Fortran, and your sketch looks like C++, but still I think that's not exactly what I'm doing. My program is executed in multiple threads that are completely separate from each other. A pseudocode would look something like this:

(executed in multiple threads, but with DIFFERENT IDs)

program pseudocode
    ...
    if (beginning of analysis) then
        error = dss_create(handle, ...)
        pointer_array(ID) = handle%dummy
    end if
    if (middle of analysis) then
        handle%dummy = pointer_array(ID)
    end if
    error = dss_factor_real(handle, ...)
    if (end of program) then
        error = DSS_DELETE(handle)
    end if
    ...
end program pseudocode

 

I hope you can see my point; it's really a bit complex and not that easy to explain.

 

Best regards

Nils Lange

Beginner

I created a working code with a minimal example that reproduces my problem. I solve the standard DSS example problem in one thread (factorizing the matrix and all that), and then in another thread I use that factorized matrix to only solve the problem. Then the program crashes. I'm using OpenMP. Without OpenMP, in a single thread, the code works and produces no error. Maybe you can help me better now.

INCLUDE 'mkl_dss.f90' ! Include the standard DSS "header file".

program bsp
  use omp_lib
  use mkl_dss
  implicit none

  INTEGER :: tnr
  INTEGER :: i
  INTEGER :: handle_number
  LOGICAL :: solve

  call omp_set_num_threads( 2 )

  solve = .TRUE.
  !$omp parallel private( i )
  !$omp do
  do i = 1, 4
    tnr = omp_get_thread_num()  ! thread number
    if (tnr == 0 .AND. solve) then
      WRITE(*,*) 'Thread:', omp_get_thread_num()
      WRITE(*,*) 'create problem, factor and solve'
      CALL factor_and_solve(handle_number)
      solve = .FALSE.
    end if
  end do
  !$omp end do
  !$omp end parallel

  call omp_set_num_threads( 2 )

  solve = .TRUE.
  !$omp parallel private( i )
  !$omp do
  do i = 1, 4
    tnr = omp_get_thread_num()  ! thread number
    if (tnr == 1 .AND. solve) then
      solve = .FALSE.
      WRITE(*,*) 'Thread:', omp_get_thread_num()
      WRITE(*,*) 'only solve, do not factor matrix'
      CALL only_solve(handle_number)
    end if
  end do
  !$omp end do
  !$omp end parallel

end program bsp

SUBROUTINE factor_and_solve(handle_number)

  use mkl_dss

  IMPLICIT NONE

  INTEGER, PARAMETER :: dp = KIND(1.0D0)
  INTEGER :: error
  INTEGER :: j
  INTEGER :: columns(9)
  INTEGER :: nCols
  INTEGER :: nNonZeros
  INTEGER :: nRhs
  INTEGER :: nRows
  REAL(KIND=DP) :: rhs(5)
  INTEGER :: rowIndex(6)
  REAL(KIND=DP) :: solution(5)
  REAL(KIND=DP) :: values(9)
  TYPE(MKL_DSS_HANDLE) :: handle ! Storage for the solver handle.
  INTEGER :: handle_number ! stores the handle number

  ! Set the problem to be solved.
  nRows = 5
  nCols = 5
  nNonZeros = 9
  nRhs = 1
  rowIndex = (/ 1, 6, 7, 8, 9, 10 /)
  columns = (/ 1, 2, 3, 4, 5, 2, 3, 4, 5 /)
  values = (/ 9.0_DP, 1.5_DP, 6.0_DP, 0.75_DP, 3.0_DP, 0.5_DP, 12.0_DP, &
  & 0.625_DP, 16.0_DP /)
  rhs = (/ 1.0_DP, 2.0_DP, 3.0_DP, 4.0_DP, 5.0_DP /)

  ! Initialize the solver.
  error = dss_create( handle, MKL_DSS_MSG_LVL_WARNING + MKL_DSS_TERM_LVL_ERROR + MKL_DSS_OOC_STRONG )
  ! Define the non-zero structure of the matrix.
  error = dss_define_structure( handle, MKL_DSS_SYMMETRIC, rowIndex, nRows, &
  & nCols, columns, nNonZeros )
  ! Reorder the matrix.
  error = dss_reorder( handle, MKL_DSS_DEFAULTS, [0] )
  ! Factor the matrix.
  error = dss_factor_real( handle, MKL_DSS_DEFAULTS, values )
  ! Solve the problem.
  error = dss_solve_real( handle, MKL_DSS_DEFAULTS, rhs, nRhs, solution )

  ! Print the solution vector.
  WRITE(*,"('Solution Array: '(5F10.3))") ( solution(j), j = 1, nCols )

  ! Keep the handle alive: save its internal pointer instead of calling dss_delete.
  handle_number = handle%dummy

END SUBROUTINE factor_and_solve

SUBROUTINE only_solve(handle_number)

  use mkl_dss

  IMPLICIT NONE

  TYPE(MKL_DSS_HANDLE) :: handle ! Storage for the solver handle.
  INTEGER, PARAMETER :: dp = KIND(1.0D0)
  INTEGER :: error
  INTEGER :: j
  REAL(KIND=DP) :: rhs(5)
  REAL(KIND=DP) :: solution(5)
  INTEGER :: nRhs
  INTEGER :: nCols
  INTEGER :: handle_number ! stores the saved handle pointer

  ! Restore the previously saved handle.
  handle%dummy = handle_number

  nCols = 5
  nRhs = 1
  rhs = (/ 1.0_DP, 2.0_DP, 3.0_DP, 4.0_DP, 5.0_DP /)

  error = dss_solve_real( handle, MKL_DSS_DEFAULTS, rhs, nRhs, solution )

  ! Print the solution vector.
  WRITE(*,"('Solution Array: '(5F10.3))") ( solution(j), j = 1, nCols )

END SUBROUTINE only_solve

 

Employee

Hello again,

Thanks for creating a reproducer! Unfortunately, I cannot reproduce the failure; it all works just fine with the MKL versions I tried. So I suggest trying a newer MKL version if possible (maybe we had a problem long ago and fixed it). Also, could you tell me the exact version of MKL you're using and how you link your application with MKL?

Also, you're using the OOC (out-of-core) mode of PARDISO. Is this intended? If not, reconsider the idea of using the DSS API. You enabled OOC by passing the flag MKL_DSS_OOC_STRONG. This feature only makes sense in the quite rare case when the factors cannot fit into the available RAM, and the price for overcoming this limitation with OOC is performance. See more details here: https://software.intel.com/content/www/us/en/develop/articles/how-to-use-ooc-pardiso.html

So, in a "normal" situation you don't need OOC. Also, with PARDISO (the main API) you can use more advanced features that were not ported to the DSS interface.

Best,
Kirill

Moderator

We also tried to reproduce the problem with the current version of MKL 2020.1: Windows 10, LP64 mode, OpenMP threading.

2>test_pardiso.exe
 Thread:           0
 create problem, factor and solve
Solution Array:   -326.333   983.000   163.417   398.000    61.500

Beginner

Many thanks for looking at the problem again. I use Intel ifort 11.0 and, I assume, the MKL version that comes with it. But we also have ifort 17, and the strange thing is that my tutor at university got the example I sent you to run without error on our server with ifort 11.0. My compiler options for compiling my program with Abaqus are:

compile_fortran = [fortCmd, '-V', '-c', '-fPIC', '-auto', '-mP2OPT_hpo_vec_divbyzero=F', '-extend_source', '-WB', '-I%I','-w', '-O3', '-openmp', '-w90', '-w95','-I/app/intel/Compiler/11.0/083/mkl/include']

I'm really not 100% sure what Abaqus does under the hood. I have now read that it uses MPI rather than OpenMP by default. Could that be a problem?

And no, OOC was only being tested by me in this example, because I wasn't sure whether the problem might lie there (which is obviously not the case).

Best,

Nils

Moderator

Nils, thanks for the update. I am not sure I understand what you mean by Abaqus in this case. Did you link this example against MKL?

Could you show the linking line? We will then check how this case works with the current version.

 

Beginner

Okay, maybe I confused everything a little. The reproducer I sent was quite abstracted from my actual code, which works with the commercial FE software Abaqus. And I found out that Abaqus seems to call my program in an MPI environment instead of an OpenMP environment. So the example I sent may not even be representative of my problem. But you asked for the linking/compile line:

ifort -w -I/app/intel/Compiler/11.0/083/mkl/include  test.f90  -L"/app/intel/Compiler/11.0/083/mkl/lib/em64t" "/app/intel/Compiler/11.0/083/mkl/lib/em64t"/libmkl_solver_lp64.a "/app/intel/Compiler/11.0/083/mkl/lib/em64t"/libmkl_intel_lp64.a -Wl,--start-group "/app/intel/Compiler/11.0/083/mkl/lib/em64t"/libmkl_intel_thread.a  "/app/intel/Compiler/11.0/083/mkl/lib/em64t"/libmkl_core.a -Wl,--end-group -L"/app/intel/Compiler/11.0/083/mkl/lib/em64t" -liomp5 -lpthread -openmp -lm -o test.out


I am the academic supervisor of Nils Lange. I have run the above reproducer on another machine (Intel Xeon Gold 6248 CPU, CentOS Linux release 7.6.1810 with kernel 3.10.0-957.21.3.el7.x86_64) with a more recent version of the compiler (ifort version 19.1.0.166), and I still get the same error. In particular, I compiled the code with

  ifort test.f90 -mkl -qopenmp

When I run the resulting executable, I get the following output and error messages:

Thread:           0
 create problem, factor and solve
OMP: Info #273: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead.
Solution Array:   -326.333   983.000   163.417   398.000    61.500
 Thread:           1
 only solve, do not factor matrix
OMP: Info #273: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead.
forrtl: severe (154): array index out of bounds
Image              PC                Routine            Line        Source
a.out              00000000004065CB  Unknown               Unknown  Unknown
libpthread-2.17.s  00002B88465E75D0  Unknown               Unknown  Unknown
libmkl_avx512.so   00002B884F1606DE  mkl_spblas_lp64_a     Unknown  Unknown
libmkl_intel_thre  00002B883FD905FC  mkl_spblas_lp64_d     Unknown  Unknown
libmkl_intel_thre  00002B883FB18EE1  mkl_spblas_lp64_m     Unknown  Unknown
libmkl_intel_thre  00002B8840D47A7C  mkl_pds_lp64_amux     Unknown  Unknown
libmkl_core.so     00002B884385AA6C  mkl_pds_lp64_do_a     Unknown  Unknown
libmkl_core.so     00002B884341F702  mkl_pds_lp64_pard     Unknown  Unknown
libmkl_core.so     00002B88438701F4  mkl_pds_lp64_pard     Unknown  Unknown
libmkl_core.so     00002B8843590B88  mkl_pds_lp64_dss_     Unknown  Unknown
a.out              0000000000405315  Unknown               Unknown  Unknown
libiomp5.so        00002B8846008713  __kmp_invoke_micr     Unknown  Unknown
libiomp5.so        00002B8845F96FFF  Unknown               Unknown  Unknown
libiomp5.so        00002B8845F9605A  Unknown               Unknown  Unknown
libiomp5.so        00002B8846008BD8  Unknown               Unknown  Unknown
libpthread-2.17.s  00002B88465DFDD5  Unknown               Unknown  Unknown
libc-2.17.so       00002B88468F202D  clone                 Unknown  Unknown

Could it be that the error appears only with the Linux version of the compiler? Remarkably, the program runs successfully on our Linux machines if the code of the routines "factor_and_solve" and "only_solve" is inserted directly into the main program. Apparently, the problem has something to do with the way the subroutines are parallelized on Linux.

We are looking forward to receiving further ideas on how to localize and overcome the problem.

Kind regards,

Geralf

Moderator

We don't expect the Linux OS or the compiler version to affect the behavior.

Checking the problem with the latest (current) version of MKL:

$ ifort -qopenmp -mkl test_pardiso.f90
$ ./a.out
 Thread:           0
 create problem, factor and solve
OMP: Info #274: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead.
[the message above is repeated 10 times in the output]
Solution Array:   -326.333   983.000   163.417   398.000    61.500
 Thread:           1
 only solve, do not factor matrix
OMP: Info #274: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead.
[the message above is repeated 3 times in the output]
Solution Array:   -979.000  2949.000   490.250  1194.000   184.500
 

Thanks

Moderator

I also checked the problem on different machines, including Broadwell and Skylake...

e.g.:

$ lscpu | grep Model
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
 

Compiler version I used:

$ ifort --version
ifort (IFORT) 19.1.1.217 20200306
Copyright (C) 1985-2020 Intel Corporation.  All rights reserved.

Moderator

The MKL version you used does show the problem:

source /opt/intel/compilers_and_libraries_2017/linux/bin/compilervars.sh intel64

 ifort --version
ifort (IFORT) 17.0.6 20171215
Copyright (C) 1985-2018 Intel Corporation.  All rights reserved.
./a.out
 Thread:           0
 create problem, factor and solve
Solution Array:   -326.333   983.000   163.417   398.000    61.500
 Thread:           1
 only solve, do not factor matrix
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
a.out              0000000000404204  Unknown               Unknown  Unknown
libpthread-2.17.s  00002B07BDA8B5F0  Unknown               Unknown  Unknown
libmkl_avx2.so     00002B07E29347BF  mkl_spblas_lp64_a     Unknown  Unknown
libmkl_intel_thre  00002B07B98B212C  mkl_spblas_lp64_d     Unknown  Unknown
libmkl_intel_thre  00002B07B963A831  mkl_spblas_lp64_m     Unknown  Unknown
libmkl_intel_thre  00002B07BA5B30EA  mkl_pds_lp64_amux     Unknown  Unknown
libmkl_core.so     00002B07BC68D823  mkl_pds_lp64_do_a     Unknown  Unknown
libmkl_core.so     00002B07BC49E94D  mkl_pds_lp64_pard     Unknown  Unknown
libmkl_core.so     00002B07BC6A3422  mkl_pds_lp64_pard     Unknown  Unknown
libmkl_core.so     00002B07BC5F1FE2  mkl_pds_lp64_dss_     Unknown  Unknown
a.out              0000000000403BC8  Unknown               Unknown  Unknown
libiomp5.so        00002B07BD48BD43  __kmp_invoke_micr     Unknown  Unknown
libiomp5.so        00002B07BD45A317  Unknown               Unknown  Unknown
libiomp5.so        00002B07BD459995  Unknown               Unknown  Unknown
libiomp5.so        00002B07BD48C1B4  Unknown               Unknown  Unknown
libpthread-2.17.s  00002B07BDA83E65  Unknown               Unknown  Unknown
libc-2.17.so       00002B07BDD9688D  clone                 Unknown  Unknown
 

 

Moderator

The next two versions, MKL 2018 and 2019, also produced the same faults, but the current (latest) one works:

source /opt/intel/compilers_and_libraries_2020/linux/bin/compilervars.sh intel64

 ifort --version
ifort (IFORT) 19.1.1.217 20200306
Copyright (C) 1985-2020 Intel Corporation.  All rights reserved.

$ ./a.out
 Thread:           0
 create problem, factor and solve
OMP: Info #274: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead.
[the message above is repeated 10 times in the output]
Solution Array:   -326.333   983.000   163.417   398.000    61.500
 Thread:           1
 only solve, do not factor matrix
OMP: Info #274: omp_get_nested routine deprecated, please use omp_get_max_active_levels instead.
[the message above is repeated 3 times in the output]
Solution Array:   -979.000  2949.000   490.250  1194.000   184.500

Moderator

So our recommendation is to try the latest version, or to try using the PARDISO API instead of DSS, as Kirill recommended above.


We tried the very latest version, ifort 19.1.2.254 20200623, but we still receive the same error message as with the previous version, 19.1.0.166. Please find our executable attached, in case you want to check any relation to the software environment.

Anyway, your run from 06-19-2020 did indeed finish without any error messages, but the result is wrong! The result in both threads should be the same, yet for some reason the result in thread 1 is three times the result in thread 0. For us as users, getting a wrong result is even more severe than getting an error message.
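The factor of three can be checked directly from the two solution vectors printed in the logs (values copied from the output above):

```python
t0 = [-326.333, 983.000, 163.417, 398.000, 61.500]     # thread 0: factor and solve
t1 = [-979.000, 2949.000, 490.250, 1194.000, 184.500]  # thread 1: only solve
ratios = [b / a for a, b in zip(t0, t1)]
print(ratios)  # each ratio is ~3.0 (up to the printed rounding)
```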

 

Moderator

Yes, you are right; I see the problem with v2020 Update 2 on both Windows and Linux. You could try using MKL PARDISO instead of the DSS API, as Kirill already suggested above.
