Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Pardiso hangs in phase 33 when called from an OMP critical region in IVF 16.0

Michael_R_7
Beginner
375 Views

Hi all,

I suspect this is a problem with my code, but I can't figure out what the issue is...

I've been using similar code for a while now, and its worked fine until I installed 16.0, however it now hangs in the call to Pardiso with phase 33. Basically I import a matrix, factorize then solve in an OMP parallel loop for many RHS. Because of the layout of my main program (I provide a much stripped down version here, as well as a data file), the same temporary array is used by each thread, and the solution vector overwrites the right hand side. I'm unsure if Pardiso actually uses this temporary array - but in case it does I place the call to Pardiso within an OMP critical section to prevent multiple threads accessing it at the same time. 

I compile with parallel OMP (/Qopenmp) and parallel MKL (/Qmkl:parallel) flags, otherwise everything is as Visual Studio sets it by default.

If OMP is set to one thread it runs fine. If I comment out the OMP parallel directive on the loop it also seems to run fine. However on my 8 (4 with hyper threading) core machine when running in parallel no joy.

If anyone has any suggestions/thoughts/solutions it would be appreciated,

Thanks,

Michael

include 'mkl_pardiso.f90'
    program TestPardiso2
    use OMP_LIB
    use MKL_PARDISO
    
    implicit none
    
    double precision, allocatable :: A(:), toSolve(:,:)
    integer, allocatable :: ia(:), ja(:)
    integer :: N
    
    integer, parameter :: numberToSolve = 20
    
    ! Read matrix from file
    call GetMatrixFromFile("Output.txt", N, ia, ja, A)
    
    ! Invent some data to solve
    allocate(toSolve(N,numberToSolve))
    call RANDOM_NUMBER(toSolve)
    
    ! Solve using Pardiso
    call SolveWithPardiso(N, ia, ja, A, numberToSolve, toSolve)
    
    contains 
    
    subroutine GetMatrixFromFile(name, N, ia, ja, A)
        character(len=*), intent(in) :: name
        double precision, intent(out), allocatable :: A(:)
        integer, intent(out), allocatable :: ia(:), ja(:)
        integer, intent(out) :: N
        
        character(len=255) :: buffer
        integer :: nnz, status, iVal, prevIval, cnt
        
        ! open file
        open(UNIT=21, FILE=name, STATUS="OLD", IOSTAT=status)
        
        ! Get matrix size from first lines of file
        read(UNIT=21, FMT='(A)') buffer ! Ignore title
        read(UNIT=21, FMT='(A5, I)') buffer, N
        read(UNIT=21, FMT='(A7, I)') buffer, nnz
        read(UNIT=21, FMT='(A)') buffer ! Ignore column heading
        read(UNIT=21, FMT='(A)') buffer ! Ignore space

        ! Allocate matrix - assume no errors!
        allocate(ia(N + 1), ja(nnz), A(nnz))
                
        ! Loop thru file. Assume only end of file error will occur
        status = 0
        cnt = 0
        prevIval = 0
        do while (status == 0)
            cnt = cnt + 1
            read(UNIT=21, FMT='(X, I, X, I, X, E)', IOSTAT=status) ival, ja(cnt), A(cnt)
            
            if ( ival /= prevIval ) then
                ia(ival) = cnt
                prevIval = ival
            end if          
        end do
        
        ia(N + 1) = nnz + 1
                    
    end subroutine GetMatrixFromFile
    
    
    subroutine SolveWithPardiso(N, ia, ja, A, numberToSolve, toSolve)
        integer, intent(in) :: N, numberToSolve
        integer, intent(in) :: ia(:), ja(:)
        double precision, intent(in) :: A(:)
    
        double precision, intent(inout) :: toSolve(:,:)
        
        type(MKL_PARDISO_HANDLE) :: pt(64)  
        integer :: param(64), perm(N), error, i
        double precision:: tmpArray(N)
        
        ! Initialize Pardiso options
        CALL pardisoinit(pt, 2, param)
        param( 6) = 1 ! Solver stores the solution in the right-hand side b.
        param(27) = 1 ! Check input data
        
        ! call omp_set_num_threads(1) ! Uncommenting this line Works
        
        ! Solve
        call pardiso(pt, 1, 1, 2, 12, N, A, ia, ja, perm, 1, param, 1, toSolve(:, 1), tmpArray, error)
        
        !$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i)
        DO i = 1, numberToSolve
            WRITE(*,*) (omp_get_thread_num() + 1), " : at critical"
            !$OMP CRITICAL (criticalPardisoSection2109)
            WRITE(*,*) (omp_get_thread_num() + 1), " : solving"
            
            ! Solve
            call pardiso(pt, 1, 1, 2, 33, N, A, ia, ja, perm, 1, param, 1, toSolve(:, i), tmpArray, error)
            
            WRITE(*,*) (omp_get_thread_num() + 1), " : complete"
            !$OMP END CRITICAL (criticalPardisoSection2109)
        END DO
        !$OMP END PARALLEL DO
  
    end subroutine SolveWithPardiso
        
    end program TestPardiso2

 

0 Kudos
4 Replies
Gennady_F_Intel
Moderator
375 Views

Michael, I couldn't reproduce the issue. my environment -- win8, 64 bit, threading linking, 4 threads. MKL version 11.3.0.  the log file is attached.

 

0 Kudos
Gennady_F_Intel
Moderator
375 Views

Michael, sorry something wrong with attachment. please see below the some part of this log below.

all 20 iterations were completed successfully:

Intel(R) Math Kernel Library Version 11.3.0 Product Build 20150730 for Intel(R) 64 architecture applications                                                                                          
           1  : at critical
           1  : solving, num of iteration ==            1
           1  : complete
           1  : at critical
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

..................................

.................................

solving, num of iteration ==           20
           1  : complete
  ==== PASSED ==== 
    number of non-zeros in U:                1
             number of non-zeros in L+U:              1078366
             gflop   for the numerical factorization: 0.189263

             gflop/s for the numerical factorization: 9.107600


=== PARDISO: solving a symmetric positive definite system ===


Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.003051 s
Time spent in additional calculations                            : 0.000647 s
Total time spent                                                 : 0.003698 s

Statistics:
===========
Parallel Direct Factorization is running on 2 OpenMP

< Linear system Ax = b >
             number of equations:           14190
             number of non-zeros in A:      167808
             number of non-zeros in A (%): 0.083339

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 80
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    3594
             size of largest supernode:               370
             number of non-zeros in L:                1078365
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1078366
             gflop   for the numerical factorization: 0.189263

             gflop/s for the numerical factorization: 9.107600


=== PARDISO: solving a symmetric positive definite system ===


Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.003532 s
Time spent in additional calculations                            : 0.000626 s
Total time spent                                                 : 0.004158 s

Statistics:
===========
Parallel Direct Factorization is running on 2 OpenMP

< Linear system Ax = b >
             number of equations:           14190
             number of non-zeros in A:      167808
             number of non-zeros in A (%): 0.083339

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 80
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    3594
             size of largest supernode:               370
             number of non-zeros in L:                1078365
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1078366
             gflop   for the numerical factorization: 0.189263

             gflop/s for the numerical factorization: 9.107600

 

 

0 Kudos
Michael_R_7
Beginner
375 Views

Hi Gennady,

Thanks for taking the time to look into this. Can you confirm OMP directives are being honoured (compiling with /Qmkl:parallel)? I would expect to see output from other threads being blocked at the critical section; something like:

1  : at critical

2 : at critical

3 : at critical 

In your output I only see thread 1?

Thanks again,

Michael 

0 Kudos
Gennady_F_Intel
Moderator
375 Views

Michael, I used 24 threads system, RH7.0, MKL 11.3. I checked with the 11.3 version of MKL and I didn't modify your code except printing the # of iteration --- smth like the follow: : solving, num of iteration ==           18

all log file is attached.

0 Kudos
Reply