Pardiso hangs in phase 33 when called from an OMP critical region in IVF 16.0

Michael_R_7 · ‎09-22-2015

Hi all,

I suspect this is a problem with my code, but I can't figure out what the issue is...

I've been using similar code for a while now, and its worked fine until I installed 16.0, however it now hangs in the call to Pardiso with phase 33. Basically I import a matrix, factorize then solve in an OMP parallel loop for many RHS. Because of the layout of my main program (I provide a much stripped down version here, as well as a data file), the same temporary array is used by each thread, and the solution vector overwrites the right hand side. I'm unsure if Pardiso actually uses this temporary array - but in case it does I place the call to Pardiso within an OMP critical section to prevent multiple threads accessing it at the same time.

I compile with parallel OMP (/Qopenmp) and parallel MKL (/Qmkl:parallel) flags, otherwise everything is as Visual Studio sets it by default.

If OMP is set to one thread it runs fine. If I comment out the OMP parallel directive on the loop it also seems to run fine. However on my 8 (4 with hyper threading) core machine when running in parallel no joy.

If anyone has any suggestions/thoughts/solutions it would be appreciated,

Thanks,

Michael

include 'mkl_pardiso.f90'
    program TestPardiso2
    use OMP_LIB
    use MKL_PARDISO
    
    implicit none
    
    double precision, allocatable :: A(:), toSolve(:,:)
    integer, allocatable :: ia(:), ja(:)
    integer :: N
    
    integer, parameter :: numberToSolve = 20
    
    ! Read matrix from file
    call GetMatrixFromFile("Output.txt", N, ia, ja, A)
    
    ! Invent some data to solve
    allocate(toSolve(N,numberToSolve))
    call RANDOM_NUMBER(toSolve)
    
    ! Solve using Pardiso
    call SolveWithPardiso(N, ia, ja, A, numberToSolve, toSolve)
    
    contains 
    
    subroutine GetMatrixFromFile(name, N, ia, ja, A)
        character(len=*), intent(in) :: name
        double precision, intent(out), allocatable :: A(:)
        integer, intent(out), allocatable :: ia(:), ja(:)
        integer, intent(out) :: N
        
        character(len=255) :: buffer
        integer :: nnz, status, iVal, prevIval, cnt
        
        ! open file
        open(UNIT=21, FILE=name, STATUS="OLD", IOSTAT=status)
        
        ! Get matrix size from first lines of file
        read(UNIT=21, FMT='(A)') buffer ! Ignore title
        read(UNIT=21, FMT='(A5, I)') buffer, N
        read(UNIT=21, FMT='(A7, I)') buffer, nnz
        read(UNIT=21, FMT='(A)') buffer ! Ignore column heading
        read(UNIT=21, FMT='(A)') buffer ! Ignore space

        ! Allocate matrix - assume no errors!
        allocate(ia(N + 1), ja(nnz), A(nnz))
                
        ! Loop thru file. Assume only end of file error will occur
        status = 0
        cnt = 0
        prevIval = 0
        do while (status == 0)
            cnt = cnt + 1
            read(UNIT=21, FMT='(X, I, X, I, X, E)', IOSTAT=status) ival, ja(cnt), A(cnt)
            
            if ( ival /= prevIval ) then
                ia(ival) = cnt
                prevIval = ival
            end if          
        end do
        
        ia(N + 1) = nnz + 1
                    
    end subroutine GetMatrixFromFile
    
    
    subroutine SolveWithPardiso(N, ia, ja, A, numberToSolve, toSolve)
        integer, intent(in) :: N, numberToSolve
        integer, intent(in) :: ia(:), ja(:)
        double precision, intent(in) :: A(:)
    
        double precision, intent(inout) :: toSolve(:,:)
        
        type(MKL_PARDISO_HANDLE) :: pt(64)  
        integer :: param(64), perm(N), error, i
        double precision:: tmpArray(N)
        
        ! Initialize Pardiso options
        CALL pardisoinit(pt, 2, param)
        param( 6) = 1 ! Solver stores the solution in the right-hand side b.
        param(27) = 1 ! Check input data
        
        ! call omp_set_num_threads(1) ! Uncommenting this line Works
        
        ! Solve
        call pardiso(pt, 1, 1, 2, 12, N, A, ia, ja, perm, 1, param, 1, toSolve(:, 1), tmpArray, error)
        
        !$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i)
        DO i = 1, numberToSolve
            WRITE(*,*) (omp_get_thread_num() + 1), " : at critical"
            !$OMP CRITICAL (criticalPardisoSection2109)
            WRITE(*,*) (omp_get_thread_num() + 1), " : solving"
            
            ! Solve
            call pardiso(pt, 1, 1, 2, 33, N, A, ia, ja, perm, 1, param, 1, toSolve(:, i), tmpArray, error)
            
            WRITE(*,*) (omp_get_thread_num() + 1), " : complete"
            !$OMP END CRITICAL (criticalPardisoSection2109)
        END DO
        !$OMP END PARALLEL DO
  
    end subroutine SolveWithPardiso
        
    end program TestPardiso2

Gennady_F_Intel · ‎10-10-2015

Michael, I couldn't reproduce the issue. my environment -- win8, 64 bit, threading linking, 4 threads. MKL version 11.3.0. the log file is attached.

Gennady_F_Intel · ‎10-10-2015

Michael, sorry something wrong with attachment. please see below the some part of this log below.

all 20 iterations were completed successfully:

Intel(R) Math Kernel Library Version 11.3.0 Product Build 20150730 for Intel(R) 64 architecture applications
1 : at critical
1 : solving, num of iteration == 1
1 : complete
1 : at critical
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

..................................

.................................

solving, num of iteration == 20
1 : complete
==== PASSED ====
number of non-zeros in U: 1
number of non-zeros in L+U: 1078366
gflop for the numerical factorization: 0.189263

gflop/s for the numerical factorization: 9.107600

=== PARDISO: solving a symmetric positive definite system ===

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve) : 0.003051 s
Time spent in additional calculations : 0.000647 s
Total time spent : 0.003698 s

Statistics:
===========
Parallel Direct Factorization is running on 2 OpenMP

< Linear system Ax = b >
number of equations: 14190
number of non-zeros in A: 167808
number of non-zeros in A (%): 0.083339

number of right-hand sides: 1

< Factors L and U >
number of columns for each panel: 80
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 3594
size of largest supernode: 370
number of non-zeros in L: 1078365
number of non-zeros in U: 1
number of non-zeros in L+U: 1078366
gflop for the numerical factorization: 0.189263

gflop/s for the numerical factorization: 9.107600

=== PARDISO: solving a symmetric positive definite system ===

Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve) : 0.003532 s
Time spent in additional calculations : 0.000626 s
Total time spent : 0.004158 s

Statistics:
===========
Parallel Direct Factorization is running on 2 OpenMP

< Linear system Ax = b >
number of equations: 14190
number of non-zeros in A: 167808
number of non-zeros in A (%): 0.083339

number of right-hand sides: 1

< Factors L and U >
number of columns for each panel: 80
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 3594
size of largest supernode: 370
number of non-zeros in L: 1078365
number of non-zeros in U: 1
number of non-zeros in L+U: 1078366
gflop for the numerical factorization: 0.189263

gflop/s for the numerical factorization: 9.107600

Michael_R_7 · ‎10-12-2015

Hi Gennady,

Thanks for taking the time to look into this. Can you confirm OMP directives are being honoured (compiling with /Qmkl:parallel)? I would expect to see output from other threads being blocked at the critical section; something like:

1 : at critical

2 : at critical

3 : at critical

In your output I only see thread 1?

Thanks again,

Michael

Gennady_F_Intel · ‎10-13-2015

Michael, I used 24 threads system, RH7.0, MKL 11.3. I checked with the 11.3 version of MKL and I didn't modify your code except printing the # of iteration --- smth like the follow: : solving, num of iteration == 18

all log file is attached.