Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
Announcements
FPGA community forums and blogs on community.intel.com are migrating to the new Altera Community and are read-only. For urgent support needs during this transition, please visit the FPGA Design Resources page or contact an Altera Authorized Distributor.

Pardiso hangs in phase 33 when called from an OMP critical region in IVF 16.0

Michael_R_7
Beginner
1,395 Views

Hi all,

I suspect this is a problem with my code, but I can't figure out what the issue is...

I've been using similar code for a while now, and its worked fine until I installed 16.0, however it now hangs in the call to Pardiso with phase 33. Basically I import a matrix, factorize then solve in an OMP parallel loop for many RHS. Because of the layout of my main program (I provide a much stripped down version here, as well as a data file), the same temporary array is used by each thread, and the solution vector overwrites the right hand side. I'm unsure if Pardiso actually uses this temporary array - but in case it does I place the call to Pardiso within an OMP critical section to prevent multiple threads accessing it at the same time. 

I compile with parallel OMP (/Qopenmp) and parallel MKL (/Qmkl:parallel) flags, otherwise everything is as Visual Studio sets it by default.

If OMP is set to one thread it runs fine. If I comment out the OMP parallel directive on the loop it also seems to run fine. However on my 8 (4 with hyper threading) core machine when running in parallel no joy.

If anyone has any suggestions/thoughts/solutions it would be appreciated,

Thanks,

Michael

include 'mkl_pardiso.f90'
    program TestPardiso2
    use OMP_LIB
    use MKL_PARDISO
    
    implicit none
    
    double precision, allocatable :: A(:), toSolve(:,:)
    integer, allocatable :: ia(:), ja(:)
    integer :: N
    
    integer, parameter :: numberToSolve = 20
    
    ! Read matrix from file
    call GetMatrixFromFile("Output.txt", N, ia, ja, A)
    
    ! Invent some data to solve
    allocate(toSolve(N,numberToSolve))
    call RANDOM_NUMBER(toSolve)
    
    ! Solve using Pardiso
    call SolveWithPardiso(N, ia, ja, A, numberToSolve, toSolve)
    
    contains 
    
    subroutine GetMatrixFromFile(name, N, ia, ja, A)
        character(len=*), intent(in) :: name
        double precision, intent(out), allocatable :: A(:)
        integer, intent(out), allocatable :: ia(:), ja(:)
        integer, intent(out) :: N
        
        character(len=255) :: buffer
        integer :: nnz, status, iVal, prevIval, cnt
        
        ! open file
        open(UNIT=21, FILE=name, STATUS="OLD", IOSTAT=status)
        
        ! Get matrix size from first lines of file
        read(UNIT=21, FMT='(A)') buffer ! Ignore title
        read(UNIT=21, FMT='(A5, I)') buffer, N
        read(UNIT=21, FMT='(A7, I)') buffer, nnz
        read(UNIT=21, FMT='(A)') buffer ! Ignore column heading
        read(UNIT=21, FMT='(A)') buffer ! Ignore space

        ! Allocate matrix - assume no errors!
        allocate(ia(N + 1), ja(nnz), A(nnz))
                
        ! Loop thru file. Assume only end of file error will occur
        status = 0
        cnt = 0
        prevIval = 0
        do while (status == 0)
            cnt = cnt + 1
            read(UNIT=21, FMT='(X, I, X, I, X, E)', IOSTAT=status) ival, ja(cnt), A(cnt)
            
            if ( ival /= prevIval ) then
                ia(ival) = cnt
                prevIval = ival
            end if          
        end do
        
        ia(N + 1) = nnz + 1
                    
    end subroutine GetMatrixFromFile
    
    
    subroutine SolveWithPardiso(N, ia, ja, A, numberToSolve, toSolve)
        integer, intent(in) :: N, numberToSolve
        integer, intent(in) :: ia(:), ja(:)
        double precision, intent(in) :: A(:)
    
        double precision, intent(inout) :: toSolve(:,:)
        
        type(MKL_PARDISO_HANDLE) :: pt(64)  
        integer :: param(64), perm(N), error, i
        double precision:: tmpArray(N)
        
        ! Initialize Pardiso options
        CALL pardisoinit(pt, 2, param)
        param( 6) = 1 ! Solver stores the solution in the right-hand side b.
        param(27) = 1 ! Check input data
        
        ! call omp_set_num_threads(1) ! Uncommenting this line Works
        
        ! Solve
        call pardiso(pt, 1, 1, 2, 12, N, A, ia, ja, perm, 1, param, 1, toSolve(:, 1), tmpArray, error)
        
        !$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i)
        DO i = 1, numberToSolve
            WRITE(*,*) (omp_get_thread_num() + 1), " : at critical"
            !$OMP CRITICAL (criticalPardisoSection2109)
            WRITE(*,*) (omp_get_thread_num() + 1), " : solving"
            
            ! Solve
            call pardiso(pt, 1, 1, 2, 33, N, A, ia, ja, perm, 1, param, 1, toSolve(:, i), tmpArray, error)
            
            WRITE(*,*) (omp_get_thread_num() + 1), " : complete"
            !$OMP END CRITICAL (criticalPardisoSection2109)
        END DO
        !$OMP END PARALLEL DO
  
    end subroutine SolveWithPardiso
        
    end program TestPardiso2

 

0 Kudos
4 Replies
Gennady_F_Intel
Moderator
1,395 Views

Michael, I couldn't reproduce the issue. my environment -- win8, 64 bit, threading linking, 4 threads. MKL version 11.3.0.  the log file is attached.

 

0 Kudos
Gennady_F_Intel
Moderator
1,395 Views

Michael, sorry something wrong with attachment. please see below the some part of this log below.

all 20 iterations were completed successfully:

Intel(R) Math Kernel Library Version 11.3.0 Product Build 20150730 for Intel(R) 64 architecture applications                                                                                          
           1  : at critical
           1  : solving, num of iteration ==            1
           1  : complete
           1  : at critical
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===

..................................

.................................

solving, num of iteration ==           20
           1  : complete
  ==== PASSED ==== 
    number of non-zeros in U:                1
             number of non-zeros in L+U:              1078366
             gflop   for the numerical factorization: 0.189263

             gflop/s for the numerical factorization: 9.107600


=== PARDISO: solving a symmetric positive definite system ===


Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.003051 s
Time spent in additional calculations                            : 0.000647 s
Total time spent                                                 : 0.003698 s

Statistics:
===========
Parallel Direct Factorization is running on 2 OpenMP

< Linear system Ax = b >
             number of equations:           14190
             number of non-zeros in A:      167808
             number of non-zeros in A (%): 0.083339

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 80
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    3594
             size of largest supernode:               370
             number of non-zeros in L:                1078365
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1078366
             gflop   for the numerical factorization: 0.189263

             gflop/s for the numerical factorization: 9.107600


=== PARDISO: solving a symmetric positive definite system ===


Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.003532 s
Time spent in additional calculations                            : 0.000626 s
Total time spent                                                 : 0.004158 s

Statistics:
===========
Parallel Direct Factorization is running on 2 OpenMP

< Linear system Ax = b >
             number of equations:           14190
             number of non-zeros in A:      167808
             number of non-zeros in A (%): 0.083339

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 80
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    3594
             size of largest supernode:               370
             number of non-zeros in L:                1078365
             number of non-zeros in U:                1
             number of non-zeros in L+U:              1078366
             gflop   for the numerical factorization: 0.189263

             gflop/s for the numerical factorization: 9.107600

 

 

0 Kudos
Michael_R_7
Beginner
1,395 Views

Hi Gennady,

Thanks for taking the time to look into this. Can you confirm OMP directives are being honoured (compiling with /Qmkl:parallel)? I would expect to see output from other threads being blocked at the critical section; something like:

1  : at critical

2 : at critical

3 : at critical 

In your output I only see thread 1?

Thanks again,

Michael 

0 Kudos
Gennady_F_Intel
Moderator
1,395 Views

Michael, I used 24 threads system, RH7.0, MKL 11.3. I checked with the 11.3 version of MKL and I didn't modify your code except printing the # of iteration --- smth like the follow: : solving, num of iteration ==           18

all log file is attached.

0 Kudos
Reply