Hi all,
I suspect this is a problem with my code, but I can't figure out what the issue is...
I've been using similar code for a while now, and it worked fine until I installed 16.0; it now hangs in the call to Pardiso with phase 33. Basically, I import a matrix, factorize it, then solve for many right-hand sides in an OMP parallel loop. Because of the layout of my main program (I provide a much stripped-down version here, as well as a data file), the same temporary array is used by every thread, and the solution vector overwrites the right-hand side. I'm unsure whether Pardiso actually uses this temporary array, but in case it does, I place the call to Pardiso within an OMP critical section to prevent multiple threads from accessing it at the same time.
I compile with the parallel OMP (/Qopenmp) and parallel MKL (/Qmkl:parallel) flags; otherwise everything is as Visual Studio sets it by default.
If OMP is set to one thread, it runs fine. If I comment out the OMP parallel directive on the loop, it also seems to run fine. However, on my machine with 8 logical cores (4 physical cores with hyper-threading), it hangs when run in parallel.
If anyone has any suggestions, thoughts or solutions, they would be appreciated.
Thanks,
Michael
include 'mkl_pardiso.f90'
    program TestPardiso2
    use OMP_LIB
    use MKL_PARDISO
    
    implicit none
    
    double precision, allocatable :: A(:), toSolve(:,:)
    integer, allocatable :: ia(:), ja(:)
    integer :: N
    
    integer, parameter :: numberToSolve = 20
    
    ! Read matrix from file
    call GetMatrixFromFile("Output.txt", N, ia, ja, A)
    
    ! Invent some data to solve
    allocate(toSolve(N,numberToSolve))
    call RANDOM_NUMBER(toSolve)
    
    ! Solve using Pardiso
    call SolveWithPardiso(N, ia, ja, A, numberToSolve, toSolve)
    
    contains 
    
    subroutine GetMatrixFromFile(name, N, ia, ja, A)
        character(len=*), intent(in) :: name
        double precision, intent(out), allocatable :: A(:)
        integer, intent(out), allocatable :: ia(:), ja(:)
        integer, intent(out) :: N
        
        character(len=255) :: buffer
        integer :: nnz, status, iVal, prevIval, cnt
        
        ! open file
        open(UNIT=21, FILE=name, STATUS="OLD", IOSTAT=status)
        
        ! Get matrix size from first lines of file
        read(UNIT=21, FMT='(A)') buffer ! Ignore title
        read(UNIT=21, FMT='(A5, I)') buffer, N
        read(UNIT=21, FMT='(A7, I)') buffer, nnz
        read(UNIT=21, FMT='(A)') buffer ! Ignore column heading
        read(UNIT=21, FMT='(A)') buffer ! Ignore space
        ! Allocate matrix - assume no errors!
        allocate(ia(N + 1), ja(nnz), A(nnz))
                
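        ! Read exactly nnz entries; stop early on a read error or end of file.
        ! Bounding the loop by nnz keeps ja(cnt) and A(cnt) inside their allocation.
        status = 0
        prevIval = 0
        do cnt = 1, nnz
            read(UNIT=21, FMT='(X, I, X, I, X, E)', IOSTAT=status) ival, ja(cnt), A(cnt)
            if ( status /= 0 ) exit
            
            ! A change in row index marks the start of a new row in ia
            if ( ival /= prevIval ) then
                ia(ival) = cnt
                prevIval = ival
            end if          
        end do
        
        close(UNIT=21)
        
        ia(N + 1) = nnz + 1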
                    
    end subroutine GetMatrixFromFile
    
    
    subroutine SolveWithPardiso(N, ia, ja, A, numberToSolve, toSolve)
        integer, intent(in) :: N, numberToSolve
        integer, intent(in) :: ia(:), ja(:)
        double precision, intent(in) :: A(:)
    
        double precision, intent(inout) :: toSolve(:,:)
        
        type(MKL_PARDISO_HANDLE) :: pt(64)  
        integer :: param(64), perm(N), error, i
        double precision:: tmpArray(N)
        
        ! Initialize Pardiso options
        CALL pardisoinit(pt, 2, param)
        param( 6) = 1 ! Solver stores the solution in the right-hand side b.
        param(27) = 1 ! Check input data
        
        ! call omp_set_num_threads(1) ! Uncommenting this line Works
        
        ! Solve
        call pardiso(pt, 1, 1, 2, 12, N, A, ia, ja, perm, 1, param, 1, toSolve(:, 1), tmpArray, error)
        
        !$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i)
        DO i = 1, numberToSolve
            WRITE(*,*) (omp_get_thread_num() + 1), " : at critical"
            !$OMP CRITICAL (criticalPardisoSection2109)
            WRITE(*,*) (omp_get_thread_num() + 1), " : solving"
            
            ! Solve
            call pardiso(pt, 1, 1, 2, 33, N, A, ia, ja, perm, 1, param, 1, toSolve(:, i), tmpArray, error)
            
            WRITE(*,*) (omp_get_thread_num() + 1), " : complete"
            !$OMP END CRITICAL (criticalPardisoSection2109)
        END DO
        !$OMP END PARALLEL DO
  
    end subroutine SolveWithPardiso
        
    end program TestPardiso2
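On the question of whether Pardiso uses the temporary array: as far as I recall from the MKL documentation for iparm(6), the array x is always used as workspace, even when the solution is written back into b, so protecting a shared tmpArray is reasonable. A different way to avoid sharing it at all is to pass every right-hand side in one call by setting nrhs to the number of columns. The following is only a minimal sketch under that assumption, not a diagnosis of the hang: the 3x3 matrix, the program name MultiRhsSketch and the variable names are made up for illustration, and it needs MKL to build (for example with /Qmkl:parallel).

```fortran
include 'mkl_pardiso.f90'
program MultiRhsSketch
    use MKL_PARDISO
    implicit none

    integer, parameter :: n = 3, nrhs = 2

    ! Hypothetical test matrix: upper triangle of the SPD tridiagonal
    ! matrix [2 -1 0; -1 2 -1; 0 -1 2] in 1-based CSR format.
    integer :: ia(n + 1) = (/ 1, 3, 5, 6 /)
    integer :: ja(5)     = (/ 1, 2, 2, 3, 3 /)
    double precision :: a(5) = (/ 2.0d0, -1.0d0, 2.0d0, -1.0d0, 2.0d0 /)

    double precision :: b(n, nrhs), x(n, nrhs), expected(n, nrhs)
    type(MKL_PARDISO_HANDLE) :: pt(64)
    integer :: iparm(64), perm(n), error

    ! Right-hand sides chosen so that the exact solutions are known:
    ! column 1 solves to (1,1,1), column 2 solves to (1,2,3).
    b        = reshape((/ 1.0d0, 0.0d0, 1.0d0, 0.0d0, 0.0d0, 4.0d0 /), (/ n, nrhs /))
    expected = reshape((/ 1.0d0, 1.0d0, 1.0d0, 1.0d0, 2.0d0, 3.0d0 /), (/ n, nrhs /))

    call pardisoinit(pt, 2, iparm)
    iparm(6)  = 1   ! solution overwrites b; x is still needed as workspace
    iparm(27) = 1   ! check input data, as in the original post

    ! Phase 13: analysis, factorization and solve for all nrhs columns at once.
    ! x has the same shape as b, so no workspace is shared between threads.
    call pardiso(pt, 1, 1, 2, 13, n, a, ia, ja, perm, nrhs, iparm, 0, b, x, error)
    if (error /= 0) then
        write(*,*) "pardiso failed with error ", error
        stop 1
    end if

    write(*,*) "max abs error vs known solution: ", maxval(abs(b - expected))

    ! Phase -1: release all internal solver memory.
    call pardiso(pt, 1, 1, 2, -1, n, a, ia, ja, perm, nrhs, iparm, 0, b, x, error)
end program MultiRhsSketch
```

In the posted program this would correspond to passing toSolve as a whole with nrhs = numberToSolve and a workspace of the same shape, leaving the threading to MKL instead of the surrounding OMP loop. Whether that fits the layout of the full application is something only the original code can answer.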
Michael, I couldn't reproduce the issue. My environment: Win8, 64-bit, threaded linking, 4 threads, MKL version 11.3.0. The log file is attached.
Michael, sorry, something went wrong with the attachment. Please see part of the log below.
All 20 iterations completed successfully:
Intel(R) Math Kernel Library Version 11.3.0 Product Build 20150730 for Intel(R) 64 architecture applications
           1  : at critical
           1  : solving, num of iteration ==            1
           1  : complete
           1  : at critical
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
..................................
.................................
solving, num of iteration ==           20
           1  : complete
  ==== PASSED ====
    number of non-zeros in U:                1
    number of non-zeros in L+U:              1078366
    gflop   for the numerical factorization: 0.189263
    gflop/s for the numerical factorization: 9.107600
=== PARDISO: solving a symmetric positive definite system ===
Summary: ( solution phase )
================
Times:
======
    Time spent in direct solver at solve step (solve)                : 0.003051 s
    Time spent in additional calculations                            : 0.000647 s
    Total time spent                                                 : 0.003698 s
Statistics:
===========
Parallel Direct Factorization is running on 2 OpenMP
< Linear system Ax = b >
    number of equations:           14190
    number of non-zeros in A:      167808
    number of non-zeros in A (%): 0.083339
    number of right-hand sides: 1
< Factors L and U >
    number of columns for each panel: 80
    number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
    number of supernodes:                    3594
    size of largest supernode:               370
    number of non-zeros in L:                1078365
    number of non-zeros in U:                1
    number of non-zeros in L+U:              1078366
    gflop   for the numerical factorization: 0.189263
    gflop/s for the numerical factorization: 9.107600
=== PARDISO: solving a symmetric positive definite system ===
Summary: ( solution phase )
================
Times:
======
    Time spent in direct solver at solve step (solve)                : 0.003532 s
    Time spent in additional calculations                            : 0.000626 s
    Total time spent                                                 : 0.004158 s
Statistics:
===========
Parallel Direct Factorization is running on 2 OpenMP
< Linear system Ax = b >
    number of equations:           14190
    number of non-zeros in A:      167808
    number of non-zeros in A (%): 0.083339
    number of right-hand sides: 1
< Factors L and U >
    number of columns for each panel: 80
    number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
    number of supernodes:                    3594
    size of largest supernode:               370
    number of non-zeros in L:                1078365
    number of non-zeros in U:                1
    number of non-zeros in L+U:              1078366
    gflop   for the numerical factorization: 0.189263
    gflop/s for the numerical factorization: 9.107600
Hi Gennady,
Thanks for taking the time to look into this. Can you confirm OMP directives are being honoured (compiling with /Qmkl:parallel)? I would expect to see output from other threads being blocked at the critical section; something like:
1 : at critical
2 : at critical
3 : at critical
In your output I only see thread 1?
Thanks again,
Michael
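One way to check whether the OMP directives are being honoured, independently of Pardiso, is a small program with the same loop and critical section but no solver call. This is only a sketch: the program name CheckOpenMP, the section name reportSection and the variable names are made up for illustration. Lines starting with the `!$` sentinel are compiled only when OpenMP is enabled (for example with /Qopenmp), so the same source also builds without it and then reports a single thread.

```fortran
program CheckOpenMP
    !$ use omp_lib
    implicit none

    integer, parameter :: numIterations = 20
    integer :: i, threadId, teamSize, completed
    logical :: openmpEnabled

    ! The assignment to .true. only exists in an OpenMP build.
    openmpEnabled = .false.
    !$ openmpEnabled = .true.

    teamSize  = 1
    completed = 0

    !$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i, threadId)
    do i = 1, numIterations
        threadId = 0
        !$ threadId = omp_get_thread_num()

        ! Shared variables are only touched inside the critical section.
        !$OMP CRITICAL (reportSection)
        !$ teamSize = omp_get_num_threads()
        completed = completed + 1
        write(*,*) threadId + 1, " : iteration ", i
        !$OMP END CRITICAL (reportSection)
    end do
    !$OMP END PARALLEL DO

    write(*,*) "OpenMP enabled at compile time: ", openmpEnabled
    write(*,*) "Threads in the loop's team:     ", teamSize
    write(*,*) "Iterations completed:           ", completed
end program CheckOpenMP
```

If the build honours the directives and more than one thread is available, the per-iteration lines should show several different thread numbers; if every line shows thread 1, the loop is running serially in that build or environment.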