- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
I suspect this is a problem with my code, but I can't figure out what the issue is...
I've been using similar code for a while now, and its worked fine until I installed 16.0, however it now hangs in the call to Pardiso with phase 33. Basically I import a matrix, factorize then solve in an OMP parallel loop for many RHS. Because of the layout of my main program (I provide a much stripped down version here, as well as a data file), the same temporary array is used by each thread, and the solution vector overwrites the right hand side. I'm unsure if Pardiso actually uses this temporary array - but in case it does I place the call to Pardiso within an OMP critical section to prevent multiple threads accessing it at the same time.
I compile with parallel OMP (/Qopenmp) and parallel MKL (/Qmkl:parallel) flags, otherwise everything is as Visual Studio sets it by default.
If OMP is set to one thread it runs fine. If I comment out the OMP parallel directive on the loop it also seems to run fine. However on my 8 (4 with hyper threading) core machine when running in parallel no joy.
If anyone has any suggestions/thoughts/solutions it would be appreciated,
Thanks,
Michael
include 'mkl_pardiso.f90'
program TestPardiso2
use OMP_LIB
use MKL_PARDISO
implicit none
double precision, allocatable :: A(:), toSolve(:,:)
integer, allocatable :: ia(:), ja(:)
integer :: N
integer, parameter :: numberToSolve = 20
! Read matrix from file
call GetMatrixFromFile("Output.txt", N, ia, ja, A)
! Invent some data to solve
allocate(toSolve(N,numberToSolve))
call RANDOM_NUMBER(toSolve)
! Solve using Pardiso
call SolveWithPardiso(N, ia, ja, A, numberToSolve, toSolve)
contains
subroutine GetMatrixFromFile(name, N, ia, ja, A)
character(len=*), intent(in) :: name
double precision, intent(out), allocatable :: A(:)
integer, intent(out), allocatable :: ia(:), ja(:)
integer, intent(out) :: N
character(len=255) :: buffer
integer :: nnz, status, iVal, prevIval, cnt
! open file
open(UNIT=21, FILE=name, STATUS="OLD", IOSTAT=status)
! Get matrix size from first lines of file
read(UNIT=21, FMT='(A)') buffer ! Ignore title
read(UNIT=21, FMT='(A5, I)') buffer, N
read(UNIT=21, FMT='(A7, I)') buffer, nnz
read(UNIT=21, FMT='(A)') buffer ! Ignore column heading
read(UNIT=21, FMT='(A)') buffer ! Ignore space
! Allocate matrix - assume no errors!
allocate(ia(N + 1), ja(nnz), A(nnz))
! Loop thru file. Assume only end of file error will occur
status = 0
cnt = 0
prevIval = 0
do while (status == 0)
cnt = cnt + 1
read(UNIT=21, FMT='(X, I, X, I, X, E)', IOSTAT=status) ival, ja(cnt), A(cnt)
if ( ival /= prevIval ) then
ia(ival) = cnt
prevIval = ival
end if
end do
ia(N + 1) = nnz + 1
end subroutine GetMatrixFromFile
subroutine SolveWithPardiso(N, ia, ja, A, numberToSolve, toSolve)
integer, intent(in) :: N, numberToSolve
integer, intent(in) :: ia(:), ja(:)
double precision, intent(in) :: A(:)
double precision, intent(inout) :: toSolve(:,:)
type(MKL_PARDISO_HANDLE) :: pt(64)
integer :: param(64), perm(N), error, i
double precision:: tmpArray(N)
! Initialize Pardiso options
CALL pardisoinit(pt, 2, param)
param( 6) = 1 ! Solver stores the solution in the right-hand side b.
param(27) = 1 ! Check input data
! call omp_set_num_threads(1) ! Uncommenting this line Works
! Solve
call pardiso(pt, 1, 1, 2, 12, N, A, ia, ja, perm, 1, param, 1, toSolve(:, 1), tmpArray, error)
!$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i)
DO i = 1, numberToSolve
WRITE(*,*) (omp_get_thread_num() + 1), " : at critical"
!$OMP CRITICAL (criticalPardisoSection2109)
WRITE(*,*) (omp_get_thread_num() + 1), " : solving"
! Solve
call pardiso(pt, 1, 1, 2, 33, N, A, ia, ja, perm, 1, param, 1, toSolve(:, i), tmpArray, error)
WRITE(*,*) (omp_get_thread_num() + 1), " : complete"
!$OMP END CRITICAL (criticalPardisoSection2109)
END DO
!$OMP END PARALLEL DO
end subroutine SolveWithPardiso
end program TestPardiso2
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Michael, I couldn't reproduce the issue. my environment -- win8, 64 bit, threading linking, 4 threads. MKL version 11.3.0. the log file is attached.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Michael, sorry something wrong with attachment. please see below the some part of this log below.
all 20 iterations were completed successfully:
Intel(R) Math Kernel Library Version 11.3.0 Product Build 20150730 for Intel(R) 64 architecture applications
1 : at critical
1 : solving, num of iteration == 1
1 : complete
1 : at critical
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
..................................
.................................
solving, num of iteration == 20
1 : complete
==== PASSED ====
number of non-zeros in U: 1
number of non-zeros in L+U: 1078366
gflop for the numerical factorization: 0.189263
gflop/s for the numerical factorization: 9.107600
=== PARDISO: solving a symmetric positive definite system ===
Summary: ( solution phase )
================
Times:
======
Time spent in direct solver at solve step (solve) : 0.003051 s
Time spent in additional calculations : 0.000647 s
Total time spent : 0.003698 s
Statistics:
===========
Parallel Direct Factorization is running on 2 OpenMP
< Linear system Ax = b >
number of equations: 14190
number of non-zeros in A: 167808
number of non-zeros in A (%): 0.083339
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 80
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 3594
size of largest supernode: 370
number of non-zeros in L: 1078365
number of non-zeros in U: 1
number of non-zeros in L+U: 1078366
gflop for the numerical factorization: 0.189263
gflop/s for the numerical factorization: 9.107600
=== PARDISO: solving a symmetric positive definite system ===
Summary: ( solution phase )
================
Times:
======
Time spent in direct solver at solve step (solve) : 0.003532 s
Time spent in additional calculations : 0.000626 s
Total time spent : 0.004158 s
Statistics:
===========
Parallel Direct Factorization is running on 2 OpenMP
< Linear system Ax = b >
number of equations: 14190
number of non-zeros in A: 167808
number of non-zeros in A (%): 0.083339
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 80
number of independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
number of supernodes: 3594
size of largest supernode: 370
number of non-zeros in L: 1078365
number of non-zeros in U: 1
number of non-zeros in L+U: 1078366
gflop for the numerical factorization: 0.189263
gflop/s for the numerical factorization: 9.107600
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Gennady,
Thanks for taking the time to look into this. Can you confirm OMP directives are being honoured (compiling with /Qmkl:parallel)? I would expect to see output from other threads being blocked at the critical section; something like:
1 : at critical
2 : at critical
3 : at critical
In your output I only see thread 1?
Thanks again,
Michael
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page