Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
Announcements
This community is designed for sharing of public information. Please do not share Intel or third-party confidential information here.

PARDISO vs LAPACK Performance

Beginner
171 Views

Good afternoon,

I am trying to improve the speed of a 2D and 3D cylindrical coordinates fluid flow in porous media code. I have a Core i7, 3.4 GHz, 32 GB RAM, 4 Processors (which work as if they were 8) machine. Right now I am working on the Win32 platform, Visual Studio and Fortran90. I have to solve systems of equations at a lot of times one after the other one (it is a transient problem so I have to call the function a lot of times - do while loop); for the 2D case the maximum number of unknowns I would ever try to manage is around 3500 and for the 3D case this number goes up to 50000. I have compared LAPACK and PARDISO solvers in the 2D case, and in all of the simulations I have run, surprisingly LAPACK solver gives me better wall-clock time (which is what the actual users will care about, not the CPU..right?). Difference are as follow:

* 2050 unknowns: LAPACK = 2 sec, PARDISO = 3.5 sec

* 3570 unknowns: LAPACK = 5 sec, PARDISO = 9 sec

From the Windows Task Manager I can actually see that when I am implementing the PARDISO solver the PC is using minimum 6 threads out of 8. In some cases it is using the 8 threads, however the results are unfavorable to PARIDSO in this situation. Please find below a script of the code on which I am implementing the two solvers with the corresponding parameters. I would like to know if some of the parameters in PARDISO (iparm for example which has 64 entries) are not the right ones? (it is worth to mention that the results are exactly the same on both cases). What could be the reason(s) for having this behavior, on which one would expect PARDISO to be faster? Maybe is PARDISO more suitable for larger systems?

My main purpose is to make the code as fast as possible, therefore making use as much processors in my computer as possible is a goal.

Please find below a script of my code.

Thank you very much. I would appreciate the help.

Regards,

J.D.

```!SOLVING SYSTEM OF EQUATIONS - USING LAPACK - GBSV OR PARDISO
allocate(tmp2(Nz), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single phase flow rate temporal vector to get as output Pwf for all times at the middle of the packer ***"
tmp2 = abs(czp-dzgp)
idxx2 = minloc(tmp2)
idx2 = idxx2(1)

allocate(IPIV(Nz*Nr), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single phase flow rate pivoting vector as per LAPACK - GBSV subroutine, if N-N ***"
IPIV = 0
allocate(BTMnni(((2*Nz)+Nz+1),(Nz*Nr)), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single phase flow rate 'initial' banded TM Matrix as per LAPACK - GBSV subroutine requirements, if N-N ***"

allocate(RHSnn((Nz*Nr),1), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single phase flow rate right hand side vector of system of equations as per LAPACK - GBSV subroutine, if N-N ***"
BTMnni = BTMnn
RHSnn(:,1) = matmul(BMnn,Psfr(:,1)) - NNSSV
allocate(DeltaP(Nz*Nr), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate DeltaP vector to get DPMAX, if N-N ***"
DeltaP = 0.0_dp

!Pardiso
if (ParTLapF) then
do i = 1,64
pt( i )%DUMMY =  0
end do
maxfct = 1
mnum = 1
mtype = 2

allocate(nzele((size(Tz))+(size(Tr))+(Nz*Nr)), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate nzele((size(Tz))+(size(Tth))+(size(Tr))+(Nz*Nth*Nr)), in boundaries module ***"
allocate(colnzele((size(Tz))+(size(Tr))+(Nz*Nr)), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate colnzele((size(Tz))+(size(Tth))+(size(Tr))+(Nz*Nth*Nr)), in boundaries module ***"
allocate(innzele((Nz*Nr)+1), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate innzele((Nz*Nr*Nth)+1), in boundaries module ***"
cta = 0
ctt = 0
do i = 1,(Nz*Nr)
do j = 1,(Nz*Nr)
if ((j >= i) .AND. (abs(ATMnn(i,j)) > 0.0)) then
ctt = ctt + 1
nzele(ctt) = ATMnn(i,j)
colnzele(ctt) = j
if (j == i) then
cta = cta + 1
innzele(cta) = ctt
end if
end if
end do
end do
innzele(size(innzele)) = ((size(Tz))+(size(Tr))+(Nz*Nr))+1

allocate(perm(Nz*Nr), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate perm(Nz*Nr), in boundaries module ***"
do i = 1,(Nz*Nr)
perm(i) = i
end do

nrhs = 1

do i = 1,64
iparm(i) = 0
end do

iparm(1) = 1 ! no solver default
iparm(2) = 2 ! fill-in reordering from METIS
iparm(4) = 62 ! no iterative-direct algorithm 0 or 62 (iterative preconditioning)
iparm(5) = 0 ! no user fill-in reducing permutation
iparm(6) = 1 ! =0 solution on the first n compoments of x
iparm(8) = 0 ! numbers of iterative refinement steps
iparm(10) = 8 ! perturbe the pivot elements with 1E-13
iparm(11) = 0 ! use nonsymmetric permutation and scaling MPS
iparm(13) = 0 ! maximum weighted matching algorithm is switched-off (default for symmetric). Try iparm(13) = 1 in case of inappropriate accuracy
iparm(14) = 0 ! Output: number of perturbed pivots
iparm(18) = 0 ! Output: number of nonzeros in the factor LU
iparm(19) = 0 ! Output: Mflops for LU factorization
iparm(20) = 0 ! Output: Numbers of CG Iterations

msglvl = 0  !not printing statistical information
allocate(XXX((Nz*Nr)), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate XXX(Nz*Nr), in boundaries module ***"

E = 0

end if
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

dtn = dt
ct = 1
sumdt = 0.0_dp
do while (sumdt < tp)

ct = ct + 1

!Pardiso
if (ParTLapF) then
!                        phase = 11
!                        call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, ddum, ddum, E)
!                        IF (E /= 0) THEN
!                           WRITE(*,*) 'The following ERROR was detected when phase = 11 - 1: ', E
!                        END IF
!                        phase = 22
!                        call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, ddum, ddum, E)
!                        IF (E /= 0) THEN
!                           WRITE(*,*) 'The following ERROR was detected when phase = 22 - 1: ', E
!                        END IF
phase = 13 !(if above is commented-out I have to put 33 here)
call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, RHSnn, XXX, E)
IF (E /= 0) THEN
WRITE(*,*) 'The following ERROR was detected when phase = 33 - 1: ', E
END IF
phase = -1 ! release internal memory
call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, ddum, perm, perm, perm, nrhs, iparm, msglvl, ddum, ddum, E)
else
call GBSV(BTMnni, RHSnn, Nz, IPIV, INFO)    !LAPACK SOLVER FOR BANDED MATRIX SYSTEM OF EQUATIONS
end if
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

deallocate(Psfr, STAT = DeAllocateStatus)
allocate(Psfr((Nz*Nr),ct), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow pressure vector during solver while loop, if N-N ***"
Psfr(:,ct) = RHSnn(:,1)

do i = 1,(ct-1)
Psfr(:,i) = Psfrtemp(:,i)
end do
deallocate(Psfrtemp, STAT = DeAllocateStatus)
allocate(Psfrtemp((Nz*Nr),ct), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow pressure vector during solver while loop, if N-N ***"
Psfrtemp = Psfr

do i = 1,(Nz*Nr)
DeltaP(i) = abs(Psfr(i,ct) - Psfr(i,(ct-1)))
end do
DPMAX = maxval(DeltaP)

if (DPMAX <= MULT*DPLIM) then
sumdt = sumdt + dtn
dtn = dtn*(DPLIM/DPMAX)
if ((sumdt+dtn > tp) .AND. (sumdt /= tp)) then
dtn = tp - sumdt
end if

BMnn = 0.0_dp                                                           !initializing Cij*Vij Matrix to zeros
do h = 1,(Nz*Nr)
BMnn(h,h) = ((1.0_dp)/(dtn))*B(h)                                   !initializing diagonal of Cij*Vij Matrix AS IF no Wellbore Storage is considered
end do
if (ws == 0) then
BMnn = BMnn                                                         !keeping CijVij Matrix as if no wellbore storage, because wellbore storage indicator is disabled (ws = 0)
else if (ws == 1) then
do i = 1,Nz
if ((dzgp(i) < (czp-(dzp/2.0_dp))) .OR. (dzgp(i) > (czp+(dzp/2.0_dp)))) then
WSv(i) = 0.0_dp
else
if (i == 1) then
WSv(i) = (C*(dztr(1)-dzgp(1)))/(dzp)
else if (i == Nz) then
WSv(i) = (C*(dzgp(Nz)-dztr(Nz-1)))/(dzp)
else
WSv(i) = (C*(dztr(i)-dztr(i-1)))/(dzp)
end if
end if
end do
do i = 1,Nz
BMnn(i,i) = ((B(i)) + (WSv(i)))/(dtn)                           !modifying elements 1 to Nz (grid blocks right next to the well) of Cij*Vij Matrix
end do
end if
ATMnn = TMnn + BMnn

!Pardiso
if (ParTLapF == .false.) then
BTMnn = 0.0_dp
do j = 1,(Nz*Nr)
do i = max(1,(j-Nz)),min((Nz*Nr),(j+Nz))
BTMnn((Nz+Nz+1+i-j),j) = ATMnn(i,j)
end do
end do
BTMnni = BTMnn
end if

RHSnn(:,1) = matmul(BMnn,Psfr(:,ct)) - NNSSV

if (ParTLapF) then
cta = 0
ctt = 0
do i = 1,(Nz*Nr)
do j = 1,(Nz*Nr)
if ((j >= i) .AND. (abs(ATMnn(i,j)) > 0.0)) then
ctt = ctt + 1
nzele(ctt) = ATMnn(i,j)
end if
end do
end do
end if
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
else
do while (DPMAX > MULT*DPLIM)
dtn = dtn*(DPLIM/DPMAX)

BMnn = 0.0_dp                                                           !initializing Cij*Vij Matrix to zeros
do h = 1,(Nz*Nr)
BMnn(h,h) = ((1.0_dp)/(dtn))*B(h)                                   !initializing diagonal of Cij*Vij Matrix AS IF no Wellbore Storage is considered
end do
if (ws == 0) then
BMnn = BMnn                                                         !keeping CijVij Matrix as if no wellbore storage, because wellbore storage indicator is disabled (ws = 0)
else if (ws == 1) then
do i = 1,Nz
if ((dzgp(i) < (czp-(dzp/2.0_dp))) .OR. (dzgp(i) > (czp+(dzp/2.0_dp)))) then
WSv(i) = 0.0_dp
else
if (i == 1) then
WSv(i) = (C*(dztr(1)-dzgp(1)))/(dzp)
else if (i == Nz) then
WSv(i) = (C*(dzgp(Nz)-dztr(Nz-1)))/(dzp)
else
WSv(i) = (C*(dztr(i)-dztr(i-1)))/(dzp)
end if
end if
end do
do i = 1,Nz
BMnn(i,i) = ((B(i)) + (WSv(i)))/(dtn)                           !modifying elements 1 to Nz (grid blocks right next to the well) of Cij*Vij Matrix
end do
end if
ATMnn = TMnn + BMnn

!Pardiso
if (ParTLapF == .false.) then
BTMnn = 0.0_dp
do j = 1,(Nz*Nr)
do i = max(1,(j-Nz)),min((Nz*Nr),(j+Nz))
BTMnn((Nz+Nz+1+i-j),j) = ATMnn(i,j)
end do
end do
BTMnni = BTMnn
end if

RHSnn(:,1) = matmul(BMnn,Psfr(:,ct-1)) - NNSSV

if (ParTLapF) then
cta = 0
ctt = 0
do i = 1,(Nz*Nr)
do j = 1,(Nz*Nr)
if ((j >= i) .AND. (abs(ATMnn(i,j)) > 0.0)) then
ctt = ctt + 1
nzele(ctt) = ATMnn(i,j)
end if
end do
end do

!                                phase = 11
!                                call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, ddum, ddum, E)
!                                IF (E /= 0) THEN
!                                   WRITE(*,*) 'The following ERROR was detected when phase = 11 - 2: ', E
!                                END IF
!                                phase = 22
!                                call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, ddum, ddum, E)
!                                IF (E /= 0) THEN
!                                   WRITE(*,*) 'The following ERROR was detected when phase = 22 - 2: ', E
!                                END IF
phase = 13  !(if above is commented-out I have to put 33 here)
call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, RHSnn, XXX, E)
IF (E /= 0) THEN
WRITE(*,*) 'The following ERROR was detected when phase = 33 - 2: ', E
END IF
phase = -1 ! release internal memory
call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, ddum, perm, perm, perm, nrhs, iparm, msglvl, ddum, ddum, E)
else
call GBSV(BTMnni, RHSnn, Nz, IPIV, INFO)
end if
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

deallocate(Psfr, STAT = DeAllocateStatus)
allocate(Psfr((Nz*Nr),ct), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow pressure vector during solver while loop, if N-N ***"
Psfr(:,ct) = RHSnn(:,1)

do i = 1,(ct-1)
Psfr(:,i) = Psfrtemp(:,i)
end do
deallocate(Psfrtemp, STAT = DeAllocateStatus)
allocate(Psfrtemp((Nz*Nr),ct), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow pressure vector during solver while loop, if N-N ***"
Psfrtemp = Psfr

do i = 1,(Nz*Nr)
DeltaP(i) = abs(Psfr(i,ct) - Psfr(i,(ct-1)))        !SHOULD BE ct-1!!!!
end do
DPMAX = maxval(DeltaP)
end do
sumdt = sumdt + dtn
dtn = dtn*(DPLIM/DPMAX)

if ((sumdt+dtn > tp) .AND. (sumdt /= tp)) then
dtn = tp - sumdt
end if

BMnn = 0.0_dp                                                           !initializing Cij*Vij Matrix to zeros
do h = 1,(Nz*Nr)
BMnn(h,h) = ((1.0_dp)/(dtn))*B(h)                                   !initializing diagonal of Cij*Vij Matrix AS IF no Wellbore Storage is considered
end do
if (ws == 0) then
BMnn = BMnn                                                         !keeping CijVij Matrix as if no wellbore storage, because wellbore storage indicator is disabled (ws = 0)
else if (ws == 1) then
do i = 1,Nz
if ((dzgp(i) < (czp-(dzp/2.0_dp))) .OR. (dzgp(i) > (czp+(dzp/2.0_dp)))) then
WSv(i) = 0.0_dp
else
if (i == 1) then
WSv(i) = (C*(dztr(1)-dzgp(1)))/(dzp)
else if (i == Nz) then
WSv(i) = (C*(dzgp(Nz)-dztr(Nz-1)))/(dzp)
else
WSv(i) = (C*(dztr(i)-dztr(i-1)))/(dzp)
end if
end if
end do
do i = 1,Nz
BMnn(i,i) = ((B(i)) + (WSv(i)))/(dtn)                           !modifying elements 1 to Nz (grid blocks right next to the well) of Cij*Vij Matrix
end do
end if
ATMnn = TMnn + BMnn

!Pardiso
if (ParTLapF == .false.) then
BTMnn = 0.0_dp
do j = 1,(Nz*Nr)
do i = max(1,(j-Nz)),min((Nz*Nr),(j+Nz))
BTMnn((Nz+Nz+1+i-j),j) = ATMnn(i,j)
end do
end do
BTMnni = BTMnn
end if

RHSnn(:,1) = matmul(BMnn,Psfr(:,ct)) - NNSSV

if (ParTLapF) then
cta = 0
ctt = 0
do i = 1,(Nz*Nr)
do j = 1,(Nz*Nr)
if ((j >= i) .AND. (abs(ATMnn(i,j)) > 0.0)) then
ctt = ctt + 1
nzele(ctt) = ATMnn(i,j)
end if
end do
end do
end if
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

end if
deallocate(tv, STAT = DeAllocateStatus)
allocate(tv(ct), STAT = AllocateStatus)
if (AllocateStatus /=0 ) STOP "*** Not enough memory to allocate single flow rate time vector during solver while loop, if N-N ***"
tv(ct) = sumdt
do i = 1,(ct-1)
tv(i) = tvtemp(i)
end do
deallocate(tvtemp, STAT = DeAllocateStatus)
allocate(tvtemp(ct), STAT = AllocateStatus)
if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow rate temporal time vector during solver while loop, if N-N ***"
tvtemp = tv

print *, sumdt, 'sec of ', tp, 'sec'
print *, Psfr(idx2,ct)*Pa2psi, 'psi', ' at center of packer'

end do
tv(1) = 0.0_dp```