Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

PARDISO vs LAPACK Performance

Juan_Diego_E_
Beginner
365 Views

Good afternoon,

I am trying to improve the speed of a 2D and 3D cylindrical coordinates fluid flow in porous media code. I have a Core i7, 3.4 GHz, 32 GB RAM, 4 Processors (which work as if they were 8) machine. Right now I am working on the Win32 platform, Visual Studio and Fortran90. I have to solve systems of equations at a lot of times one after the other one (it is a transient problem so I have to call the function a lot of times - do while loop); for the 2D case the maximum number of unknowns I would ever try to manage is around 3500 and for the 3D case this number goes up to 50000. I have compared LAPACK and PARDISO solvers in the 2D case, and in all of the simulations I have run, surprisingly LAPACK solver gives me better wall-clock time (which is what the actual users will care about, not the CPU..right?). Difference are as follow:

* 2050 unknowns: LAPACK = 2 sec, PARDISO = 3.5 sec

* 3570 unknowns: LAPACK = 5 sec, PARDISO = 9 sec

From the Windows Task Manager I can actually see that when I am implementing the PARDISO solver the PC is using minimum 6 threads out of 8. In some cases it is using the 8 threads, however the results are unfavorable to PARIDSO in this situation. Please find below a script of the code on which I am implementing the two solvers with the corresponding parameters. I would like to know if some of the parameters in PARDISO (iparm for example which has 64 entries) are not the right ones? (it is worth to mention that the results are exactly the same on both cases). What could be the reason(s) for having this behavior, on which one would expect PARDISO to be faster? Maybe is PARDISO more suitable for larger systems?

My main purpose is to make the code as fast as possible, therefore making use as much processors in my computer as possible is a goal.

Please find below a script of my code.

Thank you very much. I would appreciate the help.

Regards,

J.D.

!SOLVING SYSTEM OF EQUATIONS - USING LAPACK - GBSV OR PARDISO
                allocate(tmp2(Nz), STAT = AllocateStatus)
                if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single phase flow rate temporal vector to get as output Pwf for all times at the middle of the packer ***"
                tmp2 = abs(czp-dzgp)
                idxx2 = minloc(tmp2)
                idx2 = idxx2(1)
                
                allocate(IPIV(Nz*Nr), STAT = AllocateStatus)
                if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single phase flow rate pivoting vector as per LAPACK - GBSV subroutine, if N-N ***"
                IPIV = 0
                allocate(BTMnni(((2*Nz)+Nz+1),(Nz*Nr)), STAT = AllocateStatus)
                if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single phase flow rate 'initial' banded TM Matrix as per LAPACK - GBSV subroutine requirements, if N-N ***"
                
                allocate(RHSnn((Nz*Nr),1), STAT = AllocateStatus)
                if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single phase flow rate right hand side vector of system of equations as per LAPACK - GBSV subroutine, if N-N ***"
                BTMnni = BTMnn
                RHSnn(:,1) = matmul(BMnn,Psfr(:,1)) - NNSSV
                allocate(DeltaP(Nz*Nr), STAT = AllocateStatus)
                if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate DeltaP vector to get DPMAX, if N-N ***"
                DeltaP = 0.0_dp

                !Pardiso
                if (ParTLapF) then
                    do i = 1,64
                       pt( i )%DUMMY =  0 
                    end do
                    maxfct = 1
                    mnum = 1
                    mtype = 2
                    
                    allocate(nzele((size(Tz))+(size(Tr))+(Nz*Nr)), STAT = AllocateStatus)
                    if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate nzele((size(Tz))+(size(Tth))+(size(Tr))+(Nz*Nth*Nr)), in boundaries module ***"
                    allocate(colnzele((size(Tz))+(size(Tr))+(Nz*Nr)), STAT = AllocateStatus)
                    if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate colnzele((size(Tz))+(size(Tth))+(size(Tr))+(Nz*Nth*Nr)), in boundaries module ***"
                    allocate(innzele((Nz*Nr)+1), STAT = AllocateStatus)
                    if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate innzele((Nz*Nr*Nth)+1), in boundaries module ***"
                    cta = 0
                    ctt = 0
                    do i = 1,(Nz*Nr)
                        do j = 1,(Nz*Nr)
                            if ((j >= i) .AND. (abs(ATMnn(i,j)) > 0.0)) then
                                ctt = ctt + 1
                                nzele(ctt) = ATMnn(i,j)
                                colnzele(ctt) = j
                                if (j == i) then
                                    cta = cta + 1
                                    innzele(cta) = ctt
                                end if
                            end if
                        end do
                    end do
                    innzele(size(innzele)) = ((size(Tz))+(size(Tr))+(Nz*Nr))+1
                                       
                    allocate(perm(Nz*Nr), STAT = AllocateStatus)
                    if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate perm(Nz*Nr), in boundaries module ***"
                    do i = 1,(Nz*Nr)
                        perm(i) = i 
                    end do
                    
                    nrhs = 1
                    
                    do i = 1,64
                       iparm(i) = 0
                    end do 

                    iparm(1) = 1 ! no solver default
                    iparm(2) = 2 ! fill-in reordering from METIS
                    iparm(4) = 62 ! no iterative-direct algorithm 0 or 62 (iterative preconditioning)
                    iparm(5) = 0 ! no user fill-in reducing permutation
                    iparm(6) = 1 ! =0 solution on the first n compoments of x
                    iparm(8) = 0 ! numbers of iterative refinement steps
                    iparm(10) = 8 ! perturbe the pivot elements with 1E-13
                    iparm(11) = 0 ! use nonsymmetric permutation and scaling MPS
                    iparm(13) = 0 ! maximum weighted matching algorithm is switched-off (default for symmetric). Try iparm(13) = 1 in case of inappropriate accuracy
                    iparm(14) = 0 ! Output: number of perturbed pivots
                    iparm(18) = 0 ! Output: number of nonzeros in the factor LU
                    iparm(19) = 0 ! Output: Mflops for LU factorization
                    iparm(20) = 0 ! Output: Numbers of CG Iterations
                                   
                    msglvl = 0  !not printing statistical information
                    allocate(XXX((Nz*Nr)), STAT = AllocateStatus)
                    if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate XXX(Nz*Nr), in boundaries module ***"
                    
                    E = 0
                    
                end if
                !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                
                dtn = dt
                ct = 1
                sumdt = 0.0_dp
                do while (sumdt < tp)
                                       
                    ct = ct + 1
                       
                    !Pardiso
                    if (ParTLapF) then
!                        phase = 11 
!                        call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, ddum, ddum, E)
!                        IF (E /= 0) THEN
!                           WRITE(*,*) 'The following ERROR was detected when phase = 11 - 1: ', E
!                        END IF
!                        phase = 22
!                        call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, ddum, ddum, E)
!                        IF (E /= 0) THEN
!                           WRITE(*,*) 'The following ERROR was detected when phase = 22 - 1: ', E
!                        END IF
                        phase = 13 !(if above is commented-out I have to put 33 here)
                        call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, RHSnn, XXX, E)
                        IF (E /= 0) THEN
                           WRITE(*,*) 'The following ERROR was detected when phase = 33 - 1: ', E
                        END IF
                        phase = -1 ! release internal memory
                        call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, ddum, perm, perm, perm, nrhs, iparm, msglvl, ddum, ddum, E)
                    else
                        call GBSV(BTMnni, RHSnn, Nz, IPIV, INFO)    !LAPACK SOLVER FOR BANDED MATRIX SYSTEM OF EQUATIONS
                    end if
                    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                    
                    deallocate(Psfr, STAT = DeAllocateStatus)
                    allocate(Psfr((Nz*Nr),ct), STAT = AllocateStatus)
                    if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow pressure vector during solver while loop, if N-N ***"
                    Psfr(:,ct) = RHSnn(:,1)
                    
                    do i = 1,(ct-1)
                        Psfr(:,i) = Psfrtemp(:,i)
                    end do
                    deallocate(Psfrtemp, STAT = DeAllocateStatus)
                    allocate(Psfrtemp((Nz*Nr),ct), STAT = AllocateStatus)
                    if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow pressure vector during solver while loop, if N-N ***" 
                    Psfrtemp = Psfr
                    
                    do i = 1,(Nz*Nr)
                        DeltaP(i) = abs(Psfr(i,ct) - Psfr(i,(ct-1)))
                    end do
                    DPMAX = maxval(DeltaP)
                    
                    if (DPMAX <= MULT*DPLIM) then
                        sumdt = sumdt + dtn
                        dtn = dtn*(DPLIM/DPMAX)
                        if ((sumdt+dtn > tp) .AND. (sumdt /= tp)) then
                            dtn = tp - sumdt
                        end if
                        
                        BMnn = 0.0_dp                                                           !initializing Cij*Vij Matrix to zeros
                        do h = 1,(Nz*Nr)
                            BMnn(h,h) = ((1.0_dp)/(dtn))*B(h)                                   !initializing diagonal of Cij*Vij Matrix AS IF no Wellbore Storage is considered
                        end do                                                                   
                        if (ws == 0) then
                            BMnn = BMnn                                                         !keeping CijVij Matrix as if no wellbore storage, because wellbore storage indicator is disabled (ws = 0)
                        else if (ws == 1) then
                            do i = 1,Nz
                                if ((dzgp(i) < (czp-(dzp/2.0_dp))) .OR. (dzgp(i) > (czp+(dzp/2.0_dp)))) then
                                    WSv(i) = 0.0_dp
                                else
                                    if (i == 1) then
                                        WSv(i) = (C*(dztr(1)-dzgp(1)))/(dzp)
                                    else if (i == Nz) then
                                        WSv(i) = (C*(dzgp(Nz)-dztr(Nz-1)))/(dzp)
                                    else
                                        WSv(i) = (C*(dztr(i)-dztr(i-1)))/(dzp)
                                    end if
                                end if
                            end do
                            do i = 1,Nz                          
                                BMnn(i,i) = ((B(i)) + (WSv(i)))/(dtn)                           !modifying elements 1 to Nz (grid blocks right next to the well) of Cij*Vij Matrix
                            end do
                        end if
                        ATMnn = TMnn + BMnn
                        
                        !Pardiso
                        if (ParTLapF == .false.) then
                            BTMnn = 0.0_dp
                            do j = 1,(Nz*Nr)
                                do i = max(1,(j-Nz)),min((Nz*Nr),(j+Nz))
                                    BTMnn((Nz+Nz+1+i-j),j) = ATMnn(i,j)     
                                end do
                            end do                 
                            BTMnni = BTMnn
                        end if
                        
                        RHSnn(:,1) = matmul(BMnn,Psfr(:,ct)) - NNSSV
                        
                        if (ParTLapF) then
                            cta = 0
                            ctt = 0
                            do i = 1,(Nz*Nr)
                                do j = 1,(Nz*Nr)
                                    if ((j >= i) .AND. (abs(ATMnn(i,j)) > 0.0)) then
                                        ctt = ctt + 1
                                        nzele(ctt) = ATMnn(i,j)
                                    end if
                                end do
                            end do
                        end if
                        !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                    else
                        do while (DPMAX > MULT*DPLIM)
                            dtn = dtn*(DPLIM/DPMAX)
                            
                            BMnn = 0.0_dp                                                           !initializing Cij*Vij Matrix to zeros
                            do h = 1,(Nz*Nr)
                                BMnn(h,h) = ((1.0_dp)/(dtn))*B(h)                                   !initializing diagonal of Cij*Vij Matrix AS IF no Wellbore Storage is considered
                            end do                                                                   
                            if (ws == 0) then
                                BMnn = BMnn                                                         !keeping CijVij Matrix as if no wellbore storage, because wellbore storage indicator is disabled (ws = 0)
                            else if (ws == 1) then
                                do i = 1,Nz
                                    if ((dzgp(i) < (czp-(dzp/2.0_dp))) .OR. (dzgp(i) > (czp+(dzp/2.0_dp)))) then
                                        WSv(i) = 0.0_dp
                                    else
                                        if (i == 1) then
                                            WSv(i) = (C*(dztr(1)-dzgp(1)))/(dzp)
                                        else if (i == Nz) then
                                            WSv(i) = (C*(dzgp(Nz)-dztr(Nz-1)))/(dzp)
                                        else
                                            WSv(i) = (C*(dztr(i)-dztr(i-1)))/(dzp)
                                        end if
                                    end if
                                end do
                                do i = 1,Nz                          
                                    BMnn(i,i) = ((B(i)) + (WSv(i)))/(dtn)                           !modifying elements 1 to Nz (grid blocks right next to the well) of Cij*Vij Matrix
                                end do
                            end if
                            ATMnn = TMnn + BMnn
                            
                            !Pardiso
                            if (ParTLapF == .false.) then 
                                BTMnn = 0.0_dp
                                do j = 1,(Nz*Nr)
                                    do i = max(1,(j-Nz)),min((Nz*Nr),(j+Nz))
                                        BTMnn((Nz+Nz+1+i-j),j) = ATMnn(i,j)     
                                    end do
                                end do                 
                                BTMnni = BTMnn
                            end if
                            
                            RHSnn(:,1) = matmul(BMnn,Psfr(:,ct-1)) - NNSSV
                            
                            if (ParTLapF) then
                                cta = 0
                                ctt = 0
                                do i = 1,(Nz*Nr)
                                    do j = 1,(Nz*Nr)
                                        if ((j >= i) .AND. (abs(ATMnn(i,j)) > 0.0)) then
                                            ctt = ctt + 1
                                            nzele(ctt) = ATMnn(i,j)
                                        end if
                                    end do
                                end do
                                
!                                phase = 11 
!                                call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, ddum, ddum, E)
!                                IF (E /= 0) THEN
!                                   WRITE(*,*) 'The following ERROR was detected when phase = 11 - 2: ', E
!                                END IF
!                                phase = 22
!                                call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, ddum, ddum, E)
!                                IF (E /= 0) THEN
!                                   WRITE(*,*) 'The following ERROR was detected when phase = 22 - 2: ', E
!                                END IF
                                phase = 13  !(if above is commented-out I have to put 33 here)
                                call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, RHSnn, XXX, E)
                                IF (E /= 0) THEN
                                   WRITE(*,*) 'The following ERROR was detected when phase = 33 - 2: ', E
                                END IF
                                phase = -1 ! release internal memory
                                call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, ddum, perm, perm, perm, nrhs, iparm, msglvl, ddum, ddum, E)
                            else
                                call GBSV(BTMnni, RHSnn, Nz, IPIV, INFO)    
                            end if
                            !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                            
                            deallocate(Psfr, STAT = DeAllocateStatus)
                            allocate(Psfr((Nz*Nr),ct), STAT = AllocateStatus)
                            if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow pressure vector during solver while loop, if N-N ***"
                            Psfr(:,ct) = RHSnn(:,1)

                            do i = 1,(ct-1)
                                Psfr(:,i) = Psfrtemp(:,i)
                            end do
                            deallocate(Psfrtemp, STAT = DeAllocateStatus)
                            allocate(Psfrtemp((Nz*Nr),ct), STAT = AllocateStatus)
                            if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow pressure vector during solver while loop, if N-N ***" 
                            Psfrtemp = Psfr
                            
                            do i = 1,(Nz*Nr)
                                DeltaP(i) = abs(Psfr(i,ct) - Psfr(i,(ct-1)))        !SHOULD BE ct-1!!!!
                            end do
                            DPMAX = maxval(DeltaP)
                        end do
                        sumdt = sumdt + dtn
                        dtn = dtn*(DPLIM/DPMAX)
                        
                        if ((sumdt+dtn > tp) .AND. (sumdt /= tp)) then
                            dtn = tp - sumdt
                        end if
                        
                        BMnn = 0.0_dp                                                           !initializing Cij*Vij Matrix to zeros
                        do h = 1,(Nz*Nr)
                            BMnn(h,h) = ((1.0_dp)/(dtn))*B(h)                                   !initializing diagonal of Cij*Vij Matrix AS IF no Wellbore Storage is considered
                        end do                                                                   
                        if (ws == 0) then
                            BMnn = BMnn                                                         !keeping CijVij Matrix as if no wellbore storage, because wellbore storage indicator is disabled (ws = 0)
                        else if (ws == 1) then
                            do i = 1,Nz
                                if ((dzgp(i) < (czp-(dzp/2.0_dp))) .OR. (dzgp(i) > (czp+(dzp/2.0_dp)))) then
                                    WSv(i) = 0.0_dp
                                else
                                    if (i == 1) then
                                        WSv(i) = (C*(dztr(1)-dzgp(1)))/(dzp)
                                    else if (i == Nz) then
                                        WSv(i) = (C*(dzgp(Nz)-dztr(Nz-1)))/(dzp)
                                    else
                                        WSv(i) = (C*(dztr(i)-dztr(i-1)))/(dzp)
                                    end if
                                end if
                            end do
                            do i = 1,Nz                          
                                BMnn(i,i) = ((B(i)) + (WSv(i)))/(dtn)                           !modifying elements 1 to Nz (grid blocks right next to the well) of Cij*Vij Matrix
                            end do
                        end if
                        ATMnn = TMnn + BMnn
                        
                        !Pardiso
                        if (ParTLapF == .false.) then
                            BTMnn = 0.0_dp
                            do j = 1,(Nz*Nr)
                                do i = max(1,(j-Nz)),min((Nz*Nr),(j+Nz))
                                    BTMnn((Nz+Nz+1+i-j),j) = ATMnn(i,j)     
                                end do
                            end do                 
                            BTMnni = BTMnn
                        end if
                        
                        RHSnn(:,1) = matmul(BMnn,Psfr(:,ct)) - NNSSV
                        
                        if (ParTLapF) then
                            cta = 0
                            ctt = 0
                            do i = 1,(Nz*Nr)
                                do j = 1,(Nz*Nr)
                                    if ((j >= i) .AND. (abs(ATMnn(i,j)) > 0.0)) then
                                        ctt = ctt + 1
                                        nzele(ctt) = ATMnn(i,j)
                                    end if
                                end do
                            end do
                        end if
                        !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                        
                    end if
                    deallocate(tv, STAT = DeAllocateStatus)
                    allocate(tv(ct), STAT = AllocateStatus)
                    if (AllocateStatus /=0 ) STOP "*** Not enough memory to allocate single flow rate time vector during solver while loop, if N-N ***"
                    tv(ct) = sumdt
                    do i = 1,(ct-1)
                        tv(i) = tvtemp(i)
                    end do
                    deallocate(tvtemp, STAT = DeAllocateStatus)
                    allocate(tvtemp(ct), STAT = AllocateStatus)
                    if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow rate temporal time vector during solver while loop, if N-N ***"
                    tvtemp = tv
                                       
                    print *, sumdt, 'sec of ', tp, 'sec' 
                    print *, Psfr(idx2,ct)*Pa2psi, 'psi', ' at center of packer'

                end do
                tv(1) = 0.0_dp

 

0 Kudos
1 Reply
Frank_M
Beginner
365 Views

Running 8 threads may actually slow down your code. Hyper-threading works best on dissimilar types of operations. The best way to see if you are getting the max out of your processor is to to run the Inspector and the Amplifier.
 

0 Kudos
Reply