Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library
- PARDISO vs LAPACK Performance

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Juan_Diego_E_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-07-2015
11:33 AM

74 Views

PARDISO vs LAPACK Performance

Good afternoon,

I am trying to improve the speed of a 2D and 3D cylindrical coordinates fluid flow in porous media code. I have a Core i7, 3.4 GHz, 32 GB RAM, 4 Processors (which work as if they were 8) machine. Right now I am working on the Win32 platform, Visual Studio and Fortran90. I have to solve systems of equations at a lot of times one after the other one (it is a transient problem so I have to call the function a lot of times - do while loop); for the 2D case the maximum number of unknowns I would ever try to manage is around 3500 and for the 3D case this number goes up to 50000. I have compared LAPACK and PARDISO solvers in the 2D case, and in all of the simulations I have run, surprisingly LAPACK solver gives me better wall-clock time (which is what the actual users will care about, not the CPU..right?). Difference are as follow:

* 2050 unknowns: LAPACK = 2 sec, PARDISO = 3.5 sec

* 3570 unknowns: LAPACK = 5 sec, PARDISO = 9 sec

From the Windows Task Manager I can actually see that when I am implementing the PARDISO solver the PC is using minimum 6 threads out of 8. In some cases it is using the 8 threads, however the results are unfavorable to PARIDSO in this situation. Please find below a script of the code on which I am implementing the two solvers with the corresponding parameters. I would like to know if some of the parameters in PARDISO (iparm for example which has 64 entries) are not the right ones? (it is worth to mention that the results are exactly the same on both cases). What could be the reason(s) for having this behavior, on which one would expect PARDISO to be faster? Maybe is PARDISO more suitable for larger systems?

My main purpose is to make the code as fast as possible, therefore making use as much processors in my computer as possible is a goal.

Please find below a script of my code.

Thank you very much. I would appreciate the help.

Regards,

J.D.

!SOLVING SYSTEM OF EQUATIONS - USING LAPACK - GBSV OR PARDISO allocate(tmp2(Nz), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single phase flow rate temporal vector to get as output Pwf for all times at the middle of the packer ***" tmp2 = abs(czp-dzgp) idxx2 = minloc(tmp2) idx2 = idxx2(1) allocate(IPIV(Nz*Nr), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single phase flow rate pivoting vector as per LAPACK - GBSV subroutine, if N-N ***" IPIV = 0 allocate(BTMnni(((2*Nz)+Nz+1),(Nz*Nr)), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single phase flow rate 'initial' banded TM Matrix as per LAPACK - GBSV subroutine requirements, if N-N ***" allocate(RHSnn((Nz*Nr),1), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single phase flow rate right hand side vector of system of equations as per LAPACK - GBSV subroutine, if N-N ***" BTMnni = BTMnn RHSnn(:,1) = matmul(BMnn,Psfr(:,1)) - NNSSV allocate(DeltaP(Nz*Nr), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate DeltaP vector to get DPMAX, if N-N ***" DeltaP = 0.0_dp !Pardiso if (ParTLapF) then do i = 1,64 pt( i )%DUMMY = 0 end do maxfct = 1 mnum = 1 mtype = 2 allocate(nzele((size(Tz))+(size(Tr))+(Nz*Nr)), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate nzele((size(Tz))+(size(Tth))+(size(Tr))+(Nz*Nth*Nr)), in boundaries module ***" allocate(colnzele((size(Tz))+(size(Tr))+(Nz*Nr)), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate colnzele((size(Tz))+(size(Tth))+(size(Tr))+(Nz*Nth*Nr)), in boundaries module ***" allocate(innzele((Nz*Nr)+1), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate innzele((Nz*Nr*Nth)+1), in boundaries module ***" cta = 0 ctt = 0 do i = 1,(Nz*Nr) do j = 1,(Nz*Nr) if ((j >= i) .AND. (abs(ATMnn(i,j)) > 0.0)) then ctt = ctt + 1 nzele(ctt) = ATMnn(i,j) colnzele(ctt) = j if (j == i) then cta = cta + 1 innzele(cta) = ctt end if end if end do end do innzele(size(innzele)) = ((size(Tz))+(size(Tr))+(Nz*Nr))+1 allocate(perm(Nz*Nr), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate perm(Nz*Nr), in boundaries module ***" do i = 1,(Nz*Nr) perm(i) = i end do nrhs = 1 do i = 1,64 iparm(i) = 0 end do iparm(1) = 1 ! no solver default iparm(2) = 2 ! fill-in reordering from METIS iparm(4) = 62 ! no iterative-direct algorithm 0 or 62 (iterative preconditioning) iparm(5) = 0 ! no user fill-in reducing permutation iparm(6) = 1 ! =0 solution on the first n compoments of x iparm(8) = 0 ! numbers of iterative refinement steps iparm(10) = 8 ! perturbe the pivot elements with 1E-13 iparm(11) = 0 ! use nonsymmetric permutation and scaling MPS iparm(13) = 0 ! maximum weighted matching algorithm is switched-off (default for symmetric). Try iparm(13) = 1 in case of inappropriate accuracy iparm(14) = 0 ! Output: number of perturbed pivots iparm(18) = 0 ! Output: number of nonzeros in the factor LU iparm(19) = 0 ! Output: Mflops for LU factorization iparm(20) = 0 ! Output: Numbers of CG Iterations msglvl = 0 !not printing statistical information allocate(XXX((Nz*Nr)), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate XXX(Nz*Nr), in boundaries module ***" E = 0 end if !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! dtn = dt ct = 1 sumdt = 0.0_dp do while (sumdt < tp) ct = ct + 1 !Pardiso if (ParTLapF) then ! phase = 11 ! call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, ddum, ddum, E) ! IF (E /= 0) THEN ! WRITE(*,*) 'The following ERROR was detected when phase = 11 - 1: ', E ! END IF ! phase = 22 ! call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, ddum, ddum, E) ! IF (E /= 0) THEN ! WRITE(*,*) 'The following ERROR was detected when phase = 22 - 1: ', E ! END IF phase = 13 !(if above is commented-out I have to put 33 here) call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, RHSnn, XXX, E) IF (E /= 0) THEN WRITE(*,*) 'The following ERROR was detected when phase = 33 - 1: ', E END IF phase = -1 ! release internal memory call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, ddum, perm, perm, perm, nrhs, iparm, msglvl, ddum, ddum, E) else call GBSV(BTMnni, RHSnn, Nz, IPIV, INFO) !LAPACK SOLVER FOR BANDED MATRIX SYSTEM OF EQUATIONS end if !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! deallocate(Psfr, STAT = DeAllocateStatus) allocate(Psfr((Nz*Nr),ct), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow pressure vector during solver while loop, if N-N ***" Psfr(:,ct) = RHSnn(:,1) do i = 1,(ct-1) Psfr(:,i) = Psfrtemp(:,i) end do deallocate(Psfrtemp, STAT = DeAllocateStatus) allocate(Psfrtemp((Nz*Nr),ct), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow pressure vector during solver while loop, if N-N ***" Psfrtemp = Psfr do i = 1,(Nz*Nr) DeltaP(i) = abs(Psfr(i,ct) - Psfr(i,(ct-1))) end do DPMAX = maxval(DeltaP) if (DPMAX <= MULT*DPLIM) then sumdt = sumdt + dtn dtn = dtn*(DPLIM/DPMAX) if ((sumdt+dtn > tp) .AND. (sumdt /= tp)) then dtn = tp - sumdt end if BMnn = 0.0_dp !initializing Cij*Vij Matrix to zeros do h = 1,(Nz*Nr) BMnn(h,h) = ((1.0_dp)/(dtn))*B(h) !initializing diagonal of Cij*Vij Matrix AS IF no Wellbore Storage is considered end do if (ws == 0) then BMnn = BMnn !keeping CijVij Matrix as if no wellbore storage, because wellbore storage indicator is disabled (ws = 0) else if (ws == 1) then do i = 1,Nz if ((dzgp(i) < (czp-(dzp/2.0_dp))) .OR. (dzgp(i) > (czp+(dzp/2.0_dp)))) then WSv(i) = 0.0_dp else if (i == 1) then WSv(i) = (C*(dztr(1)-dzgp(1)))/(dzp) else if (i == Nz) then WSv(i) = (C*(dzgp(Nz)-dztr(Nz-1)))/(dzp) else WSv(i) = (C*(dztr(i)-dztr(i-1)))/(dzp) end if end if end do do i = 1,Nz BMnn(i,i) = ((B(i)) + (WSv(i)))/(dtn) !modifying elements 1 to Nz (grid blocks right next to the well) of Cij*Vij Matrix end do end if ATMnn = TMnn + BMnn !Pardiso if (ParTLapF == .false.) then BTMnn = 0.0_dp do j = 1,(Nz*Nr) do i = max(1,(j-Nz)),min((Nz*Nr),(j+Nz)) BTMnn((Nz+Nz+1+i-j),j) = ATMnn(i,j) end do end do BTMnni = BTMnn end if RHSnn(:,1) = matmul(BMnn,Psfr(:,ct)) - NNSSV if (ParTLapF) then cta = 0 ctt = 0 do i = 1,(Nz*Nr) do j = 1,(Nz*Nr) if ((j >= i) .AND. (abs(ATMnn(i,j)) > 0.0)) then ctt = ctt + 1 nzele(ctt) = ATMnn(i,j) end if end do end do end if !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! else do while (DPMAX > MULT*DPLIM) dtn = dtn*(DPLIM/DPMAX) BMnn = 0.0_dp !initializing Cij*Vij Matrix to zeros do h = 1,(Nz*Nr) BMnn(h,h) = ((1.0_dp)/(dtn))*B(h) !initializing diagonal of Cij*Vij Matrix AS IF no Wellbore Storage is considered end do if (ws == 0) then BMnn = BMnn !keeping CijVij Matrix as if no wellbore storage, because wellbore storage indicator is disabled (ws = 0) else if (ws == 1) then do i = 1,Nz if ((dzgp(i) < (czp-(dzp/2.0_dp))) .OR. (dzgp(i) > (czp+(dzp/2.0_dp)))) then WSv(i) = 0.0_dp else if (i == 1) then WSv(i) = (C*(dztr(1)-dzgp(1)))/(dzp) else if (i == Nz) then WSv(i) = (C*(dzgp(Nz)-dztr(Nz-1)))/(dzp) else WSv(i) = (C*(dztr(i)-dztr(i-1)))/(dzp) end if end if end do do i = 1,Nz BMnn(i,i) = ((B(i)) + (WSv(i)))/(dtn) !modifying elements 1 to Nz (grid blocks right next to the well) of Cij*Vij Matrix end do end if ATMnn = TMnn + BMnn !Pardiso if (ParTLapF == .false.) then BTMnn = 0.0_dp do j = 1,(Nz*Nr) do i = max(1,(j-Nz)),min((Nz*Nr),(j+Nz)) BTMnn((Nz+Nz+1+i-j),j) = ATMnn(i,j) end do end do BTMnni = BTMnn end if RHSnn(:,1) = matmul(BMnn,Psfr(:,ct-1)) - NNSSV if (ParTLapF) then cta = 0 ctt = 0 do i = 1,(Nz*Nr) do j = 1,(Nz*Nr) if ((j >= i) .AND. (abs(ATMnn(i,j)) > 0.0)) then ctt = ctt + 1 nzele(ctt) = ATMnn(i,j) end if end do end do ! phase = 11 ! call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, ddum, ddum, E) ! IF (E /= 0) THEN ! WRITE(*,*) 'The following ERROR was detected when phase = 11 - 2: ', E ! END IF ! phase = 22 ! call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, ddum, ddum, E) ! IF (E /= 0) THEN ! WRITE(*,*) 'The following ERROR was detected when phase = 22 - 2: ', E ! END IF phase = 13 !(if above is commented-out I have to put 33 here) call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, nzele, innzele, colnzele, perm, nrhs, iparm, msglvl, RHSnn, XXX, E) IF (E /= 0) THEN WRITE(*,*) 'The following ERROR was detected when phase = 33 - 2: ', E END IF phase = -1 ! release internal memory call pardiso_D (pt, maxfct, mnum, mtype, phase, Nz*Nr, ddum, perm, perm, perm, nrhs, iparm, msglvl, ddum, ddum, E) else call GBSV(BTMnni, RHSnn, Nz, IPIV, INFO) end if !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! deallocate(Psfr, STAT = DeAllocateStatus) allocate(Psfr((Nz*Nr),ct), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow pressure vector during solver while loop, if N-N ***" Psfr(:,ct) = RHSnn(:,1) do i = 1,(ct-1) Psfr(:,i) = Psfrtemp(:,i) end do deallocate(Psfrtemp, STAT = DeAllocateStatus) allocate(Psfrtemp((Nz*Nr),ct), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow pressure vector during solver while loop, if N-N ***" Psfrtemp = Psfr do i = 1,(Nz*Nr) DeltaP(i) = abs(Psfr(i,ct) - Psfr(i,(ct-1))) !SHOULD BE ct-1!!!! end do DPMAX = maxval(DeltaP) end do sumdt = sumdt + dtn dtn = dtn*(DPLIM/DPMAX) if ((sumdt+dtn > tp) .AND. (sumdt /= tp)) then dtn = tp - sumdt end if BMnn = 0.0_dp !initializing Cij*Vij Matrix to zeros do h = 1,(Nz*Nr) BMnn(h,h) = ((1.0_dp)/(dtn))*B(h) !initializing diagonal of Cij*Vij Matrix AS IF no Wellbore Storage is considered end do if (ws == 0) then BMnn = BMnn !keeping CijVij Matrix as if no wellbore storage, because wellbore storage indicator is disabled (ws = 0) else if (ws == 1) then do i = 1,Nz if ((dzgp(i) < (czp-(dzp/2.0_dp))) .OR. (dzgp(i) > (czp+(dzp/2.0_dp)))) then WSv(i) = 0.0_dp else if (i == 1) then WSv(i) = (C*(dztr(1)-dzgp(1)))/(dzp) else if (i == Nz) then WSv(i) = (C*(dzgp(Nz)-dztr(Nz-1)))/(dzp) else WSv(i) = (C*(dztr(i)-dztr(i-1)))/(dzp) end if end if end do do i = 1,Nz BMnn(i,i) = ((B(i)) + (WSv(i)))/(dtn) !modifying elements 1 to Nz (grid blocks right next to the well) of Cij*Vij Matrix end do end if ATMnn = TMnn + BMnn !Pardiso if (ParTLapF == .false.) then BTMnn = 0.0_dp do j = 1,(Nz*Nr) do i = max(1,(j-Nz)),min((Nz*Nr),(j+Nz)) BTMnn((Nz+Nz+1+i-j),j) = ATMnn(i,j) end do end do BTMnni = BTMnn end if RHSnn(:,1) = matmul(BMnn,Psfr(:,ct)) - NNSSV if (ParTLapF) then cta = 0 ctt = 0 do i = 1,(Nz*Nr) do j = 1,(Nz*Nr) if ((j >= i) .AND. (abs(ATMnn(i,j)) > 0.0)) then ctt = ctt + 1 nzele(ctt) = ATMnn(i,j) end if end do end do end if !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! end if deallocate(tv, STAT = DeAllocateStatus) allocate(tv(ct), STAT = AllocateStatus) if (AllocateStatus /=0 ) STOP "*** Not enough memory to allocate single flow rate time vector during solver while loop, if N-N ***" tv(ct) = sumdt do i = 1,(ct-1) tv(i) = tvtemp(i) end do deallocate(tvtemp, STAT = DeAllocateStatus) allocate(tvtemp(ct), STAT = AllocateStatus) if (AllocateStatus /= 0) STOP "*** Not enough memory to allocate single flow rate temporal time vector during solver while loop, if N-N ***" tvtemp = tv print *, sumdt, 'sec of ', tp, 'sec' print *, Psfr(idx2,ct)*Pa2psi, 'psi', ' at center of packer' end do tv(1) = 0.0_dp

Link Copied

1 Reply

Frank_M

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-14-2015
06:56 PM

74 Views

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

For more complete information about compiler optimizations, see our Optimization Notice.