Solved: MKL_PARDISO in os x

ericcp_dias_ie · ‎03-26-2014

Hello, I am trying to solve a sparse linear system in fortran 90 using the mkl pardiso solver as in the following code:



        do i = 1, 64
            iparm(i) = 0
        end do

        error = 0 ! initialize error flag
        msglvl = 1 ! print statistical information
        mtype = 13 ! complex unsymmetric
        !C.. Initiliaze the internal solver memory pointer. This is only
        !C necessary for the FIRST call of the PARDISO solver.
        do i = 1, 64
           pt( i )%DUMMY =  0 
        end do
        
        maxfct = 1
        mnum = 1
        nrhs = 1
        !C.. Reordering and Symbolic Factorization, This step also allocates
        !C all memory that is necessary for the factorization
        phase = 11 ! only reordering and symbolic factorization
        print*, ' calling sparse solver'

        CALL PARDISO (pt, maxfct, mnum, mtype, phase, nPardiso, values, rowIndex, columns, &
             perm, nrhs, iparm, msglvl, b, Ex, error)
        WRITE(*,*) 'Reordering completed ... '

My code compiles (intel compiler ifort (IFORT) 12.1.3 20120130) without problem but once reached the solver call the program stops and I have to kill it.

The output on my screen is, before the program behave strangely, just

calling sparse solver

I've also tried with different parameters in iparm, as per the examples folder, but without different results.

I also print out the matrix, to be sure is in the correct form.

double complex values:(-1.000000,0.0000000E+00) (1.000000,0.0000000E+00) (0.5000000,0.0000000E+00)
(-1.500000,0.0000000E+00) (1.000000,0.0000000E+00) (-0.2222222,0.0000000E+00)
(-0.1111111,0.0000000E+00) (0.3333333,0.0000000E+00) (1.000000,0.0000000E+00)

integer rowIndex: 1 3 6 9 10

integer columns: 1 2 1 2 3 2 3 4 4

b is imaginary and the system has a solution.

Any suggestion?

mecej4 · ‎03-27-2014

I think that you have errors related to a misunderstanding of the nature of the intrinsic function CMPLX. Because kind(cmplx(1.0d0,1.0d0)) evaluates to 4, whereas kind(cmplx(1.0d0,1.0d0,kind=kind(1d0))) evaluates to 8, the latter of which you probably want, many errors arise. Note, as well, that Pardiso will not work correctly with default precision real and complex variables unless you set iparm(28) = 1. You need REAL(8) and COMPLEX(8) variables if you set the default value of zero in iparm(28).

Try the following program to see the implications:

[fortran]

program chkcmplx

implicit none

complex(8) :: i1

i1=cmplx(0d0,1d-1)

write(*,*)i1

i1=cmplx(0d0,1d-1,kind(0d0))

write(*,*)i1

write(*,*)kind(cmplx(1d0,1d0)),kind(cmplx(1.0d0,1.0d0,kind=kind(1d0)))

end [/fortran]

I have attached a zip file with corrected sources. The program built with these sources worked correctly with IFort 11.1.070 and 14.0.2, on Windows

View solution in original post

Gennady_F_Intel · ‎03-26-2014

into the code you show - all iparm are zero. Would you please give the example which we can compile and check the problem on our side?

please use iparm(27) = 1 to check the input matrix format.

ericcp_dias_ie · ‎03-26-2014

It is a bit long, I've put everything here

include 'mkl_pardiso.f90'
module d1isoModel
    use kinddef
    USE mkl_pardiso 
    implicit none
    type :: d1isoGrid
        private
        real(rk), dimension(:), allocatable :: z
        real(rk), dimension(:), allocatable :: sigma
        integer(ik):: n
    contains
        procedure, public :: init => initd1isoModel
        procedure, public :: printOnScreen => printd1isoModelOnScreen
        procedure, public :: computeEx
        procedure, public :: deallocate => clean_d1isoGrid
    end type d1isoGrid
contains
 subroutine initd1isoModel(this,n,z,sigma)
        class(d1isoGrid) :: this
        integer(ik) :: n
        real(rk), dimension(n) :: z
        real(rk), dimension(n-1) :: sigma
        allocate(this%z(n))
        allocate(this%sigma(n-1))

        this%z = z
        this%sigma = sigma
        this%n = n
    end subroutine initd1isoModel

    subroutine clean_d1isoGrid(this)
        class(d1isoGrid) :: this
        if (allocated(this%sigma))then
            deallocate(this%sigma)
        end if

        if (allocated(this%z))then
            deallocate(this%z)
        end if
        print*, 'memory used for d1isoGrid relased ...'
    end subroutine clean_d1isoGrid

 function computeEx(this,freq,Ex) result(info)
      
      use physicsparameters
        class(d1isoGrid), intent(in) :: this
        real(rk), intent(in) :: freq
        complex(ck), dimension(:), allocatable :: Ex
        integer :: info
        ! pardiso header file, parameters and variables, based on the intel example
        TYPE(MKL_PARDISO_HANDLE), dimension(64)  :: pt
        INTEGER :: iparm(64)
        INTEGER :: i, idum
        complex(rk) :: ddum!, waltime1, waltime2
        INTEGER :: maxfct, mnum, mtype, phase, nPardiso, nrhs, error, msglvl


        ! local arguments!
        real(rk) :: omega
        real(rk), dimension(this%n -3) :: znodes, C
        integer  :: j
!        integer (ik) ::  icol, irow
        integer  :: n
        integer  :: elementIndex
        real(rk) :: zGhostNode
        complex (ck), dimension ((this%n -3)*3 + 2 + 1) :: values
        integer , dimension ((this%n -3)*3 + 2 + 1) :: columns
        integer , dimension (this%n) :: rowIndex
        complex (ck), dimension(this%n-1) :: b
        integer, dimension(this%n -1) :: perm
        n = this%n
        nPardiso = n-1

        ! compute znodes coordiantes
        do i = 2,n
            znodes(i-1) = (this%z(i) + this%z(i-1))*0.5_rk
        end do
        omega = 2*pi*freq
        allocate (Ex(this%n -1)) ! columns store respectively Ex, Ey and Ez.

        b = cmplx(0._rk,0._rk)

        b(1) = omega*mu0*i1

        C = computeC(n,this%z,znodes)
        !print*, znodes
        !print*, C

        zGhostNode = znodes(1) - (znodes(2) - znodes(1))
        !        print*, zGhostNode, znodes

        print*, 'forming the finite-differences sparse matrix and r.h.s. ...'
        values(1) = -((znodes(1) - zGhostNode)**(-1))
        columns(1) = 1
        rowIndex(1) = 1
        values(2) = ((znodes(1) - zGhostNode)**(-1))
        columns(2) = 2
        elementIndex = 2
        !print*, 'n = ', n

        do i = 2,n-2
            do j = -1,1
                elementIndex = elementIndex+1
                select case (j)
                    case(-1)
                        values(elementIndex) = C(i-1)*(znodes(i+1)-znodes(i))
                        !print*, c(i), values(elementIndex)
                        columns(elementIndex) = i+j
                        rowIndex(i) = elementIndex
                    case(0)
                        values(elementIndex) = -C(i-1)*(znodes(i+1)-znodes(i-1))
                        columns(elementIndex) = i+j
                    case(1)
                        values(elementIndex) = C(i-1)*(znodes(i)-znodes(i-1))
                        columns(elementIndex) = i+j
                end select
            end do
        end do
        elementIndex = elementIndex+1
        values(elementIndex) = 1._rk
        columns(elementIndex) = n-1
        rowIndex(n-1) = elementIndex
        rowIndex(n) = size(values)+1


        print*, values
        print*, columns
        print*, rowIndex

        ! calling the sparse solver
        !C..
        !C.. Set up PARDISO control parameter
        !C..
        do i = 1, 64
            iparm(i) = 0
        end do
!!$        iparm(1) = 0 ! no solver default
!!$        iparm(2) = 2 ! uses openMP
!!$        iparm(3) = 0 ! numbers of processors
!!$        iparm(4) = 62 ! no iterative-direct algorithm
!!$        iparm(5) = 0 ! no user fill-in reducing permutation
!!$        iparm(6) = 0 ! =0 solution on the first n compoments of x
!!$        iparm(7) = 0 ! not in use
!!$        iparm(8) = 0 ! numbers of iterative refinement steps
!!$        iparm(9) = 0 ! not in use
!!$        iparm(10) = 13 ! perturbe the pivot elements with 1E-13
!!$        iparm(11) = 1 ! use nonsymmetric permutation and scaling MPS
!!$        iparm(12) = 0 ! not in use
!!$        iparm(13) = 1 ! not in use
!!$        iparm(14) = 0 ! Output: number of perturbed pivots
!!$        iparm(15) = 0 ! not in use
!!$        iparm(16) = 0 ! not in use
!!$        iparm(17) = 0 ! not in use
!!$        iparm(18) = -1 ! Output: number of nonzeros in the factor LU
!!$        iparm(19) = -1 ! Output: Mflops for LU factorization
!!$        iparm(20) = 0 ! Output: Numbers of CG Iterations
        iparm(27) = 1
        error = 0 ! initialize error flag
        msglvl = 1 ! print statistical information
        mtype = 13 ! complex unsymmetric
        !C.. Initiliaze the internal solver memory pointer. This is only
        !C necessary for the FIRST call of the PARDISO solver.
        do i = 1, 64
           pt( i )%DUMMY =  0 
        end do
        
        maxfct = 1
        mnum = 1
        nrhs = 1
        !C.. Reordering and Symbolic Factorization, This step also allocates
        !C all memory that is necessary for the factorization
        phase = 11 ! only reordering and symbolic factorization
        print*, ' calling sparse solver'

        CALL PARDISO (pt, maxfct, mnum, mtype, phase, nPardiso, values, rowIndex, columns, &
             perm, nrhs, iparm, msglvl, b, Ex, error)
        WRITE(*,*) 'Reordering completed ... '
        IF (error .NE. 0) THEN
            WRITE(*,*) 'The following ERROR was detected: ', error
            STOP
        END IF
        WRITE(*,*) 'Number of nonzeros in factors = ',iparm(18)
        WRITE(*,*) 'Number of factorization MFLOPS = ',iparm(19)
        !C.. Factorization.
        phase = 22 ! only factorization
        CALL pardiso (pt, maxfct, mnum, mtype, phase, nPardiso, values, rowIndex, columns &
            , perm, nrhs, iparm, msglvl, b, Ex, error)
        WRITE(*,*) 'Factorization completed ... '
        IF (error .NE. 0) THEN
            WRITE(*,*) 'The following ERROR was detected: ', error
            STOP
        ENDIF
        !C.. Back substitution and iterative refinement
        iparm(8) = 2 ! max numbers of iterative refinement steps
        phase = 33 ! only factorization

        CALL pardiso (pt, maxfct, mnum, mtype, phase, nPardiso, values, rowIndex, columns &
            , perm, nrhs, iparm, msglvl, b, Ex, error)
        WRITE(*,*) 'Solve completed ... '
        WRITE(*,*) 'The solution of the system is '
        DO i = 1, n
            WRITE(*,*) ' Ex(',i,') = ', Ex(i)
        END DO
        !C.. Termination and release of memory
        phase = -1 ! release internal memory
        CALL pardiso (pt, maxfct, mnum, mtype, phase, nPardiso, values, rowIndex, columns &
            , perm, nrhs, iparm, msglvl, b, Ex, error)
        info = error
        return
    end function computeEx

    function computeC(n,z,znodes) result(C)
        integer(ik), intent(in) :: n
        real(rk), dimension(n) :: z
        real(rk), dimension(n-1) :: znodes
        real(rk), dimension(2:n-2) :: C
        integer(ik) :: i
        !C = 0._rk
        !print*, 'c ='
        !print*, C

        do i = 2,n-2
            C(i) = (z(i+1)-z(i))**(-1) * ((znodes(i+1)-znodes(i))*(znodes(i)-znodes(i-1)))**(-1)
        end do
        !print*, 'c ='
        !print*, C
        return
    end function computeC

end module d1isoModel

and the supporting modules

module kinddef
    implicit none
    integer, parameter :: ik = kind(1)
    integer, parameter :: rk = kind(1.0d0)
    integer, parameter :: ck = kind(cmplx(1.0d0,1.0d0))
    save
end module

and

module physicsparameters
    use kinddef
    implicit none
    complex(ck), parameter :: i1 = cmplx(0._rk,1._rk)
    real(rk), parameter :: pi = acos(-1._rk)
    real(rk), parameter :: mu0 = 4._rk*pi*1.E-7
    save
end module physicsparameters

I call everything from main

program hello
use kinddef
use d1isoModel
    implicit none
    type(d1isoGrid) :: myGrid
    real(rk), dimension(4) :: s
    real(rk), dimension(5) :: z
    complex(ck), dimension(:), allocatable :: E
    integer(ik) :: info
    z = (/-1., 0., 1., 2., 4./)
    s = (/0., 0.001, .001, .1/)

    call myGrid%init(5,z,s)
    call myGrid%printOnScreen()
    info = computeEx(myGrid,3._rk,E)
    print*, 'info = ', info
    call myGrid%deallocate()
    deallocate(E)
end program

I use the flags

ifort -o test *.o -i8 -openmp -I$MKLROOT/include $MKLROOT/lib/libmkl_intel_ilp64.a $MKLROOT/lib/libmkl_core.a $MKLROOT/lib/libmkl_intel_thread.a -lpthread -l

to link

mecej4 · ‎03-26-2014

Incomplete, can't compile because source code of printd1isoModelOnScreen is not provided.

ericcp_dias_ie · ‎03-26-2014

yes, you are right. you can comment the subroutine call and definition out as it is inessential to the mkl_pardiso problem. It only print sigma and z on screen for testing

mecej4 · ‎03-26-2014

I found a couple of array bounds errors in your program. In Function ComputeEx, you need dimension znodes(this%n-1) rather than (this%n-3).

Secondly, I am not familiar with the F95 interface to Pardiso, so I replaced the F95 style calls with F77 calls. The changes can be found in the attached file. With these changes, your program ran to completion, with the solution

[bash]

Ex( 1 ) = (0.000000000000000E+000,-8.290467793798273E-005)
Ex( 2 ) = (0.000000000000000E+000,-5.921762709855910E-005)
Ex( 3 ) = (0.000000000000000E+000,-3.553057625913546E-005)
Ex( 4 ) = (0.000000000000000E+000,0.000000000000000E+000)

[/bash]

I think that if my assessment is correct, you can simplify the program by avoiding complex numbers. The assessment is based on observing that the matrix is real, and the R.H.S. vector b is purely imaginary. Therefore, instead of solving A x = b = i b_r, you can solve A y = b_r, and then take the solution as x = i y. If you can take advantage of this observation, you may pick up considerable improvement in execution speed.

ericcp_dias_ie · ‎03-27-2014

I had some improvements on my system using the provided file.

At the moment I can compile and run the code without interruption if I link the libraries dynamically.

Nevertheless I obtain a sad set of (0., NaN) result.

ericmacbook:d1iso eric$ ./test 
 forming the finite-differences sparse matrix and r.h.s. ...
 (-1.000000,0.0000000E+00) (1.000000,0.0000000E+00) (1.000000,0.0000000E+00)
 (-2.000000,0.0000000E+00) (1.000000,0.0000000E+00) (1.000000,0.0000000E+00)
 (-1.666667,0.0000000E+00) (0.6666667,0.0000000E+00) (1.000000,0.0000000E+00)
           1           2           1           2           3           2
           3           4           4
           1           3           6           9          10
  calling sparse solver

=== PARDISO: solving a complex nonsymetric system ===
The local (internal) PARDISO version is                          : 103911000
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON


Summary: ( reordering phase )
================

Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.000010 s
Time spent in reordering of the initial matrix (reorder)         : 0.000123 s
Time spent in symbolic factorization (symbfct)                   : 0.000026 s
Time spent in allocation of internal data structures (malloc)    : 0.000267 s
Time spent in additional calculations                            : 0.000017 s
Total time spent                                                 : 0.000443 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b >
             number of equations:           4
             number of non-zeros in A:      9
             number of non-zeros in A (): 56.250000

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    2
             size of largest supernode:               2
             number of non-zeros in L:                10
             number of non-zeros in U:                2
             number of non-zeros in L+U:              12
 Reordering completed ... 
 Number of nonzeros in factors =           12
 Number of factorization MFLOPS =            0
=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
Percentage of computed non-zeros for LL^T factorization
 50 %  90 %  100 % 

=== PARDISO: solving a complex nonsymetric system ===
Single-level factorization algorithm is turned ON


Summary: ( factorization phase )
================

Times:
======
Time spent in copying matrix to internal data structure (A to LU): 0.000000 s
Time spent in factorization step (numfct)                        : 0.001583 s
Time spent in allocation of internal data structures (malloc)    : 0.000015 s
Time spent in additional calculations                            : 0.000002 s
Total time spent                                                 : 0.001600 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b >
             number of equations:           4
             number of non-zeros in A:      9
             number of non-zeros in A (): 56.250000

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    2
             size of largest supernode:               2
             number of non-zeros in L:                10
             number of non-zeros in U:                2
             number of non-zeros in L+U:              12
             gflop   for the numerical factorization: 0.000000

             gflop/s for the numerical factorization: 0.000051

 Factorization completed ... 

=== PARDISO: solving a complex nonsymetric system ===


Summary: ( solution phase )
================

Times:
======
Time spent in direct solver at solve step (solve)                : 0.000097 s
Time spent in additional calculations                            : 0.000038 s
Total time spent                                                 : 0.000135 s

Statistics:
===========
< Parallel Direct Factorization with number of processors: > 1
< Numerical Factorization with Level-3 BLAS performance >

< Linear system Ax = b >
             number of equations:           4
             number of non-zeros in A:      9
             number of non-zeros in A (): 56.250000

             number of right-hand sides:    1

< Factors L and U >
             number of columns for each panel: 128
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    2
             size of largest supernode:               2
             number of non-zeros in L:                10
             number of non-zeros in U:                2
             number of non-zeros in L+U:              12
             gflop   for the numerical factorization: 0.000000

             gflop/s for the numerical factorization: 0.000051

 Solve completed ... 
 The solution of the system is 
  Ex(           1 ) =  (0.0000000E+00,NaN)
  Ex(           2 ) =  (0.0000000E+00,NaN)
  Ex(           3 ) =  (0.0000000E+00,NaN)
  Ex(           4 ) =  (0.0000000E+00,NaN)
 info =            0
 memory used for d1isoGrid relased ...
ericmacbook:d1iso eric$

maybe you know any smart debugging flag I can use to individuate/solve the problem?

mecej4 · ‎03-27-2014

I think that you have errors related to a misunderstanding of the nature of the intrinsic function CMPLX. Because kind(cmplx(1.0d0,1.0d0)) evaluates to 4, whereas kind(cmplx(1.0d0,1.0d0,kind=kind(1d0))) evaluates to 8, the latter of which you probably want, many errors arise. Note, as well, that Pardiso will not work correctly with default precision real and complex variables unless you set iparm(28) = 1. You need REAL(8) and COMPLEX(8) variables if you set the default value of zero in iparm(28).

Try the following program to see the implications:

[fortran]

program chkcmplx

implicit none

complex(8) :: i1

i1=cmplx(0d0,1d-1)

write(*,*)i1

i1=cmplx(0d0,1d-1,kind(0d0))

write(*,*)i1

write(*,*)kind(cmplx(1d0,1d0)),kind(cmplx(1.0d0,1.0d0,kind=kind(1d0)))

end [/fortran]

I have attached a zip file with corrected sources. The program built with these sources worked correctly with IFort 11.1.070 and 14.0.2, on Windows

ericcp_dias_ie · ‎03-27-2014

oh, that was enlightening!

Thanks a million for your kind help.

As per the real vs complex you are right again, but both the matrix and the rhs doom is to become complex, so I will not be able to separate them again. If i'll be possible to solve independently for real and imaginary part, I'll keep your tip in mind.

Thanks again.