yes, this looks like a bug

Swartz__Brent · ‎05-23-2018

Using code that worked with previous versions (Intel 12.1 OpenMPI 1.4.4 MKL 10.2.5.035 on CentOS 6) of MKL, zgetrf/zgetri cannot invert a matrix that it previously inverted.

Could the newer intel modules be changing how the arguments are accepted for lapack functions?

The pivot index is determined by the output of zgetrf( .. , ipvt , ..) which is then fed into zgetri as input similarly. I believe this is the way to do it.

I have just tested a sample program that only does the zgetrf and zgetri portions and I believe now that this is where the problem lies. I have tested two sample programs, one with a relevant 8x8 matrix that I calculate in my program, and one made-up example 8x8 matrix that I checked has an inverse.

Both programs run fine using the old modules, compiling with mpif90 etc as I do now when I am taking measurements. I get no runtime errors or error outputs from the zgetrf and zgetri subroutines. However, when using the new modules (intel2018), I do get an error output from the zgetri and zgetrf subroutines for both sample matrices. This error output, labeled "INFO" by default, rerturn a value of 1. To save you some searching, INFO should return 0 if successful (zgetrf factorizes the matrix to be inverted, zgetri actually inverts it). However if INFO = i > 0, then U(i,i) = 0 and the matrix is singular. This means that something that I'm doing with the new modules is causing a verifiable invertible matrix to be considered non-invertible, or singular.

Now, how this causes a segfault in the program at large I'm not 100% sure, it doesn't do so in my small sample program. However it seems very likely that since I do not suppress the output of the subroutines when it returns something other than a successful exit value for INFO, that the output of zgetri in this case is garbage that eventually causes some segfault error when it eventually gets multiplied by other matrices and is solved by zgetrs subroutine later on. I can verify that the output of zgetri when INFO=1 is nonsense from my small tests.

So to recap, I have found that using the new modules causes zgetri and zgetrf subroutines to return an error for the same, invertible matrices. This problem may also occur for the subroutine zgetrs, but I haven't tested it. In any case, the return value after the error then probably causes the segfault. The question remains, why is zgetrf/zgetri not working with the new modules?

Here is sample program (pgrmcheck_cond_zgetri.f90) which reproduces the problem:

! placeholder
!mpiifort -o pgrmcheck.x -O3 pgrmcheck_cond_zgetrf.f90 -i8
!-I${MKLROOT}/include/intel64/ilp64 -I${MKLROOT}/include
!${MKLROOT}/lib/intel64/libmkl_lapack95_ilp64.a -L${MKLROOT}/lib/intel64
!-lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core
!-lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl

        PROGRAM dyncond_check
        IMPLICIT NONE
        include 'mpif.h'

        complex,DIMENSION(8) :: rightvec,work
        integer,dimension(8) ::ipvt
        complex,DIMENSION(8,8) :: tmatrixN1
        integer:: error1,error2,i,j

tmatrixN1=0

do i=1,8
do j=1,8

         if (j<(i+1)) tmatrixN1(i,j)=i+(j-1)*8
         enddo
        enddo

call mkl_set_num_threads(1)

        call zgetrf(8,8,tmatrixN1,8,ipvt,error1)
        print *,error1
        call zgetri(8,tmatrixN1,8,ipvt,work,8,error2)
        print *,error2
        do i=1,8
         do j=1,8
!                print *,tmatrixN1(i,j)
         enddo
        enddo

end program

Compiled with:

### Command to compile for intel 2018 modules (Not functioning) ###

module load intel/cluster/2018
mpiifort -o pgrmcheck2.x -O3 pgrmcheck_cond_zgetri.f90 -i8 -I${MKLROOT}/include/intel64/ilp64 -I${MKLROOT}/include ${MKLROOT}/lib/intel64/libmkl_lapack95_ilp64.a -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl
./pgrmcheck2.x

###

### Command to compile for (old) intel 12.1 modules (functioning) ###

module load intel/12.1 ompi/1.4.4/intel mkl/10.2.5.035
mpif90 -o pgrmcheck2.x -O3 -r8 pgrmcheck_cond_zgetri.f90 -L/soft/intel/mkl/10.2.1.017/lib/em64t -lmkl_lapack -lmkl_intel_thread -lmkl_core -lguide -lmkl_intel_lp64
./pgrmcheck2.x

###

Thanks!

Gennady_F_Intel · ‎05-24-2018

wrt MKL version 2018: do you see segfault with lp64 API too? Could you please check.

Swartz__Brent · ‎05-30-2018

When I replace all instances of 'ilp64' to plainly 'lp64', It did compile and there were no segfault errors, but all my results return NaN.

Thanks.

Swartz__Brent · ‎06-13-2018

Do you need more info?

Thanks.

Gennady_F_Intel · ‎06-14-2018

I checked the latest MKL 2018 u3 with lp64 and ilp64 aPI. I only added to this code mkl_version subroutine call.

here are output I see on my side:

make lp64

mpiifort -I/opt/intel/compilers_and_libraries_2018/linux/mpi/include64/ test_zgetrf.f -o lp64.out \

-I/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/include \

/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64/libmkl_lapack95_lp64.a -L/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64 \

-lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core \

-lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl

$ ./lp64.out

Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for Intel(R) 64 architecture applications

1

$ make ilp64

mpiifort -i8 -I/opt/intel/compilers_and_libraries_2018/linux/mpi/include64/ test_zgetrf.f -o ilp64.out \

-I/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/include/intel64/ilp64 -I/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/include \

/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64/libmkl_lapack95_ilp64.a -L/opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64 \

-lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core \

-lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl

$ ./ilp64.out

Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for Intel(R) 64 architecture applications

1

Swartz__Brent · ‎07-17-2018

Just to verify, the info returned (the 1's) indicate failure, since 0 should be returned if it was successful, correct?

If so are you looking into this problem?

Thanks.

Gennady_F_Intel · ‎07-18-2018

yes, this looks like a bug.could you please submit the request to the Inlel online service center -https://supporttickets.intel.com/servicecenter

zgetrf/zgetri cannot invert matrix