Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

error while using zgetri

Tai_Q_
Beginner

Dear all,

I am running a program that has run many times on a cluster.

Perhaps because the cluster has gone through a software upgrade, errors now occur when running the executable a.out.

Compiling and linking work without problems. The error only shows up partway through the run.


forrtl: error (65): floating invalid
Image              PC                Routine            Line        Source
libifcoremt.so.5   00002B6454D7A6D4  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002B6452C20370  Unknown               Unknown  Unknown
libmkl_avx512_mic  00002B646F5370BE  mkl_blas_avx512_m     Unknown  Unknown
libmkl_avx512_mic  00002B646F544B61  mkl_blas_avx512_m     Unknown  Unknown
libmkl_avx512_mic  00002B646F541935  mkl_blas_avx512_m     Unknown  Unknown
libmkl_intel_thre  00002B644F2FF714  mkl_blas_ztrsm_ho     Unknown  Unknown
libmkl_intel_thre  00002B644F319606  mkl_blas_ztrsm        Unknown  Unknown
libmkl_core.so     00002B64515C1F74  mkl_lapack_ztrtri     Unknown  Unknown
libmkl_core.so     00002B64514B032C  mkl_lapack_zgetri     Unknown  Unknown
libmkl_intel_lp64  00002B644E98683D  ZGETRI                Unknown  Unknown

Now we are using intel/17.0.4, impi/17.0.3.

      call ZGETRF( N_LEN_2, N_LEN_2, BQ , N_LEN_2, IPIV , INFO )
 
      call ZGETRI( N_LEN_2, BQ, N_LEN_2, IPIV, WORK, N_LEN_2, INFO )

The first subroutine, ZGETRF, is fine. But the second subroutine, ZGETRI, always fails with a floating invalid error.

I do not understand this, because the inputs of ZGETRI are just the outputs of ZGETRF.
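One way to narrow this down is to verify what ZGETRF actually hands to ZGETRI. The following is only a minimal sketch, not the original program: the 3x3 test matrix, the name wquery, and the size n are made up for illustration, and it assumes the program is linked against LAPACK or MKL. It checks INFO after ZGETRF, scans the LU factors for NaN or Inf, and uses the standard workspace query (LWORK = -1) before calling ZGETRI.

```fortran
program check_getri_inputs
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 3            ! stand-in for N_LEN_2
  complex(dp) :: bq(n, n), wquery(1)
  complex(dp), allocatable :: work(:)
  integer :: ipiv(n), info, lwork, i

  ! Hypothetical test matrix: 2 on the diagonal, 0.1i elsewhere.
  bq = (0.0_dp, 0.1_dp)
  do i = 1, n
     bq(i, i) = (2.0_dp, 0.0_dp)
  end do

  call zgetrf(n, n, bq, n, ipiv, info)
  if (info /= 0) then
     ! info > 0 means U(info,info) is exactly zero (singular matrix).
     print *, 'ZGETRF failed, info =', info
     error stop
  end if

  ! A NaN or Inf left in the factors would propagate into ZTRTRI/ZTRSM.
  if (.not. all(ieee_is_finite(real(bq)) .and. ieee_is_finite(aimag(bq)))) then
     print *, 'LU factors contain NaN or Inf'
     error stop
  end if

  ! Workspace query: lwork = -1 returns the optimal size in wquery(1).
  call zgetri(n, bq, n, ipiv, wquery, -1, info)
  lwork = max(n, int(real(wquery(1))))
  allocate(work(lwork))

  call zgetri(n, bq, n, ipiv, work, lwork, info)
  print *, 'ZGETRI info =', info
end program check_getri_inputs
```

If both checks pass and the exception still occurs inside ZGETRI, the input data is less likely to be the cause, which is useful information for a bug report.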

*********updates********

I found the following in the Intel® Math Kernel Library (Intel® MKL) 2017 Release Notes:

Fixed irregular division by zero and invalid floating point exceptions 
in {C/Z}TRSM for Intel® Xeon Phi™ processor x200 (aka KNL) and Intel® Xeon® 
Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) code path

This may be relevant, because my error message mentions TRSM, floating invalid, and AVX-512.

********updates2********

It seems the error has something to do with the MKL library.

1. The code has been running for a long time.

2. When I run the code with an older version of the MKL library, it works well.

I think something must be wrong in the current MKL library, which is 17.0.4.

 

 

Gennady_F_Intel
Moderator

Hello, thanks for the report.

1. What is the current version you are using? Could you look at the mkl_version.h file?

2. >>>  I run the code in a low version MKL library, it works well.

What is the previous version that works well?

3. Could you give us a reproducer?

--Gennady

Tai_Q_
Beginner

Hi Gennady,

I have been debugging my code these past few days.

As I said, this code has been used for many years.

The previous Intel MKL version was 16.0.3; now it is 17.0.4 (17.0.1 also works fine for this code).

Even running the same case as before now produces errors:

forrtl: error (65): floating invalid
Image              PC                Routine            Line        Source
libifcoremt.so.5   00002B6454D7A6D4  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002B6452C20370  Unknown               Unknown  Unknown
libmkl_avx512_mic  00002B646F5370BE  mkl_blas_avx512_m     Unknown  Unknown
libmkl_avx512_mic  00002B646F544B61  mkl_blas_avx512_m     Unknown  Unknown
libmkl_avx512_mic  00002B646F541935  mkl_blas_avx512_m     Unknown  Unknown
libmkl_intel_thre  00002B644F2FF714  mkl_blas_ztrsm_ho     Unknown  Unknown
libmkl_intel_thre  00002B644F319606  mkl_blas_ztrsm        Unknown  Unknown
libmkl_core.so     00002B64515C1F74  mkl_lapack_ztrtri     Unknown  Unknown
libmkl_core.so     00002B64514B032C  mkl_lapack_zgetri     Unknown  Unknown
libmkl_intel_lp64  00002B644E98683D  ZGETRI                Unknown  Unknown

I think 17.0.4 must have changed something in how it handles complex numbers.

 

Thanks!

Tai

 

Gennady_F_Intel
Moderator
Hi Tai,
Could you try the following:
1. Check the input parameters of zgetri by enabling verbose mode with MKL_VERBOSE=1.
2. Check whether the problem is caused by multithreading:
call mkl_set_num_threads(1)
call zgetri(……)
! and then restore the original number of threads
call mkl_set_num_threads(#orig num of threads)
3. Could you capture the input data of zgetri and create a standalone reproducer based on that data?
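For step 3, one possible way to capture the data is to write the ZGETRI inputs to a binary file right before the call and read them back in a small standalone program. This is a rough sketch under assumptions: the file name zgetri_input.bin, the variable names with the _in suffix, and the 2x2 contents are all hypothetical, and in the real code BQ and IPIV would be the outputs of ZGETRF.

```fortran
program dump_getri_input
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 2            ! stand-in for N_LEN_2
  complex(dp) :: bq(n, n), bq_in(n, n)
  integer :: ipiv(n), ipiv_in(n), n_in, u

  ! Hypothetical contents; in the real code these are the ZGETRF outputs.
  bq = reshape([(1.0_dp, 0.0_dp), (0.0_dp, 1.0_dp), &
                (0.5_dp, -0.5_dp), (2.0_dp, 0.0_dp)], [n, n])
  ipiv = [1, 2]

  ! Write size, LU factors, and pivots right before the ZGETRI call.
  open(newunit=u, file='zgetri_input.bin', access='stream', &
       form='unformatted', status='replace')
  write(u) n, bq, ipiv
  close(u)

  ! The standalone reproducer reads them back the same way.
  open(newunit=u, file='zgetri_input.bin', access='stream', &
       form='unformatted', status='old')
  read(u) n_in
  read(u) bq_in, ipiv_in
  close(u)

  ! Confirm the round trip preserved every value bit for bit.
  if (n_in /= n .or. any(bq_in /= bq) .or. any(ipiv_in /= ipiv)) then
     error stop 'round trip failed'
  end if
  print *, 'round trip ok, n =', n_in
end program dump_getri_input
```

Unformatted stream output keeps the exact bit patterns of the matrix entries, so the reproducer sees the same values as the failing run. In the reproducer, n_in would be read first and the arrays allocated to that size before the second read.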
 
--Gennady