Intel® Fortran Compiler

Tracking a NaN problem

lacek
Beginner
592 Views

Dear All,

I know that there are convenient compiler options for tracking NaNs at run time: -check all -traceback -fpe0

My traceback is:

kvec               0000000000899F43  Unknown               Unknown  Unknown
kvec               0000000000837DD7  Unknown               Unknown  Unknown
kvec               00000000008307B0  Unknown               Unknown  Unknown
kvec               00000000007EF235  Unknown               Unknown  Unknown
kvec               00000000007E7A11  Unknown               Unknown  Unknown
kvec               000000000061635E  eigen_mp_ev3_              90  eig.F90
kvec               000000000052EF76  mps_func_mp_mps_r        1813  mp2.F90
kvec               000000000064C5C7  propagate_                893  kvec.F90
kvec               00000000006419E2  MAIN__                    593  kvec.F90
kvec               000000000040B50C  Unknown               Unknown  Unknown
libc.so.6          00000038B722135D  Unknown               Unknown  Unknown
kvec               000000000040B409  Unknown               Unknown  Unknown

At the line reported in the traceback, eig.F90 contains just this call:

call zheevr(jobz, range, uplo, n1, a, lda, vl, vu, il, iu, abstol, m1, DD, U, ldz, isuppz, work, lwork, rwork, lrwork, iwork, liwork, info)

So how is it possible that it catches a NaN? If I remove -fpe0, zheevr completes and returns with info=0.

The matrix a is just:

 (0.499999999999999,0.000000000000000E+000)
 (0.707106781186547,0.000000000000000E+000)
 (0.707106781186547,0.000000000000000E+000)
 (1.00000000000000,0.000000000000000E+000)
 (0.499999999999999,0.000000000000000E+000)
 (0.707106781186547,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.707106781186547,0.000000000000000E+000)
 (1.00000000000000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.499999999999999,0.000000000000000E+000)
 (0.707106781186547,0.000000000000000E+000)
 (0.707106781186547,0.000000000000000E+000)
 (0.999999999999999,0.000000000000000E+000)
 (1.00000000000000,0.000000000000000E+000)
 (0.888888888888890,0.000000000000000E+000)
 (0.314269680527355,0.000000000000000E+000)
 (0.314269680527355,0.000000000000000E+000)
 (0.444444444444444,0.000000000000000E+000)
 (0.666666666666667,0.000000000000000E+000)
 (0.471404520791032,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.471404520791032,0.000000000000000E+000)
 (1.00000000000000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (0.111111111111111,0.000000000000000E+000)
 (0.157134840263677,0.000000000000000E+000)
 (0.157134840263677,0.000000000000000E+000)
 (0.722222222222222,0.000000000000000E+000)
 (0.166666666666666,0.000000000000000E+000)
 (0.499999999999999,0.000000000000000E+000)
 (0.500000000000000,0.000000000000000E+000)
 (-0.707106781186547,0.000000000000000E+000)
 (0.000000000000000E+000,0.000000000000000E+000)
 (-0.707106781186547,0.000000000000000E+000)
 (0.999999999999999,0.000000000000000E+000)
 (-6.661338147750939E-016,0.000000000000000E+000)
 (5.551115123125783E-017,0.000000000000000E+000)
 (-6.106226635438361E-016,0.000000000000000E+000)
 (0.999999999999999,0.000000000000000E+000)




8 Replies
Anonymous66
Valued Contributor I
-fpe0 enables the floating-point invalid, divide-by-zero, and overflow exceptions, and flushes underflows to zero. Without that option, the default is to disable those exceptions and to underflow gradually. This is why the program aborts on the NaN with -fpe0 but runs to completion without it.
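A minimal sketch of that difference, in standard Fortran. The program name fpe_demo and all variable names are made up for illustration, and the abort message under -fpe0 is not reproduced here because it depends on the compiler version.

```fortran
program fpe_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), volatile :: zero   ! volatile: keep the compiler from folding 0/0
  real(dp) :: q

  zero = 0.0_dp
  q = zero / zero              ! IEEE "invalid" operation

  ! This point is reached only when the invalid exception is not trapped.
  if (ieee_is_nan(q)) then
     print *, 'q is NaN; the program kept running'
  else
     print *, 'q =', q
  end if
end program fpe_demo
```

Built with default options, this should report the NaN and exit normally. Built with -fpe0 (ideally together with -g -traceback), it is expected to abort at the division instead.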
lacek
Beginner
Dear Annalee,

I understand what -fpe0 does, but I do not understand why zheevr (I am using Intel's MKL) triggers the NaN trap. The program is completely deterministic, the output does not contain any NaN, and stat=0, but when I use -fpe0 some NaNs are caught. They seem to originate from an MKL routine which:
- does not contain NaN on input
- does not contain NaN on output
- terminates with stat=0
Anonymous66
Valued Contributor I
The NaN may occur within the zheevr calculations without making the final result a NaN. Alternatively, flushing underflows to zero may produce a NaN that would not otherwise occur. If your question is specific to MKL, I would suggest posting on the MKL forum as well.

Regards,
Annalee
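The first possibility can be illustrated with a small, self-contained sketch: an intermediate value becomes NaN and raises the "invalid" flag, yet the value that is finally returned is finite. This is not a claim about what zheevr does internally; the program hidden_invalid and its variables are hypothetical, and it is meant to be compiled without -fpe0 so that the operation does not trap.

```fortran
program hidden_invalid
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  use, intrinsic :: ieee_exceptions, only: ieee_invalid, ieee_get_flag, ieee_set_flag
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), volatile :: num, den   ! volatile: force the division at run time
  real(dp) :: ratio, res
  logical :: raised

  call ieee_set_flag(ieee_invalid, .false.)   ! start with the flag cleared

  num = 0.0_dp
  den = 0.0_dp
  ratio = num / den                ! intermediate NaN, raises "invalid"

  ! The guarded branch discards the NaN (illustrative fallback only).
  if (ieee_is_nan(ratio)) then
     res = 1.0_dp
  else
     res = ratio
  end if

  call ieee_get_flag(ieee_invalid, raised)
  print *, 'result              =', res
  print *, 'invalid flag raised =', raised
end program hidden_invalid
```

The printed result contains no NaN, but the flag shows that an invalid operation took place along the way. With trapping enabled, the same division would stop the program before the fallback is reached.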
lacek
Beginner
Ok, thanks. I will do that.
lacek
Beginner
I have recompiled the file containing the call to zheevr without the -fpe0 flag and linked it into my program, but this did not really help. It seems that NaN trapping is unable to locate the particular line that throws the NaN (in my case, the call to zheevr), but it is able to locate the enclosing routine that contains the call. So the trigger is 99.9% still zheevr, but it is not the real source of a problem. I assume this is because the NaN monitoring is done by observing some processor flags, which are always triggered.

Is it possible to change the default behaviour of the NaN-catching function, so that it prints a warning but does not stop the program?
Anonymous66
Valued Contributor I
There is no way to do that, but you can get more information about where the NaN occurs by compiling with -g as well as -traceback. I would also suggest running the program within a debugger.
mecej4
Honored Contributor III
"Is it possible to change default behaviour of the NaN catching function - make it print a warning but not stop the program"

Although one may agree with that wish in principle, there are reasons why it is not practical to implement such a change. For example, what if the number of NaNs caught during a single execution runs into the millions? Does the user want the NaN error reports mixed into the program output? What if the standard output has been redirected to a file?
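For a single suspect call, the standard Fortran IEEE modules offer a way to poll the "invalid" flag instead of trapping, which yields at most one report per check and sends it to the error unit. What follows is only a sketch under assumptions: suspect_routine is a hypothetical stand-in for the real call, nothing in this thread confirms how this interacts with -fpe0 or with MKL, and some compilers may need stricter floating-point options for the flags to be reliable.

```fortran
program warn_not_stop
  use, intrinsic :: ieee_arithmetic   ! also provides the ieee_exceptions entities
  use, intrinsic :: iso_fortran_env, only: error_unit
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  logical :: can_halt, old_halting, raised
  real(dp) :: res

  ! Save the current trap setting for "invalid" and switch the trap off.
  ! This is done inline: the standard restores the halting mode when a
  ! procedure returns, so a helper subroutine could not do it for us.
  can_halt = ieee_support_halting(ieee_invalid)
  old_halting = .false.
  if (can_halt) then
     call ieee_get_halting_mode(ieee_invalid, old_halting)
     call ieee_set_halting_mode(ieee_invalid, .false.)
  end if
  call ieee_set_flag(ieee_invalid, .false.)

  call suspect_routine(res)          ! hypothetical stand-in for the real call

  ! The flag is sticky: one check covers every invalid operation in the call.
  call ieee_get_flag(ieee_invalid, raised)
  if (raised) write (error_unit, '(a)') &
       'warning: invalid operation raised inside suspect_routine'

  ! Clear the flag first, then put the original trap setting back.
  call ieee_set_flag(ieee_invalid, .false.)
  if (can_halt) call ieee_set_halting_mode(ieee_invalid, old_halting)

  print *, 'result =', res

contains

  subroutine suspect_routine(x)
    real(dp), intent(out) :: x
    real(dp), volatile :: zero       ! volatile: force the division at run time
    real(dp) :: t
    zero = 0.0_dp
    t = zero / zero                  ! raises "invalid"
    if (ieee_is_nan(t)) then         ! discard the NaN, return a finite value
       x = 2.0_dp
    else
       x = t
    end if
  end subroutine suspect_routine

end program warn_not_stop
```

Because the check happens once after the call, the report does not say which operation inside the routine raised the flag; for that, a debugger or a traceback from a trapping build is still needed.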
lacek
Beginner
Ok, I see that this could be a problem. Thanks for the comments. Then I guess the simplest option would be to watch the variables in the debugger.