strange change in numerical results with IVF 19.1 versus 13.0

Brian_Murphy · ‎12-17-2020

I don't know if this forum or the MKL forum is the best place for this. But I will try here first.

I have a code which uses MKL Pardiso with an Arnoldi sparse matrix eigensolver (ARPACK). I have ported the code from IVF 13.0 to 19.1. In running a benchmark case, the results change a tiny bit, but that is ok because I expect that. However, the Arnoldi eigensolver is passed a true/false flag to tell it to return eigenvectors in addition to eigenvalues. 13.0 returns the same eigenvalues with true or false, but 19.1 does not. With 19.1 there is a small but nonetheless very important change in eigenvalues caused by this flag. This happens with both Release and Debug builds. Since Debug has optimizations disabled (/Od), I think that rules out optimizers as the cause.

I guess I'm looking for insight or suggestions into why a change in the compiler and/or MKL library version might cause this. Debugging my way through the ARPACK code to find where things go astray could be a big job.

Steve_Lionel · ‎12-17-2020

Start here: Improving Numerical Reproducibility in C/C++/Fortran

Brian_Murphy · ‎12-17-2020

Results might be going astray in LAPACK routine DLAHQR. The program made with ivf 13.0 uses lapack 2.0 rather than drawing lapack from MKL. The program made with ivf 19.1 is using MKL's lapack. Are you aware of any reason this routine would perform differently? i.e. 2.0 versus today's MKL.

The problem isn't different answers. The problem is two ways of running the program which should produce identical eigenvalues does so with ivf 13.0 but not with ivf 19.1.

Brian_Murphy · ‎12-17-2020

lapack's dlamch('Epsilon-Machine') does not return the same value in ivf 13.0 and 19.0. What sort of compiler options will or should make them match?

andrew_4619 · ‎12-17-2020

dlamch('E') will return epsilon(1.0d0) or epsilon(1.0d0)*0.5 I think.

is epsilon(1.0d0) the same on both versions?

Steve_Lionel · ‎12-17-2020

Likely you can't do that.

mecej4 · ‎12-17-2020

I ran into this problem many years ago. The Netlib codes for machine FPU parameters are not always successful in finding correct values of those parameters. In fact, Fortran-90 provided new intrinsics such as EPSILON() to address that problem.

The following program, run with MKL, gives half of machine epsilon, as Andrew noted, but the desired value was machine epsilon.

program xdlamch
implicit none
double precision, external :: dlamch
print '(1x,2ES22.13)',dlamch('E'),epsilon(0d0)
end

If you use the same program, but use the dlamch.f from Netlib, instead of MKL, the program when compiled with the most recent ifort gives machine epsilon, as desired, but the program compiled with ifx goes into an infinite loop (Apple's campus, by the way, is at 1 Infinite Loop in Cupertino, CA, some distance away from Fortran Drive in San Jose, CA).

Solution: replace calls to obsolete routines such as dlamch, d1mach, etc., by the Fortran 90 intrinsics such as EPSILON().

Brian_Murphy · ‎12-18-2020

I think my problem may be ivf 19.1 MKL libraries. When I build in MKL libraries statically, my code at least runs, even though I don't like the answers. When I build MKL in dynamically, my code crashes on the first call to MKL, which is a call of mkl_dcsrcoo.

What would I have to do to replace today's MKL with MKL from ivf 13.0?

Brian_Murphy · ‎12-18-2020

The problem is not MKL. I made a build which does not use MKL. Instead of pardiso it uses UMFPACK, and Lapack 2.0. Even this produces the same inconsistency in results.

The differences might first appear in results returned by lapack's DLAHQR, but I don't know that for sure. I will keep digging.

Brian_Murphy · ‎12-19-2020

I spoke too soon. Turns out the non-MKL build actually does run correctly (I thought it wasn't because I compared the wrong things). So I now know I can get good results with ivf 19.1.

What I would like to do next is make a build which uses MKL for pardiso, but does not use MKL for Lapack's DLAHQR, but I don't know how. Is it possible to do this? If so, how?

andrew_4619 · ‎12-19-2020

Maybe you should make a binary dump to a file of all the inputs to DLAHQR and then make a small test program that reads the inputs calls DLAHQR and dumps outputs. It will then be easy to test/compare different versions with the same data and also others can comment. You then also have the basis of a reproducer if there is some bug....

Brian_Murphy · ‎12-20-2020

I may eventually do that, but first I want to run the my code with it not using MKL's version of the lapack routine DLAHQR.

mecej4 · ‎12-20-2020

AndrewP's suggestion is very good, and can save you a lot of trouble. Here is all that you have to do.

1. Dump the arrays H and Z (these are the 6-th and 12-th arguments to DLAHQR) to an unformatted file, by adding the following lines just before the call to DLAHQR in your rotor vibration program:

         open(77,file='lahqr.bin',form='unformatted',status='replace')
         write(77)ncv
         write(77)workl(iuptri:iuptri+ncv*ncv-1),workl(invsub:invsub+ncv*ncv-1)
         close(77)

The resulting file, lahqr.bin, can be read using code compiled using Intel Fortran as well as Gfortran, and many other Fortran compilers.

2. Here is a short program to read that file, call DLAQR and print the eigenvalues. You can build the program using any Fortran compiler for which you have Lapack and BLAS. There is no dependency on a sparse solver such as Pardiso, UMFPack, etc.

program chkhqr
implicit none
integer i,ncv,info
double precision, allocatable :: wr(:),wi(:),h(:,:),z(:,:)
open(77,file='lahqr.bin',form='unformatted',status='old')
read(77)ncv
print *,' ncv = ',ncv
allocate(h(ncv,ncv),z(ncv,ncv),wr(ncv),wi(ncv))
read(77)h,z
close(77)
call dlahqr(.false.,.false.,ncv,1,ncv,h,ncv,wr,wi,1,ncv,z,ncv,info)
write(*,*)'Info = ',info
write(*,10)(i,wr(i),wi(i),i=1,ncv)
10 format(i3,2ES20.12)
end program

!file length:
! rec 1 : 12 bytes
! rec 2: 2*8*ncv*ncv+2*4 = 2500*16+8 = 40008  bytes
! total file size 40020 bytes

You can also run this program with the first two arguments to DLAHQR set to .true.

I think that you will find the results to be almost identical, regardless of the values of those two arguments, and regardless of which compiler and Lapack you use. Therefore, the significant eigenvalue differences that you observed have probably more to do with the parts of your large code wherein you compute the elements of the matrices H and Z, rather than with the routine DLAHQR.

Brian_Murphy · ‎12-22-2020

The Arnoldi eigensolver computes eigenvalues plus H & Z matrices. If the user has requested mode shapes, H and Z are subsequently passed to DLAHQR, which extracts eigenvalues and mode shapes. DLAHQR's eigenvalues are identical to Arnoldi's when using Lapack 2.0, but not when using MKL Lapack. The difference is admittedly small in my test cases, but the fact that they are not identical throws a monkey wrench into the flow of the program.

mecej4 · ‎12-22-2020

We could speculate and debate in this vague fashion for a long time, but here is a way to resolving the question more quickly. Provide a test matrix, and a driver that reads the test matrix and calls the various alternative solvers (Arnoldi, Lapack2, Lapack3).

It is not a good idea to try different compilers and compiler options when we are not obtaining consistent results. There cannot be a set of compiler option that will provide magical fixes to bugs in the source code (such as arrays not conforming, overrunning bounds, etc.).

The smaller the matrix, the better, but not so small that the differences in results cannot be observed.