Intel® Fortran Compiler

eigensolver code crashes on AMD processor

Brian_Murphy
New Contributor II

I am using PARDISO with ARPACK's Arnoldi eigensolver. The code has been in use by over 100 users for several years. I'm getting reports that the code crashes on AMD Ryzen systems. Is there anything in particular that might be causing this? I sent a DEBUG build to a user, but it revealed nothing: it crashed in identical fashion, with no messages.

I'm using Visual Studio 2019. My ifort compiler command line is as follows:

/nologo
/O2
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\ARPACK\LAPACK\x64\Debug"
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\Umfpack\x64\Debug"
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\ARPACK\BLAS\x64\Debug"
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\ARPACK\UTIL\x64\Debug"
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\ARPACK\SRC\x64\Debug"
/extend_source:132
/module:"x64\Release\\"
/object:"x64\Release\\"
/Fd"x64\Release\\vc160.pdb"
/libs:static
/threads
/c

I've read about a compiler option /Qimf-arch-consistency in this thread.  If I try this option, should I set it to true or false?

Thanks,

Brian Murphy

Steve_Lionel
Honored Contributor III

The error reported was an illegal instruction, not an access violation.

mecej4
Honored Contributor III

Thanks, I have corrected the previous post.

Brian_Murphy
New Contributor II

Can you please give an example of an "illegal instruction"?

Separately, I tested a build of my program with today's IVF & MKL (i.e., not my custom MKL DLL), and the Arnoldi eigensolver still does not behave as it did with the older MKL: it does not produce the same eigenvalues with and without the inclusion of eigenvectors. Bummer.

andrew_4619
Honored Contributor III

Just to be clear, are you saying there is a bug in the current MKL routines? Is it a known bug? If not, you should post a reproducer in the MKL forum.

mecej4
Honored Contributor III

If you get an "Illegal Instruction" abort when running a program that was compiled from a high-level programming language, as explained in this old forum thread, it is the equivalent of "the barbarians have crossed the gates and are in." The causes can be many; the compiler, the RTL, and the OS try hard to catch errors earlier, which is why we rarely see this message. As explained in this old thread in this forum, the cause is probably stack corruption.

There can be many causes of stack corruption, and it is not useful to use a magnifying glass to look at the actual illegal instruction at the machine code level. You have to create a small program that demonstrates the error, and provide it here.

Similarly,

[Forum managers:  Problems related to old forum posts that need fixing:

1. Many links in old forum posts no longer work and may take one to an irrelevant Intel web page. The link to Dr. Fortran's "Don't blow your stack" article given in Kevin Davis's post, for example, is bad!

2. In a thread with many posts such as the current one (over 40 here), post reference numbers are almost a necessity. The older version of this forum showed "#14" as the sequence number of the fourteenth post in a thread, and these sequence numbers made navigation painless. Many old posts in this forum still contain such sequence numbers in their bodies, but no reference number appears in the header lines of each post. Try to find post #13 in the current thread, for example, and note that different readers may see different time stamps for the same post, depending on their profile time zone setting.]

Steve_Lionel
Honored Contributor III

Re: "illegal instruction"

It is common for new generations of processors to introduce new instructions, when it has been found that common operations formerly done with sequences of instructions can be done faster with a new one. Both Intel and AMD have done this over the years, but more recently AMD simply adopts the new instruction sets Intel introduces. On the Intel side, there were SSE2, SSE3, SSE4, AVX, AVX2, and AVX-512. More recently, Intel has added smaller subsets of instructions that are not enabled on all processors. If the CPU comes across an instruction it doesn't support, you get this error.

This case is a bit weird, though, in that an older processor runs the code and some newer ones don't.  It may be 1) the newer processors don't support a particular instruction correctly, or 2) the program is jumping into something that isn't a valid instruction. If the error is reliably reproducible, it should be possible to first identify the particular instruction it is complaining about, see if it is supposed to be valid, and if not, figure out how it got there. 

I'm leaning towards choice 2 here, as I am fairly confident that MKL doesn't do CPU dispatch for AMD processors.
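To make the dispatch idea concrete, here is a minimal Python sketch of how a library can pick a code path at run time from the CPU's feature flags, so it never executes instructions the processor lacks. It assumes Linux-style `/proc/cpuinfo` text as input; the path names and the `CODE_PATHS` table are hypothetical and do not describe how MKL actually dispatches.

```python
# Hypothetical code paths, fastest first, each with the CPU feature flags
# it requires. Flag names follow the Linux /proc/cpuinfo spelling.
CODE_PATHS = [
    ("avx512", {"avx512f"}),
    ("avx2", {"avx2", "fma"}),
    ("avx", {"avx"}),
    ("sse2", {"sse2"}),
]


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def select_code_path(cpu_flags, paths=CODE_PATHS):
    """Pick the first (fastest) path whose required flags are all present.

    Jumping into a path whose instructions the CPU does not implement is
    what raises an illegal-instruction fault; checking first avoids that.
    """
    for name, required in paths:
        if required <= cpu_flags:
            return name
    raise RuntimeError("no supported code path for this CPU")


sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2 fma\n"
print(select_code_path(parse_cpu_flags(sample)))  # avx2
```

On Linux, `parse_cpu_flags(open("/proc/cpuinfo").read())` would supply real flags; on Windows there is no such file, and the same information comes from the CPUID instruction instead.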

Brian_Murphy
New Contributor II

In response to Andrew's question about whether this is a bug in MKL: whether the change in behavior counts as a bug could be considered a matter of opinion (or a case of splitting hairs).

When Arnoldi finishes its iteration for finding eigenvalues, it calls a wrap-up routine (dneupd) to prepare the evalues for return to the calling program. If evectors have not been requested, the evalues are simply copied from work arrays to calling arguments. If evectors are requested, dneupd does additional work to prepare the evectors, and to do this it calls a LAPACK routine named DLAHQR. The behavior of DLAHQR is where the difference between the old and new versions comes in. If I'm right about that, this is really about LAPACK rather than MKL, but I'm not totally sure. DLAHQR recomputes the evalues from a Hessenberg matrix. In the old version of DLAHQR, the recomputed evalues exactly match the evalues determined by Arnoldi iteration, but not so with the new version. In the big picture, the differences in evalues are small, but they foul up other logic used elsewhere in my program.

The source codes of the old and new DLAHQR have way too many differences for me to tell what happened.
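Because DLAHQR recomputes the eigenvalues rather than copying the Arnoldi values, the two results can differ in the last few digits even when both are correct, so downstream logic that compares them for exact equality is fragile. A minimal Python sketch, assuming the downstream logic only needs to know whether two sets of (possibly complex) eigenvalues agree: compare under a tolerance instead. The function name and tolerance defaults are illustrative, and the greedy nearest-value pairing assumes the eigenvalues are well separated relative to the tolerance.

```python
def eigenvalues_match(a, b, rel_tol=1e-10, abs_tol=1e-12):
    """Return True if eigenvalue lists a and b agree within a tolerance.

    Works for real or complex values. Each value in `a` is paired greedily
    with the nearest not-yet-used value in `b`, so ordering does not matter.
    """
    if len(a) != len(b):
        return False
    unused = list(b)
    for x in a:
        # Nearest remaining candidate in b.
        j = min(range(len(unused)), key=lambda k: abs(unused[k] - x))
        y = unused.pop(j)
        if abs(x - y) > max(rel_tol * max(abs(x), abs(y)), abs_tol):
            return False
    return True


# Mathematically equal sums, rounded differently in binary floating point.
s1 = (0.1 + 0.2) + 0.3
s2 = 0.1 + (0.2 + 0.3)
print(s1 == s2)                       # False
print(eigenvalues_match([s1], [s2]))  # True

# A complex-conjugate pair returned in a different order, slightly perturbed.
print(eigenvalues_match([1 + 2j, 1 - 2j], [1.00000000000001 - 2j, 1 + 2j]))  # True
```

The same idea carries over directly to Fortran: replace `a == b` tests on eigenvalues with an `abs(a - b) <= tol` test scaled to the magnitudes involved.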

Brian_Murphy
New Contributor II

One solution to my AMD crash problem may be simply to build myMKL_x64.DLL with a newer version of IVF & MKL. I've done this with Intel® Visual Fortran Compiler – extension version 19.1.0057.16, Package ID: w_comp_lib_2020.2.254, and sent the DLL to a user for testing.

I have IVF 2023.2 on another development system, but I need help figuring out how to build myMKL_x64.DLL on that system.  I will use an earlier thread for that in the MKL forum. 

Brian_Murphy
New Contributor II

I am happy to report that my user with a Ryzen 9 7950X has reported that the crash was eliminated with the myMKL_x64.DLL built with IVF 19.1.

In addition, my user with a Ryzen 7 PRO 5875U has reported the same success.
