I am using PARDISO with ARPACK's Arnoldi eigensolver. The code has been in use by over 100 users for several years. I'm getting reports that the code crashes on AMD Ryzen systems. Is there anything in particular that might be causing this? I sent a DEBUG build to a user, but it revealed nothing: it crashed in identical fashion, with no messages.
I'm using Visual Studio 2019. My ifort compiler command line is as follows:
/nologo
/O2
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\ARPACK\LAPACK\x64\Debug"
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\Umfpack\x64\Debug"
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\ARPACK\BLAS\x64\Debug"
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\ARPACK\UTIL\x64\Debug"
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\ARPACK\SRC\x64\Debug"
/extend_source:132
/module:"x64\Release\\"
/object:"x64\Release\\"
/Fd"x64\Release\\vc160.pdb"
/libs:static
/threads
/c
I've read about a compiler option /Qimf-arch-consistency in this thread. If I try this option, should I set it to true or false?
Thanks,
Brian Murphy
The error reported was an illegal instruction, not access violation.
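When a user can only report "it crashed", the process exit code is often enough to tell these two failures apart. The following is a minimal sketch, not part of the program discussed here: the function name `describe_exit_code` is hypothetical, and the table contains only three standard Windows NTSTATUS values.

```python
# Standard Windows NTSTATUS values for a few fatal exceptions.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def describe_exit_code(code: int) -> str:
    """Return a readable label for a Windows process exit code.

    Hypothetical helper for illustration. Some tools print the code as a
    signed 32-bit integer, so it is first mapped to its unsigned form.
    """
    unsigned = code & 0xFFFFFFFF
    name = NTSTATUS_NAMES.get(unsigned, "unrecognized")
    return f"0x{unsigned:08X} ({name})"


if __name__ == "__main__":
    # Values as they would appear in signed decimal form.
    print(describe_exit_code(-1073741795))
    print(describe_exit_code(-1073741819))
```

A user could run the program from a command prompt, type `echo %ERRORLEVEL%` after the crash, and send back the number for decoding.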
Can you please give an example of an "illegal instruction"?
Elsewhere - I tested a build of my program with today's IVF & MKL (i.e. not my custom MKL DLL), and the Arnoldi eigensolver still does not work like the older MKL (i.e. does not produce the same eigenvalues with and without the inclusion of eigenvectors). Bummer.
Just to be clear, are you saying there is a bug in the current MKL routines? Is that a known bug? If not, you should post a reproducer in the MKL forum.
If you get an "Illegal Instruction" abort when running a program that was compiled from a high-level programming language, it is the equivalent of "the barbarians have crossed the gates and are in". The causes can be many; the compiler, the RTL and the OS try hard to catch the error earlier, which is why we rarely see this message. As explained in this old thread in this forum, the cause is probably stack corruption.
There can be many causes of stack corruption, and it is not useful to use a magnifying glass to look at the actual illegal instruction at the machine code level. You have to create a small program that demonstrates the error, and provide it here.
Similarly,
[Forum managers: Problems related to old forum posts that need fixing:
1. Many links in old forum posts no longer work, and may take one to an irrelevant Intel web page. The link to Dr. Fortran's "Don't blow your stack" article, given in Kevin Davis's post, for example, is bad!
2. In a thread with many posts such as the current thread (over 40 here), it is almost necessary to have post reference numbers. The older version of this forum allowed one to see "#14" as the sequence number of the fourteenth post in the thread, and these sequence numbers made navigation painless. Many old posts in this Forum still contain such sequence numbers in the bodies of posts, but no such reference number is available in the initial lines of each post. Try to find post #13 in the current thread, for example, and note that different readers may read different time stamps for one and the same post, depending on their profile timezone setting.]
Re: "illegal instruction"
It is common for new generations of processors to introduce new sets of instructions, where it has been found that common operations formerly done with sequences of instructions can be done faster with a new one. Both Intel and AMD have done this over the years, but more recently AMD just adopts the new instruction sets Intel introduces. On the Intel side, there were SSE2, SSE3, SSE4, AVX, AVX2, and AVX-512. Intel has more recently added smaller subsets of instructions, not enabled on all processors. If the CPU comes across an instruction it doesn't support, you get this error.
This case is a bit weird, though, in that an older processor runs the code and some newer ones don't. It may be 1) the newer processors don't support a particular instruction correctly, or 2) the program is jumping into something that isn't a valid instruction. If the error is reliably reproducible, it should be possible to first identify the particular instruction it is complaining about, see if it is supposed to be valid, and if not, figure out how it got there.
I'm leaning towards choice 2 here, as I am fairly confident that MKL doesn't do CPU dispatch for AMD processors.
In response to Andrew regarding "is it a possible bug in MKL": whether the change in behavior is a bug could be considered a matter of opinion (or a case of splitting hairs).
When Arnoldi finishes its iteration for finding eigenvalues, it calls a wrapup routine (dneupd) to prepare the evalues for return to the calling program. If evectors have not been requested, the evalues are simply copied from work arrays to calling arguments. If evectors are requested, additional work is done by dneupd to prepare the evectors, and to do this it calls a LAPACK routine named DLAHQR. The behavior of DLAHQR is where the difference comes in between old and new versions. If I'm right about that, this is really about LAPACK rather than MKL. But I'm not totally sure. DLAHQR recomputes the evalues from a Hessenberg matrix. In the old version of DLAHQR, the recomputed evalues exactly match the evalues determined by Arnoldi iteration, but not so with the new version of DLAHQR. In the big picture, the differences in evalues are small, but they foul up other logic used elsewhere in my program.
The source codes of the old and new DLAHQR have way too many differences for me to tell what happened.
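One way to make downstream logic less sensitive to this kind of difference is to pair the two sets of evalues by proximity and compare them against a tolerance instead of testing for exact equality. The following is only a sketch under assumptions: `match_eigenvalues` and `rel_tol` are hypothetical names, the default tolerance is arbitrary, and nothing here is taken from ARPACK, LAPACK, or the program discussed above.

```python
def match_eigenvalues(reference, recomputed, rel_tol=1e-8):
    """Pair each reference eigenvalue with the nearest unused recomputed one.

    Returns a list of (ref, new, ok) tuples, where ok is True when the two
    values agree to within rel_tol, scaled by their magnitude (with a floor
    of 1.0 so values near zero are compared on an absolute scale).
    Hypothetical helper: the greedy nearest-neighbour pairing is a
    simplification and may mispair tightly clustered eigenvalues.
    """
    remaining = list(recomputed)
    pairs = []
    for ref in reference:
        if not remaining:
            raise ValueError("fewer recomputed eigenvalues than reference ones")
        nearest = min(remaining, key=lambda z: abs(z - ref))
        remaining.remove(nearest)
        scale = max(abs(ref), abs(nearest), 1.0)
        pairs.append((ref, nearest, abs(nearest - ref) <= rel_tol * scale))
    return pairs


if __name__ == "__main__":
    # Made-up values: same spectrum, different order, tiny perturbations.
    from_iteration = [complex(-1.0, 2.0), complex(-1.0, -2.0), complex(3.0, 0.0)]
    from_wrapup = [complex(3.0 + 1e-12, 0.0), complex(-1.0, -2.0), complex(-1.0 + 1e-13, 2.0)]
    for ref, new, ok in match_eigenvalues(from_iteration, from_wrapup):
        print(ref, new, "match" if ok else "MISMATCH")
```

The right tolerance depends on the conditioning of the problem, so the default shown should be treated as a placeholder.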
A possible solution to my AMD crashing problem may be to simply build myMKL_x64.DLL with a newer version of IVF and MKL. I've done this with Intel® Visual Fortran Compiler – extension version 19.1.0057.16, Package ID: w_comp_lib_2020.2.254, and sent the DLL to a user for testing.
I have IVF 2023.2 on another development system, but I need help figuring out how to build myMKL_x64.DLL on that system. I will use an earlier thread for that in the MKL forum.
I am happy to report that my user with a Ryzen 9 7950X has reported that the crash was eliminated with the myMKL_x64.DLL built with IVF 19.1.
In addition, my user with a Ryzen 7 PRO 5875U has reported the same success.
