Intel® Fortran Compiler

eigensolver code crashes on AMD processor

Brian_Murphy
New Contributor II

I am using pardiso with ARPACK's Arnoldi eigensolver.  The code has been in use by over 100 users for several years.  I'm getting reports that the code crashes on AMD Ryzen systems.  Is there anything in particular that might be causing this?  I sent a DEBUG build to a user, but this didn't reveal anything as it simply crashed in identical fashion with no messages.

I'm using visual studio 2019.  My ifort compiler command line is as follows:

/nologo
/O2
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\ARPACK\LAPACK\x64\Debug"
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\Umfpack\x64\Debug"
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\ARPACK\BLAS\x64\Debug"
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\ARPACK\UTIL\x64\Debug"
/I"C:\Users\Me\Documents\Visual Studio 2019\Projects\Xlrotor\ARPACK\SRC\x64\Debug"
/extend_source:132
/module:"x64\Release\\"
/object:"x64\Release\\"
/Fd"x64\Release\\vc160.pdb"
/libs:static
/threads
/c

I've read about a compiler option /Qimf-arch-consistency in this thread.  If I try this option, should I set it to true or false?

Thanks,

Brian Murphy

1 Solution
Brian_Murphy
New Contributor II

I am happy to report that my user with a Ryzen 9 7950X has reported that the crash was eliminated with the myMKL_x64.DLL built with IVF 19.1.

In addition, my user with a Ryzen 7 PRO 5875U has reported the same success.


50 Replies
Barbara_P_Intel
Employee

@jimdempseyatthecove, historically, no.

MS doesn't make older versions available for the Community. Similar to Intel's policy of only having the latest compiler available for free.

We're looking into a solution (fingers-crossed). 

Brian_Murphy
New Contributor II

I just received a brand new HP laptop with Ryzen 5 processor and 64 bit Office installed.  The Office version is exactly the same version as for the other computers that crash.  However, the crash does not happen with this computer!!!  This totally ruins my plans to find the bug.

 

Brian_Murphy
New Contributor II

Ryzen processors that crash are

  1. AMD Ryzen 9 7950X 16-Core Processor, 4501 MHz, 16 Core(s), 32 Logical Processor(s)
  2. AMD Ryzen 7 PRO 5875U with Radeon Graphics, 2000 MHz, 8 cores, 16 logical processors
  3. AMD Ryzen 5 5625U with Radeon Graphics, 2301 MHz, 6-Core Processor

The Ryzen processor that does not crash is:

AMD Ryzen 5 5300U with Radeon Vega Mobile Gfx, 2100 MHz, 4 Core(s), 8 Logical Processor(s)

Is anyone aware of differences between these processors that could explain why the 5300U does not crash?

jimdempseyatthecove
Honored Contributor III

In your Release version, add the option /traceback to

  Configuration Properties | Fortran | Command Line

Set

  Configuration Properties | Fortran | Debugging | Debug Information Format | Full   ( or add /debug:full to Fortran Command Line)

  Configuration Properties | Linker or Librarian | Debugging | Generate Debug Info | Yes   (or add /DEBUG to Linker/Librarian Command Line)

 

Then send the build to a customer who has a failing system.

Have him make a run to produce the error, and then have the customer send back the contents of the CMD window.

 

   YourProgram Their args > LogFile.txt 2>&1

 

The traceback should identify the source file and line number of the error.

 

Note: test the build on your own system first. You may need to force an error to confirm that the traceback works (then remove the forced error before sending the build to the user).

 

If you can identify the source file, then, from the Project in the Solution Explorer, in Release Build, Right-Click on the offending file, select Properties | Configuration Properties | Fortran | Optimization | Disable

This will disable optimization for this file only. You will also see a colored tick mark on the source file in the solution explorer indicating the build options differ from the default for this Build.

 

Jim Dempsey

Brian_Murphy
New Contributor II

I have created a console program to run a case that crashes. 

The attached file doit.zip contains just the files needed to run the case to see if it runs or crashes.  Run the batch file doit.bat and if any errors occur, messages should be displayed.  Otherwise the correct output from the run is the following pair of complex eigenvalues.  The values produced by Intel and AMD CPUs are slightly different.

Intel
        -0.117298699448859             -215.875253944077
        -0.117298699448859              215.875253944077
AMD
        -0.117298729052565             -215.875254020056    
        -0.117298729052565              215.875254020056    

 The attached file RyzenTest.zip contains a build done on a brand new HP Ryzen 5 computer with visual studio 2022 and Intel Fortran Classic, both having been downloaded and installed today. 

The visual studio solution file is RyzenTest.sln

The main program project is RyzenTest.vfproj

There are four ARPACK projects required by the program; SRC, UTIL, BLAS and LAPACK.  These are included in the building of RyzenTest.exe via the project dependencies dialog.

My HP computer with Ryzen 5 5300U processor does not crash.  If anyone runs this and gets a crash, please report what the error messages are, if any. 

The MKL pardiso code used by the program is in myMKL_x64.DLL which I created by extracting just the MKL routines needed by my program.  It ought to be possible to remove this from the project (Properties/Linker/Input/Additional dependencies) and replace it with the full MKL code that ships with the compiler (Properties/Fortran/Libraries/Use Intel MKL).  I tried this on my Ryzen system and was told that mkl_intel_lp64.lib was missing.  I guess it sort of makes sense that MKL and AMD don't play nicely with each other.

mecej4
Honored Contributor III

Your EXE+DLL runs normally on my Ryzen 7 4800U and the output is fine.

 

Brian wrote: "I tried this on my Ryzen system and was told that mkl_intel_lp64.lib was missing. "

Verify that mkl_intel_lp64.lib is accessible through the LIB environment variable.

 

I have attached below a zip file containing three Fortran source files -- about 7000 lines of code, no dependence on libraries other than those provided with Intel Fortran and MKL. Compile, link and run using the commands

 

ifort /Qmkl RyzenTest.f90 darpack.f90 pardiso.f90
RyzenTest

 

Note that the data file eigs-pardiso.bin must be present in the working directory in order to run the program. The output from this program is:

        -0.117298702224452             -215.875253982604
        -0.117298702224452              215.875253982604

 

Steve_Lionel
Honored Contributor III

That the results are slightly different is expected - the math library takes different paths for Intel and non-Intel processors. /Qimf-arch-consistency:true should remove that difference. I have not yet seen the text of the error message. It would be more helpful to know the exact instruction where it fails.

Brian_Murphy
New Contributor II

Thanks for running the test, mecej4. 

On my regular development system (with an Intel CPU), I was able to run the ifort command line to compile your RyzenTest program, and it built and ran with no trouble.  What exact vintage of ARPACK did you use?

On my new computer with Ryzen cpu, it would not build, saying that mkl_intel_lp64.lib was missing.  I am quite certain the Intel Fortran compiler installed on this system (2023.2.1) did not install any file with MKL in the file name.  When installing Intel Fortran, is it necessary to tell the installer to install MKL support?

I did a screenshare to a crashing computer with Ryzen 5 5625U processor, and ran my version of RyzenTest.exe.  It crashed and gave the attached error message.  Line 508 is a call to pardiso with phase=22 to perform factorization of the input matrix.  Just before this call, the call of pardiso with phase=11 evidently succeeded (i.e. analysis phase).

So it appears I've encountered a mysterious bug in the MKL pardiso code.  The vintage of this pardiso code is Intel Fortran 13.0.  I use 13.0 because I encountered other issues with 19.1, and so reverted to 13.0 as the "last known good" version of MKL.  IVF 19.1 is on my primary development system.

20230827-13.14 - 13.14.57.jpg

JohnNichols
Valued Contributor III

If you are running 32 bit then there is a separate module for download in the basekit tab for MKL. You need to install that as well, I had the same problem.  

The base kit is now too big for the entire kit.

mecej4
Honored Contributor III

With oneAPI, installing the Fortran compiler does not get you MKL. Thus, on your new Ryzen computer, you do not have MKL installed.  You have to download and run another installer: either the oneAPI Base Toolkit or the standalone oneMKL installer.

I infer from your various posts in this thread that your runs on various Ryzen CPUs were made using your custom DLL rather than a recent version of MKL. Your custom DLL was built using a ten-year-old version of MKL. Furthermore, you did not use the Lapack and BLAS routines that are in MKL; instead, you used Fortran source files for subsets of Lapack and BLAS (and Arpack).

You stated your version of MKL as 13.0, but that is incorrect. I have Parallel Studio 2013SP1, which included MKL; the Fortran compiler version is 14.0.4.237, and the MKL version is 11.1.4. What you have is older than that (I guess the compiler and MKL that you used to build your custom DLL are from 2012). Ryzens were born in 2016-2017.

Re "What exact vintage of ARPACK did you use?": I used a subset of the Ng-Peyton version, namely, the routines in the file darpack.f90 that I provided to you. I simply wrapped the selected routines that your application needs into a Fortran module. Alternatively, you can use the Arpack sources that you have in your originally posted file RyzenTest.zip.

Brian_Murphy
New Contributor II

So how do I add MKL support?  The Visual Studio I have installed says it is a 64 bit version, and I am attempting to build a 64 bit console app.

Brian_Murphy
New Contributor II

The problem I had with MKL 19.1 is the Arnoldi eigensolver behaved differently than with MKL 13.0.  With 13.0, Arnoldi returned identical eigenvalues regardless of whether or not eigenvectors were also being returned.  19.1 did not behave this way, which for other reasons created big problems for me.  So I took the easy way around that problem.

If I can get Intel Fortran 2023.2 working, I will test if Arnoldi still has this issue with today's MKL.

Steve_Lionel
Honored Contributor III

Ah, an illegal instruction fault! That helps a lot. The older Ryzen doesn't support an instruction in the executable. In your test ZIP I don't see that you enabled any specific instruction set. It could be that MKL is making an improper assumption here. I suggest you take this to the MKL forum.

mecej4
Honored Contributor III

Steve, the illegal instruction fault occurred in Brian's custom DLL, myMKL_x64.dll, rather than in the executable, RyzenTest.exe. The custom DLL does not appear to support showing routine names (let alone line numbers) in the traceback.

Steve_Lionel
Honored Contributor III

@mecej4 wrote:

Steve, the illegal instruction fault occurred in Brian's custom DLL, myMKL_x64.dll, rather than in the executable


It was my understanding that he had statically linked MKL into his DLL. Perhaps I was mistaken. I doubt this is Fortran-generated code.

mecej4
Honored Contributor III

I think that he used only Pardiso from MKL. Everything else (including Arpack, Lapack, BLAS) was compiled from Fortran source files. That is why his Zip files as well as his custom DLL file are tens of megabytes in size.

Steve_Lionel
Honored Contributor III

He said, "Line 508 is a call to pardiso with phase=22 to perform factorization of the input matrix. " Doesn't that imply that the error occurs within Pardiso?

mecej4
Honored Contributor III

I presume that the MKL subroutine Pardiso calls scores of Lapack and BLAS routines -- there are over 40 symbols in mkl_core.lib that contain "phase_22", and many of them may in turn call Lapack and BLAS routines. Because of the way the custom DLL was built, these Lapack and BLAS routines may not be the MKL versions, but those compiled by the user from Fortran sources, and the illegal instruction fault may have been encountered in one of these routines.

The situation is quite complicated, and a shorter reproducer would help. It may be worthwhile to build a debug version of the custom DLL.

Brian_Murphy
New Contributor II

I will be trying to test my program with today's MKL to see if ARPACK will work the way I need it to.  If that fails, I will stick with the old MKL and tell my users not to use computers with AMD chips.  If that succeeds, I still don't know if the new code will crash on AMD chips.

mecej4 - Regarding the exact version of Intel Fortran I used to create myMKL_x64.dll: from inside Visual Studio 2010, Help/About shows it to be...

Intel(R) Visual Fortran Composer XE 2013 Update 5 Integration for Microsoft Visual Studio* 2010, 13.0.3636.2010, Copyright (C) 2002-2013 Intel Corporation

mecej4
Honored Contributor III

According to the table about 2/3 of the way down on this page, the MKL version that accompanied Intel Fortran 2013 Update 5 was MKL 11.0.5. Compiling and running the following C program will enable you to identify the exact MKL version of your installation.

 

#include <stdio.h>
#include "mkl_service.h"

/* Query the linked MKL library for its version and print each field. */
int main(void)  {
    MKLVersion Version;
    MKL_Get_Version(&Version);   /* fills in the version structure */
    printf("Major version:           %d\n",Version.MajorVersion);
    printf("Minor version:           %d\n",Version.MinorVersion);
    printf("Update version:          %d\n",Version.UpdateVersion);
    printf("Product status:          %s\n",Version.ProductStatus);
    printf("Build:                   %s\n",Version.Build);
    printf("Processor optimization:  %s\n",Version.Processor);
    printf("================================================================\n");
    printf("\n");
    return 0;
}

 
