My program is compiled with the latest revision of IVF 2015 Composer edition. It makes use of a number of MKL functions including CGEMM and ZGEMM. The code involving the MKL functions is unchanged and has worked flawlessly for many years through numerous versions of IVF, and of CVF before that.
But since compiling with IVF 2015 a problem has shown up. When calling CGEMM or ZGEMM, the program abruptly ends without any message. A debug build shows nothing either. It only happens above a certain problem size, and the two users reporting the problem have AMD processors, one an FX8320 and the other an A10-4500M. My program is compiled as a 32-bit application, so it runs as such on both 32-bit and 64-bit systems. Both users with problems are using Windows 7.
I've updated my IVF 2015 to the latest revision, and out of curiosity I furnished one customer with MKL DLLs from an earlier revision of 2015 and from 2013 SP1 to try. Neither helped, although in every case the main program was the one compiled with 2015 and its associated libraries.
Is there some compiler option I need to use in order to gain compatibility with these processors or to isolate the problem? I'm assuming that the problem is related to AMD processors since the two people out of several hundred successful users both have them -- but I don't know this for sure.
I'll be glad to furnish any other information that might be helpful in solving this.
I tried the CGEMMX.f example that is packaged with MKL using IFort 15.0.2 on the only AMD powered machine available to me -- an old (2006) desktop machine with an Athlon 64-X2 running Windows 10 Preview - 64 bit. The program ran fine.
You will have to provide more details to enable the problem to be seen. In particular, if you see the bug only for a certain problem size range, you have to tell us that range. In addition, what compiler options did you specify? What do you have in the file IFORT.cfg?
On the AMD machines that gave you problems, do you have enough memory to run the program? Have you tried running 64-bit versions of the program?
If you compiled the EXE on one machine and ran it on a different machine with a different CPU, and the latter does not support the instruction set needed by the EXE, the CPU dispatcher built into the EXE should give you a message before the program quits.
MKL does auto-CPU dispatch and would probably take a different code path on AMD systems due to Intel-specific optimizations. (See the Optimization Notice for more on this.) It should work fine and I am sure that the MKL developers would like to see a test case that fails. As an experiment you may want to play with the MKL options for binary reproducibility to see if that changes the behavior.
According to your description, you gave one customer MKL DLLs from an earlier revision of 2015 and from 2013 SP1, so it seems that the older MKL DLLs do not change the result. In any case, you may try the MKL Conditional Numerical Reproducibility functionality (please see https://software.intel.com/en-us/node/528411 and https://software.intel.com/en-us/node/528408) and let us know the result.
Is it possible for your customer to write out the input and output of CGEMM on that AMD machine and build a standalone test case so that we can investigate?
Intel MKL Support
I agree with Steve about looking into MKL conditional numerical reproducibility as a means of avoiding accidents with CPU dispatch (if you don't get a suggestion from an MKL expert). CNR would not be ideal if it doesn't use at least SSE3, but if it can avoid the failure it's a useful data point. Note that for Intel CPUs you can set MKL_CBWR=SSE3 or ...AVX2, but only MKL_CBWR=COMPATIBLE is documented as compatible with AMD.
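If setting an environment variable on the customer's machine is awkward, the same branch can be requested from inside the program. What follows is only a minimal sketch, assuming that the mkl.fi include file shipped with MKL is on the include path and can be included from free-form source, and that (as I understand the documentation) mkl_cbwr_set has to be called before any other MKL routine. The program name and the tiny 2x2 matrices are made up for illustration.

```fortran
program cbwr_sketch
  implicit none
  include 'mkl.fi'   ! MKL interfaces and the MKL_CBWR_* constants (assumed to be on the include path)
  integer, parameter :: dp = kind(1.0d0)
  integer :: istat
  complex(dp) :: amat(2,2), bmat(2,2), cmat(2,2)

  ! Request the COMPATIBLE code branch before the first MKL computation.
  istat = mkl_cbwr_set(MKL_CBWR_COMPATIBLE)
  if (istat /= MKL_CBWR_SUCCESS) then
     print *, 'mkl_cbwr_set failed, status =', istat
     stop 1
  end if

  ! Illustrative data only: every element of amat is 1, every element of bmat is i.
  amat = (1.0_dp, 0.0_dp)
  bmat = (0.0_dp, 1.0_dp)
  cmat = (0.0_dp, 0.0_dp)

  ! cmat = 1 * amat * bmat + 0 * cmat, so every element of cmat should equal 2i.
  call zgemm('N', 'N', 2, 2, 2, (1.0_dp, 0.0_dp), amat, 2, bmat, 2, &
             (0.0_dp, 0.0_dp), cmat, 2)

  print *, 'ZGEMM returned; cmat(1,1) =', cmat(1,1)
end program cbwr_sketch
```

If the failure disappears with this branch selected, that points at the dispatched code path rather than at the calling program, which is the data point being asked for here.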
It seems MKL auto-dispatch may not be correctly recognizing whether these recent AMD CPUs support AVX or AVX2 (FMA3).
In your compiled code, choosing an appropriate single architecture without auto-dispatch (e.g. /arch:SSE3 ....) and setting /Qimf-arch-consistency:true should help to avoid mistakes there. Just choose a single /arch setting according to the oldest CPU to be supported.
In many applications (depending on how much time is spent in MKL), only one of the compile or the MKL architecture settings may be critical for performance.
Consulting the MKL Link Line Advisor for static link options might be useful, if you haven't tried that. Then only the OpenMP runtime DLL, libiomp5md.dll (if using MKL parallel), should be needed.
As you can see, there are so many possibilities related to your question that I must agree entirely with preceding comments about specifying what you are trying, both for architecture in your compilation and for your MKL link options.
Thank you very much for the suggestions and comments. I haven't had time to try them but did run an experiment that might help in the meantime.
On my machine I had my program record the data being sent to ZGEMM at the point where the problem occurs on the user's machine. Then I wrote a simple program that reads the data from the file and just calls ZGEMM with the same data. Two files are written: a text file that records arriving at the points just before and just after the ZGEMM call, and a data file containing the output array from ZGEMM. I found I'm not authorized to upload anything here, so anyone who would like can get the source code, executable, and data file at http://eznec.com/misc/Gemm/. The larger of the two zip files includes the required run-time DLLs. The Visual Studio 2013 Visual C++ runtime has to be installed on the machine. It's a 32-bit debug compilation.
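For anyone who would rather not download the archive, a harness of this kind can be sketched as follows. This is not the posted source: the file names, the record layout of the input file, and the ZGEMM arguments ('N', 'N', alpha = 1, beta = 0) are all assumptions chosen for illustration.

```fortran
program zgemm_probe
  ! One possible build line, using options mentioned in this thread:
  !   ifort /Qmkl /arch:SSE3 /Qimf-arch-consistency:true zgemm_probe.f90
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer :: m, n, k, ios
  complex(dp), allocatable :: a(:,:), b(:,:), c(:,:)
  external :: zgemm

  ! Hypothetical input layout: one record with m, n, k, then A (m x k), then B (k x n).
  open(unit=10, file='gemm_in.dat', form='unformatted', status='old', &
       action='read', iostat=ios)
  if (ios /= 0) stop 'cannot open gemm_in.dat'
  read(10) m, n, k
  allocate(a(m,k), b(k,n), c(m,n))
  read(10) a
  read(10) b
  close(10)
  c = (0.0_dp, 0.0_dp)

  ! Marker written and flushed before the call, so it survives an abrupt exit.
  open(unit=20, file='gemm_log.txt', status='replace', action='write')
  write(20,'(a,3i8)') 'before ZGEMM, m n k =', m, n, k
  flush(20)

  ! Assumed call shape: C = A * B with no transposition.
  call zgemm('N', 'N', m, n, k, (1.0_dp, 0.0_dp), a, m, b, k, &
             (0.0_dp, 0.0_dp), c, m)

  write(20,'(a)') 'after ZGEMM'
  flush(20)
  close(20)

  ! Save the result so it can be compared with the output from a machine that works.
  open(unit=30, file='gemm_out.dat', form='unformatted', status='replace', &
       action='write')
  write(30) c
  close(30)
end program zgemm_probe
```

The flush after each marker matters in a test like this: if the process dies inside ZGEMM, buffered output might otherwise never reach the file, and a missing "before" line would be misleading.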
On my machine the program runs fine -- two lines in the output text file indicating that ZGEMM was called and returned, and the output data file was created. When the user runs it on the problem machine with AMD CPU, he sees this message:
Exp.exe has stopped working
A problem caused the program to stop working correctly. Windows will close the program and notify you if a solution is available.
On two of three runs he saw an open but empty command window. On the other run, this message was in the window:
Forrtl: severe (157): Program Exception - access violation
Image PC Routine Line Source
Nothing else -- no routine, line or source shown, just the column headers.
I've found that my program won't run at all on an old AMD Athlon machine under either XP (SP3) or Vista (SP2), and another user reported a persistent "access violation" message I wasn't able to solve. I suspect they're all related.
I hope that whatever cure is found won't seriously impact the substantial speed improvements I'm seeing from this version of IVF on processors on which it will run.
We don't have machines such as an FX8320 (SSE2) or an A10-4500M, but we will try to look into whether this is a real bug.
Regarding the older operating systems such as XP or Vista, the System Requirements section of the release notes lists:
Microsoft Windows 7*, Microsoft Windows 8*, Microsoft Windows 8.1*, Microsoft Windows Server 2012*, Microsoft Windows Server 2008 SP2* (IA-32 only), Microsoft Windows Server 2008 (R2 SP1) or Microsoft Windows HPC Server 2008* (embedded editions not supported)
Windows XP is not on this list, and neither is Vista, so neither of them is supported now.
I compiled and ran your program that you gave a link to in #6, using IFort 15.0.2 on the system described in #2 (Athlon 64 X2-4200+) with /Qmkl /arch:SSE3 /Qimf-arch-consistency:true. The program ran and produced the output file in about 0.05 second. I did not use your EXE for the usual reason that EXEs from unknown sources can be harmful.
I conclude that the suspicion that the program will fail on any AMD CPU is not justified. As Ying stated, XP is not a supported OS anymore, and it is unlikely that a reader of this forum will have a system with the specific AMD CPUs that you listed. Please select a system with those CPUs that is also running one of the supported versions of Windows (W7, W2008 SP2, etc.), compile using the suggested options for AMD CPUs, and try running your program on it.
I'm sorry I mentioned my suspicion that other problems might be related because it's completely distracted several people from the original problem. As I said in my original posting, both users experiencing the sudden program end when calling ZGEMM are running Windows 7 and when the example program is run on at least one of those systems, it fails as I described. Am I mistaken that IVF 2015 is alleged to support that operating system?
I have a couple of other questions and comments but see that I'd better keep it simple and concentrated on one thing at a time. Perhaps I should start over with another thread?