We have a customer who bought Alienware desktops with i7-8700K processors and reported that our (console) application crashes randomly on them: the console window simply disappears. Alienware provides a tool called Support Assist for checking system events, and our application's crashes are shown there with some details, e.g. fault bucket number and type, and WER (Windows Error Reporting) data.
My main question is whether there are any compiler or linker settings in IVF that might cause application crashes on computers with certain CPUs.
We are using IPS XE 2016, Update 4, and IPS XE 2017, Update 2, on Windows 7 Professional 64-bit, with the following compiler and linker options for our application:
/nologo /MP /O2 /fpp /I"%MKLROOT%" /I"\include" /fixed /extend_source:132 /Qopenmp /fpscomp:general /warn:declarations /warn:unused /warn:truncated_source /warn:noalignments /warn:interfaces /assume:byterecl /module:"x64\\" /object:"x64\\" /Fd"x64\vc140.pdb" /libs:static /threads /Qmkl:parallel /c /Qm64
/OUT:"x64\MyApp.exe" /INCREMENTAL:NO /NOLOGO /DELAYLOAD:"MyAppDll.dll" /MANIFEST /MANIFESTFILE:"x64\MyApp.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /STACK:100000000 /IMPLIB:"D:\Tests\x64\MyApp.lib" delayimp.lib ..\DLL\x64\MyAppDll.lib mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib
I know that crashes can be caused by a bug in the application, so my other question is whether it makes sense to create a "special" version of our application by adding /traceback to the compiler options. Would this show the location in the source code at the time of the crash?
I am doubtful that the CPU type is the trigger. The event log should have some more information on the error that occurred and the "faulting module", though the latter may not tell you much. If the program is run from a separately-started console window, you'll get to see any error messages - if the application is just run directly you'll miss those when the console window goes away.
My guess is that there's some other aspect of these desktops that is contributing, but getting the event log info is a start.
That CPU has 6 cores and 12 threads. Do the working systems have fewer (.lt. 12) logical CPUs? If so, you may be running short of resources. You may be able to work around this by setting environment variables that reduce the number of threads MKL uses.
Specifying 6 threads would give one thread per core.
Thank you for your kind responses. I agree with Steve that the CPU is not likely to cause this behavior. Also, we have never had any problem with lack of resources. Our application can use MKL, but the crashing configuration does not use MKL, so MKL is not the cause of the problem. Neither is parallelization; the application crashes even when it is set to use just one CPU core.
We created a special version of the Release configuration with traceback enabled. We sent it to the customer and asked them to run it from a command-line window. The error message was:
forrtl: The requested operation cannot be performed on a file with a user-mapped section open.
forrtl: severe (38): error during write, unit 12, file filename.txt
I then realized that I had dealt with the same problem 10 years ago in this forum: https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/300605
At that time, I found out that Comodo Firewall's component Defense+ was causing the very same problem. Disabling this feature made everything work well again. One reply in that topic mentioned that the same problem occurred on a computer with TrendMicro security software. And, guess what, our customer is using TrendMicro. We asked them to exclude the folder with calculation results from the list of folders scanned by TrendMicro's Deep Security feature to see whether this helps.
What still puzzles me is that all of our customer's computers have TrendMicro installed, but only the ones with a certain processor (i7-8700K) have problems. Could this be related to the way files are saved to disk?
I think this may be a case of timing. In other words, you were lucky on the systems that work.
The error only occurs with concurrent access to the file in question.
CPU/system x: TrendMicro, then your program
CPU/system y: your program, then TrendMicro
CPU i7-8700K/system: your program simultaneously with TrendMicro
Timing would also be affected by disk/SSD, RAM, file caching, etc., and not just the CPU model.
The message "a file with a user-mapped section open" seems to indicate that filename.txt is opened by another application (TrendMicro) as a memory-mapped file. The corrective measure is to mark your output folder(s) and/or output file types as exclusions.
FWIW on my Windows development systems, I must make my Solution and Project folders (and sub-folders) excluded in order to prevent some weird MS Visual Studio errors.
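As a complement to the exclusions, the program itself can trap the write error instead of letting the Fortran run-time library terminate with forrtl severe (38). Below is a minimal sketch in standard Fortran, assuming the output file is small enough to be rewritten in one go and that the scanner releases it within a second or so. The module and routine names (retry_write_mod, write_lines_with_retry, pause_seconds) are hypothetical, and the retry count and 0.5 s delay are arbitrary choices.

```fortran
! Hypothetical helper module (names are illustrative, not from the thread):
! writes a small text file and retries if opening, writing, or closing
! fails, assuming another process (e.g. a scanner) holds the file briefly.
module retry_write_mod
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  private
  public :: write_lines_with_retry

contains

  ! Wait roughly SECONDS using the standard SYSTEM_CLOCK,
  ! so no compiler-specific SLEEP extension is needed.
  subroutine pause_seconds(seconds)
    real, intent(in) :: seconds
    integer(int64) :: t0, t, rate
    call system_clock(t0, rate)
    if (rate <= 0) return                ! no clock available: do not wait
    do
      call system_clock(t)
      if (real(t - t0) / real(rate) >= seconds) exit
    end do
  end subroutine pause_seconds

  ! Write LINES to FILENAME, trying up to MAX_TRIES times.
  ! OK reports success; ERRMSG holds the last run-time message on failure.
  subroutine write_lines_with_retry(filename, lines, max_tries, ok, errmsg)
    character(*), intent(in)  :: filename
    character(*), intent(in)  :: lines(:)
    integer,      intent(in)  :: max_tries
    logical,      intent(out) :: ok
    character(*), intent(out) :: errmsg
    integer :: unit, ios, ios_close, attempt, i
    character(256) :: msg

    ok = .false.
    errmsg = ''
    do attempt = 1, max_tries
      msg = ''
      ! Rewrite the whole file each time: after an I/O error the file
      ! position is undefined, so resuming part-way through is unsafe.
      open(newunit=unit, file=filename, status='replace', action='write', &
           iostat=ios, iomsg=msg)
      if (ios == 0) then
        do i = 1, size(lines)
          ! IOSTAT= makes a failed write return a status instead of aborting
          write(unit, '(a)', iostat=ios, iomsg=msg) trim(lines(i))
          if (ios /= 0) exit
        end do
        if (ios == 0) then
          close(unit, iostat=ios, iomsg=msg)   ! errors while flushing show up here
        else
          close(unit, iostat=ios_close)        ! keep the original write error
        end if
      end if
      if (ios == 0) then
        ok = .true.
        return
      end if
      errmsg = msg
      if (attempt < max_tries) call pause_seconds(0.5)
    end do
  end subroutine write_lines_with_retry

end module retry_write_mod
```

With IOSTAT= present, a failed write sets a nonzero status rather than ending the run, so the program can print errmsg and decide whether to retry or stop cleanly. This only softens the symptom; excluding the output folder from scanning still addresses the actual conflict.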