Win7 Pro x64 PS Pro V16.0 update 1.
CPU E5-2620 v2
The compiler with updates was running quite well until recently (today). The host system has AVX (not AVX2). However, the runtime code linked in is using AVX2 instructions.
...
000007FEE29B98D6  call        __intel_avx_rep_memset (7FEE29CBF60h)
...
__intel_avx_rep_memset:
000007FEE29CBF60  push        rdi
000007FEE29CBF61  push        r15
000007FEE29CBF63  mov         r11,rcx
000007FEE29CBF66  mov         r10,r11
000007FEE29CBF69  mov         rax,101010101010101h
000007FEE29CBF73  movzx       r9,dl
000007FEE29CBF77  imul        r9,rax
000007FEE29CBF7B  lea         rdx,[7FEE29CCB80h]
000007FEE29CBF82  vmovd       xmm0,r9
000007FEE29CBF87  vpbroadcastd ymm0,xmm0
The vpbroadcastd is an AVX2 instruction.
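For anyone checking this on their own machine, here is a minimal sketch of how the AVX and AVX2 feature bits differ. It assumes you already have the raw CPUID register values from some tool; the sample values are illustrative and are not a real dump from an E5-2620 v2. The function names and the tiny mnemonic table are made up for the example. A complete check would also confirm OS support for the YMM state (OSXSAVE plus XGETBV), which is left out here.

```python
# CPUID bit positions as documented in the Intel SDM.
AVX_BIT_LEAF1_ECX = 28   # CPUID.(EAX=1):ECX bit 28 reports AVX
AVX2_BIT_LEAF7_EBX = 5   # CPUID.(EAX=7,ECX=0):EBX bit 5 reports AVX2


def has_avx(leaf1_ecx: int) -> bool:
    """True if the AVX bit is set in the leaf 1 ECX value."""
    return bool((leaf1_ecx >> AVX_BIT_LEAF1_ECX) & 1)


def has_avx2(leaf7_ebx: int) -> bool:
    """True if the AVX2 bit is set in the leaf 7 (subleaf 0) EBX value."""
    return bool((leaf7_ebx >> AVX2_BIT_LEAF7_EBX) & 1)


# Hypothetical lookup: the ISA each VEX-encoded mnemonic from the listing needs.
REQUIRED_ISA = {"vmovd": "avx", "vpbroadcastd": "avx2"}


def can_execute(mnemonic: str, leaf1_ecx: int, leaf7_ebx: int) -> bool:
    """True if the CPU described by the register values supports the mnemonic."""
    supported = {"avx": has_avx(leaf1_ecx), "avx2": has_avx2(leaf7_ebx)}
    return supported[REQUIRED_ISA[mnemonic]]


# Illustrative values for an AVX-only part (assumption, not measured).
AVX_ONLY_ECX = 1 << AVX_BIT_LEAF1_ECX
AVX_ONLY_EBX = 0

if __name__ == "__main__":
    for name in ("vmovd", "vpbroadcastd"):
        print(name, can_execute(name, AVX_ONLY_ECX, AVX_ONLY_EBX))
```

On an AVX-only CPU the second check fails, which matches the illegal instruction fault at vpbroadcastd.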
I am not sure what caused the symptom; the program had been building and running successfully before. The only thing I can think of is that, after many builds and runs (both debug and release), I performed some release builds with targeted instruction sets, the last of which was /QxCORE-AVX2, which I know cannot run on my development system.
After this AVX2 targeted release build, I have now switched back to /QxHost, and then subsequently /QxAVX and both builds are now linking in __intel_avx_rep_memset... that uses an AVX2 instruction.
Is there a way to undo this behavior (IOW, an undocumented option to not use __intel_avx_rep_memset)?
.OR. is there an updated library that corrects this bug?
Note, I am waiting for a License update in order to load/use the V16.0 update 2.
The routine would be linked in regardless, but there is CPU-dispatching code that would call it for appropriate instruction-set capable Intel processors. I looked at the source for this and it correctly dispatches to the "avx" routine only for AVX2-capable Intel processors (despite not saying AVX2 in the name.) See later reply.
Sorry for all the thrash on this, but I keep thinking about it and realizing I made errors in earlier replies.
How you build the program should have no effect on the run-time dispatch of _intel_fast_memset, which is internal to the run-time library. You haven't said exactly what goes wrong when it fails. As best as I can tell, when this routine is executed on an E5-2620 V2 (Ivy Bridge microarchitecture), the routine containing the AVX2 instruction would not be called (even though it will be linked in.)
I get an illegal instruction fault (the PC points to the vpbroadcastd instruction).
is the name of the function. Note it is not named
I understand the name is confusing, but the code dispatches there only on AVX2 systems - or at least that's how I read the code. It is selected by "4th generation core", which has AVX2. Is this being called from _intel_fast_memset?
In the opening post, the call to __intel_avx_rep_memset is made directly from the Fortran compiled source code. There is no call to the dispatcher __intel_fast_memset.
Recall I am compiling with /QxHost or /QxAVX. This is a targeted build for (first gen) AVX, and thus would (should) not call the dispatcher.
If I compile without /QxHost or /QxAVX, meaning the test and dispatcher routines are used, wouldn't the AVX2 version have a different name from the AVX version (either that, or a .DLL load of a different library would have to be enforced, one with same-named but differently coded routines)?
I will build and test a version using the dispatcher and see where it ends up. I would like to bypass the dispatcher (though I could specify 2 or 3 targets).
It's called directly from the compiled code? Really? One of the posts I deleted suggested that you shouldn't use /QxHost if you're going to run on a different system. I still believe that, but didn't think it relevant. It might be that the compiler saw that the host supported AVX2 and used that, but I'm pretty sure that routine is not called directly from compiled code. Can you show me the .asm file with the call? I'll note that when the "avx" routine is entered it is not called, so the call stack may be misleading.
This dispatch is entirely within the run-time library, not in your compiled code. In this it's like the math library or MKL.
The point of /QxAVX (or other targeted /Qx...) is to remove the dispatch code, and its overhead.
That aside, I believe I located the source of the problem. The wrong .dll was being loaded (IOW the AVX2-targeted one was being loaded). This was an oversight on my part: too many library load paths are involved when a hybrid of C# (managed), a C++ DLL and a Fortran DLL is put together.
Thanks for your attention to my issue.
The odd part was I could step through the source code with the debugger, and step into the disassembly of the avx2 code.
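One way to spot this kind of mix-up is to list every copy of the DLL along the directories the loader might search. The sketch below assumes a simplified "first directory that contains the file wins" rule; the real Windows search order has more steps (system directories, known DLLs, and so on), so treat the result as a hint. The function names and the DLL name in the usage comment are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def all_matches(dll_name: str, search_dirs: Iterable[str]) -> List[Path]:
    """Return every existing copy of dll_name, in search order."""
    matches = []
    for directory in search_dirs:
        candidate = Path(directory) / dll_name
        if candidate.is_file():
            matches.append(candidate)
    return matches


def first_match(dll_name: str, search_dirs: Iterable[str]) -> Optional[Path]:
    """Return the copy a simple first-match search would pick, or None."""
    matches = all_matches(dll_name, search_dirs)
    return matches[0] if matches else None


# Usage idea (names are placeholders):
#   import os
#   dirs = [r"C:\path\to\app"] + os.environ["PATH"].split(os.pathsep)
#   print(all_matches("MyFortran.dll", dirs))
```

More than one entry in the list means a stale or differently targeted build can shadow the intended one.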
As Steve pointed out, /QxAVX doesn't remove internal ISA dependent dispatch from math library or MKL. Those are controlled by /Qimf-arch-consistency:true and MKL conditional numerical reproducibility. It gets difficult to remember all this, even in the absence of those additional factors Jim mentioned. Even more of a problem than running into illegal instruction is the possibility sometimes encountered in the past of unexpected results on some platform which wasn't tested. If you stepped into the runtime, the execution path presumably would depend on the platform you are running on.
It's not evident whether there is any control over ISA dispatch for memset and the like, but those wouldn't have unexpected numerical effects.
I get that. I am not using MKL. This call was made from
Array = 0.0
Which calls the appropriate __intel_..._ memset depending on compiler option. If none is specified, it will call the one with the dispatcher test. If a single one is specified (/QxAVX) then the dispatch code is omitted and the appropriate routine (entry point) is linked in.
The cause of the problem was that the wrong DLL was loaded (my fault). The solution is quite screwy and does some goofy things. It is a mixed-language solution where the Fortran library can be built with multiple (12 targeted) configurations, but the main C# has two configurations (Debug and Release). To resolve the association of which Fortran DLL gets used with which C# and C++ build, there are pre-build events and post-build events where the appropriate files (.lib, .dll and .pdb) are copied to the correct place. Unfortunately, if a specific project build isn't performed (it's up to date), then the pre- and/or post-build events are not executed, and consequently the desired .dll's do not get copied. The situation cannot be resolved through dependencies (without exploding the number of C# builds).
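Since the copy steps can be skipped silently, a separate check that compares the deployed files against the intended build output may help. This is only a sketch under assumptions: the directory layout, the file names and the function names are hypothetical, and it compares contents by SHA-256 instead of relying on timestamps.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large DLLs and PDBs are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_copies(build_dir: str, deploy_dir: str, names: Iterable[str]) -> List[str]:
    """Return the names whose deployed copy is missing or differs from the build.

    Raises FileNotFoundError if a file is missing from build_dir.
    """
    stale = []
    for name in names:
        source = Path(build_dir) / name
        deployed = Path(deploy_dir) / name
        if not deployed.is_file() or sha256_of(source) != sha256_of(deployed):
            stale.append(name)
    return stale


# Usage idea (paths and names are placeholders):
#   bad = stale_copies(r"Fortran\x64\Release_AVX", r"App\bin\Release",
#                      ["MyFortran.dll", "MyFortran.pdb"])
#   if bad:
#       raise SystemExit("stale files: " + ", ".join(bad))
```

Running something like this from a step that always executes, or by hand before a debug session, would flag a leftover AVX2-targeted DLL before it gets loaded.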
This particular dispatch is not controllable with an option. I am curious though as to what actually called the routine. Maybe there was C++ code that did manual dispatch? I can't see a way that Fortran-compiled code would call this (though I could be mistaken.)