Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

QxHost and QxAVX __intel_avx_rep_memset broke

jimdempseyatthecove
Honored Contributor III

Win7 Pro x64 PS Pro V16.0 update 1.

CPU E5-2620 v2

The compiler with updates was running quite well until recently (today). The host system has AVX (not AVX2). However, the runtime code linked in is using AVX2 instructions.

...
000007FEE29B98D6  call        __intel_avx_rep_memset (7FEE29CBF60h)  
...
__intel_avx_rep_memset:
000007FEE29CBF60  push        rdi  
000007FEE29CBF61  push        r15  
000007FEE29CBF63  mov         r11,rcx  
000007FEE29CBF66  mov         r10,r11  
000007FEE29CBF69  mov         rax,101010101010101h  
000007FEE29CBF73  movzx       r9,dl  
000007FEE29CBF77  imul        r9,rax  
000007FEE29CBF7B  lea         rdx,[7FEE29CCB80h]  
000007FEE29CBF82  vmovd       xmm0,r9  
000007FEE29CBF87  vpbroadcastd ymm0,xmm0  

The vpbroadcastd is an AVX2 instruction.
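For anyone following the listing, the prologue appears to replicate the fill byte across a 64-bit register by multiplying it by 0x0101010101010101, then broadcast the low dword across ymm0. Below is a minimal Python model of that arithmetic only. The function names are mine and are not taken from the run-time library.

```python
REPLICATE = 0x0101010101010101  # constant loaded into rax in the listing
MASK64 = 0xFFFFFFFFFFFFFFFF     # keep results within a 64-bit register


def replicate_byte(fill: int) -> int:
    """Model movzx r9,dl / imul r9,rax: copy one byte into all 8 bytes."""
    return ((fill & 0xFF) * REPLICATE) & MASK64


def broadcast_dword(value64: int) -> bytes:
    """Model vpbroadcastd ymm0,xmm0: repeat the low 32 bits over 256 bits."""
    low = (value64 & 0xFFFFFFFF).to_bytes(4, "little")
    return low * 8  # eight dword lanes = 32 bytes


if __name__ == "__main__":
    print(hex(replicate_byte(0xAB)))
    print(broadcast_dword(replicate_byte(0xAB)).hex())
```

The register-source form of vpbroadcastd modelled by the second function is the part that needs AVX2; the multiply itself is ordinary integer code.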

I am not sure what caused the symptom to occur; it had been building and running successfully before. The only thing I can think of is that (after many builds and runs, both debug and release) I performed some release builds with targeted instruction sets, the last of which was /QxCORE-AVX2, which I know cannot run on my development system.

After this AVX2 targeted release build, I have now switched back to /QxHost, and then subsequently /QxAVX and both builds are now linking in __intel_avx_rep_memset... that uses an AVX2 instruction.

Is there a way to undo this behavior (IOW, an undocumented option to not use __intel_avx_rep_memset)?
.OR. is there an updated library that corrects this bug?

Note, I am waiting for a License update in order to load/use the V16.0 update 2.

Jim Dempsey

 

Steven_L_Intel1
Employee

I'm looking into this....

Steven_L_Intel1
Employee

The routine would be linked in regardless, but there is CPU-dispatching code that would call it for appropriate instruction-set capable Intel processors. I looked at the source for this and it correctly dispatches to the "avx" routine only for AVX2-capable Intel processors (despite not saying AVX2 in the name.)  See later reply.

Steven_L_Intel1
Employee

Sorry for all the thrash on this, but I keep thinking about it and realizing I made errors in earlier replies.

How you build the program should have no effect on the run-time dispatch of _intel_fast_memset, which is internal to the run-time library. You haven't said exactly what goes wrong when it fails. As best as I can tell, when this routine is executed on an E5-2620 V2 (Ivy Bridge microarchitecture), the routine containing the AVX2 instruction would not be called (even though it will be linked in.)
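To illustrate the kind of run-time dispatch described here, below is a rough Python model, not the library's actual code. The CPUID bit positions are the documented ones (leaf 1 ECX bit 28 for AVX, leaf 7 subleaf 0 EBX bit 5 for AVX2); the function name and the returned path labels are hypothetical.

```python
AVX_BIT = 1 << 28   # CPUID leaf 1, ECX bit 28: AVX
AVX2_BIT = 1 << 5   # CPUID leaf 7 (subleaf 0), EBX bit 5: AVX2


def pick_memset(leaf1_ecx: int, leaf7_ebx: int) -> str:
    """Choose a memset variant from raw CPUID register values.

    Simplified on purpose: a real dispatcher would also confirm that the
    OS saves the YMM state (OSXSAVE/XGETBV) before using any AVX path.
    """
    has_avx = bool(leaf1_ecx & AVX_BIT)
    has_avx2 = has_avx and bool(leaf7_ebx & AVX2_BIT)
    if has_avx2:
        return "avx2_path"     # hypothetical label for the AVX2 routine
    if has_avx:
        return "avx_path"      # hypothetical label for an AVX-only routine
    return "generic_path"      # hypothetical label for the fallback


if __name__ == "__main__":
    # An Ivy Bridge part such as the E5-2620 v2 reports AVX but not AVX2.
    print(pick_memset(AVX_BIT, 0))
```

Under this model a processor that reports AVX without AVX2 never reaches the AVX2 routine, which is the behavior expected from the dispatcher.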

jimdempseyatthecove
Honored Contributor III

I get an illegal instruction fault (the PC points to the vpbroadcastd instruction).

 __intel_avx_rep_memset

is the name of the function. Note it is not named

 __intel_avx2_rep_memset

Jim

Steven_L_Intel1
Employee

I understand the name is confusing, but the code dispatches there only on AVX2 systems - or at least that's how I read the code. It is selected by "4th generation core", which has AVX2. Is this being called from _intel_fast_memset?

jimdempseyatthecove
Honored Contributor III

In the opening post, the call to  __intel_avx_rep_memset is made directly from the Fortran compiled source code. There is no call to the dispatcher __intel_fast_memset.

Recall I am compiling with /QxHost or /QxAVX. This is a targeted build for (first gen) AVX, and thus would (should) not call the dispatcher.

If I compile without /QxHost or /QxAVX, meaning the test-and-dispatch routines are used, wouldn't the AVX2 version have a different name from the AVX version (either that, or a different DLL would have to be loaded containing identically named but differently coded routines)?

I will build and test a version using the dispatcher and see where it ends up. I would like to bypass the dispatcher (though I could specify 2 or 3 targets).

Jim Dempsey

 

Steven_L_Intel1
Employee

It's called directly from the compiled code? Really? One of the posts I deleted suggested that you shouldn't use /QxHost if you're going to run on a different system. I still believe that, but didn't think it relevant. It might be that the compiler saw that the host supported  AVX2 and used that, but I'm pretty sure that routine is not called directly from compiled code. Can you show me the .asm file with the call? I'll note that when the "avx" routine is entered it is not called, so the call stack may be misleading.

This dispatch is entirely within the run-time library, not in your compiled code. In this it's like the math library or MKL.

jimdempseyatthecove
Honored Contributor III

Steve,

The point of /QxAVX (or other targeted /Qx...) is to remove the dispatch code, and its overhead.

That aside, I believe I located the source of the problem. The wrong .dll was being loaded (IOW the AVX2-targeted one). It was an oversight on my part; too many library load paths are involved when a hybrid of C# (managed), a C++ DLL, and a Fortran DLL is put together.

Thanks for your attention to my issue.

The odd part was I could step through the source code with the debugger, and step into the disassembly of the avx2 code.

Jim Dempsey

TimP
Honored Contributor III

As Steve pointed out, /QxAVX doesn't remove internal ISA-dependent dispatch from the math library or MKL. Those are controlled by /Qimf-arch-consistency:true and MKL conditional numerical reproducibility. It gets difficult to remember all this, even in the absence of the additional factors Jim mentioned. Even more of a problem than running into an illegal instruction is the possibility, sometimes encountered in the past, of unexpected results on a platform that wasn't tested. If you stepped into the runtime, the execution path presumably would depend on the platform you are running on.

It's not evident whether there is any control over ISA dispatch for memset and the like, but those wouldn't have unexpected numerical effects.

jimdempseyatthecove
Honored Contributor III

Tim,

I get that. I am not using MKL. This call was made from

Array = 0.0

Which calls the appropriate __intel_..._memset depending on compiler option. If none is specified, it will call the one with the dispatcher test. If a single one is specified (/QxAVX) then the dispatch code is omitted and the appropriate routine (entry point) is linked in.
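As a side note on why an assignment like Array = 0.0 can be turned into a byte fill at all: IEEE 754 positive zero is stored as all-zero bits, so filling the array's storage with zero bytes gives the same result. A small Python check of that fact follows (the helper name is mine):

```python
import struct


def is_all_zero_bytes(value: float) -> bool:
    """True if the 8-byte IEEE 754 double encoding of value is all zeros."""
    return struct.pack("<d", value) == bytes(8)


if __name__ == "__main__":
    print(is_all_zero_bytes(0.0))   # positive zero
    print(is_all_zero_bytes(-0.0))  # negative zero has the sign bit set
    print(is_all_zero_bytes(1.0))
```

Values whose encoding is not one repeated byte could not be set this way, so this shortcut presumably applies only to fills such as zero.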

The cause of the problem was the wrong DLL was loaded (my fault). The solution is quite screwy and does some goofy things. It is a mixed-language solution where the Fortran library can be built with multiple (12 targeted) configurations, but the main C# has two configurations (Debug and Release). To resolve which Fortran DLL gets used with which C# and C++ build, there are pre-build and post-build events where the appropriate files (.lib, .dll and .pdb) are copied to the correct place. Unfortunately, if a specific project build isn't performed (it's up to date), then the pre- and/or post-build events are not executed, and consequently the desired .dlls do not get copied. The situation cannot be resolved through dependencies (without exploding the number of C# builds).
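One possible guard against this kind of stale copy is a small script, run unconditionally before launching, that compares the staged DLL with the one from the intended configuration and recopies it when they differ. Below is a minimal sketch in Python. The function names, and the idea of running it outside the build events, are assumptions and not part of the solution described above.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stage_if_different(built: Path, staged: Path) -> bool:
    """Copy built over staged unless the contents already match.

    Returns True when a copy was made, False when staged was current.
    """
    if staged.exists() and file_digest(staged) == file_digest(built):
        return False
    staged.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(built, staged)  # copy2 also preserves the timestamp
    return True
```

Because the comparison is by content rather than by whether a project was rebuilt, it would still catch the case where an up-to-date project skips its build events. The same call would need to be repeated for the matching .lib and .pdb files.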

Jim Dempsey

Steven_L_Intel1
Employee

This particular dispatch is not controllable with an option. I am curious though as to what actually called the routine. Maybe there was C++ code that did manual dispatch? I can't see a way that Fortran-compiled code would call this (though I could be mistaken.)
