I have a static lib built with -xAVX. At runtime, if the CPU has no AVX, the lib is unused. However, the mere presence of this lib makes my app crash. In the link map file I see that ICC generates the same mangled name for std calls, but the code differs with and without AVX. For example, for std::numeric_limits<float>::infinity() the name is __ZNSt14numeric_limitsIfE8infinityEv, but it's built with AVX instructions, so the app crashes when it's called from non-AVX code.
How to solve?
This is the expected behavior of the -xAVX option. You may refer to the compiler documentation for -x (https://software.intel.com/en-us/node/522845):
The specialized code generated by this option may only run on a subset of Intel® processors. The resulting executables created from these option code values can only be run on Intel® processors that support the indicated instruction set.
To resolve your issue, you need to use -mavx, which will generate both AVX and non-AVX code.
I can't check because I can't reproduce the situation; now it works ;-( Can you please explain the difference between -xAVX and -mavx in more detail? If a library is built for AVX, it should use those instructions; that's normal and expected. But built-in calls compiled with AVX should not be used outside the lib. Does -mavx provide this, and how? I can't find it in the docs.
-xAVX implies some checking for an Intel AVX capable CPU, presumably writing an error to stderr if the check fails. If you want a code path for CPUs which don't pass the test, you need -axAVX. -mavx doesn't involve any such checks.
Tim's right: the option -axavx gives both SSE (the default SSE level) and AVX code paths, and you can use the -x or -m (/arch:) switches to modify the default SSE code path. For example: "-axavx -xsse4.2" targets both Nehalem and AVX.
BTW, the article https://software.intel.com/en-us/articles/how-to-compile-for-intel-avx/ should give you more details as well.
Yes, Tim and Kittur are correct: you need to use -ax instead of -m/-x (sorry for the mistake in my first reply. :( )
-xAVX: generates instructions for Intel processors supporting AVX. The run-time checks the processor type, and if it is not an AVX processor, the program will crash.
-mavx: similar to -x, but will also work on non-Intel processors that support AVX.
-axavx: auto-dispatch at run time; generates a baseline code path and an AVX code path. The baseline path is decided by -x or -m. By default, it is SSE2 (-m default is SSE2 and -x default is none, so...).
-axavx: runs on processors supporting SSE2 and above, and may optimize specifically for AVX processors.
-axavx -xsse4.2: runs on Intel processors supporting SSE4.2 and above, and optimizes for Intel AVX processors.
-axavx -msse4.2: runs on any processor supporting SSE4.2 and above, and optimizes for Intel AVX processors.
The compiler documentation has a more detailed introduction to these options. Hope it helps this time!
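One way to see which baseline a given set of options actually selects is to look at the predefined instruction-set macros in a translation unit built with those options. The following is only a minimal sketch: baseline_isa is a hypothetical helper name, and it assumes the compiler predefines __AVX__, __SSE4_2__ and __SSE2__ the way GCC, Clang and ICC on Linux/macOS normally do.

```cpp
#include <string>

// Hypothetical helper (not part of any compiler API): reports the SIMD
// baseline this translation unit was compiled for, judged only by the
// predefined macros. It says nothing about the CPU it is running on.
std::string baseline_isa() {
#if defined(__AVX__)
    return "AVX";       // built with AVX as the baseline (e.g. -xAVX or -mavx)
#elif defined(__SSE4_2__)
    return "SSE4.2";    // built with an SSE4.2 baseline
#elif defined(__SSE2__)
    return "SSE2";      // the usual default baseline on x86-64
#else
    return "unknown";   // macro not provided by this compiler or target
#endif
}
```

Printing the result of baseline_isa() from a file in the library and from a file in the application shows whether the two were built with different baselines.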
Just to add to the above discussion: when you want to run the code on a machine that doesn't support AVX, multiple code path generation has to be enabled using the /Qax option, as mentioned above. When generating multiple code paths, the compiler inserts a CPUID-based check to determine which architecture the code is running on.
So if you look at the sample asm generated when you use /QxAVX,SSE4.2, the generated code is something like an if-else condition:
.B1.3::
        mov       eax, DWORD PTR [__intel_cpu_feature_indicator] ;71.1
.B1.4::
        add       rsp, 8
        jmp       main.R
.B1.6::                         ; Preds .B1.3
        test      BYTE PTR [__intel_cpu_feature_indicator], 1
        je        .B1.8         ; Prob 10%
.B1.7::
        add       rsp, 8
        jmp       main.A
As you can see from the above asm (I have stripped the unnecessary parts), it has two paths for the same main: either main.R or main.A is picked based on the value of "__intel_cpu_feature_indicator".
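The same if-else idea can be written by hand. The sketch below is not what the Intel compiler emits; it is a rough illustration that assumes the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports and the target("avx") function attribute (the Intel compiler may need a different check). The names sum_baseline, sum_avx, cpu_has_avx and sum_dispatch are hypothetical.

```cpp
#include <cstddef>

// Baseline path: compiled with the default options of the translation unit.
static float sum_baseline(const float* v, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// AVX path: only this function is allowed to use AVX instructions
// (assumption: GCC/Clang per-function target attribute).
__attribute__((target("avx")))
static float sum_avx(const float* v, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

// Run-time check, playing the role of the CPU feature indicator test.
static bool cpu_has_avx() {
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
}
#else
// Fallback for other compilers or targets: never take the AVX path.
static float sum_avx(const float* v, std::size_t n) { return sum_baseline(v, n); }
static bool cpu_has_avx() { return false; }
#endif

// Dispatcher: picks one of the two paths once, on first use.
float sum_dispatch(const float* v, std::size_t n) {
    static const bool use_avx = cpu_has_avx();
    return use_avx ? sum_avx(v, n) : sum_baseline(v, n);
}
```

Callers only ever use sum_dispatch, so the AVX variant is reached only after the check has passed, which mirrors the choice between main.R and main.A above.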
Hope this gives some idea.
Sukruth H V
Thanks for your replies. When I replaced one dylib with a static one, the problem appeared again. I've tried -mavx: no difference, same crash at runtime. With -axAVX I got a lot of unresolved symbols. I printed the macros and see that __AVX__ is no longer defined. After adding it to the preprocessor definitions I got this compile error:
error: identifier "__popcnt" is undefined
Please note that I don't need to generate "auto-detect" code; the library does it explicitly (at least it should). Any ideas?