I'm using Intel C++ Compiler 2019.4 to build a console application with the following compilation switches: /O3 /Qunroll /Qunroll-aggressive /QxSSE3 /QaxCORE-AVX2.
The problem is that on an AMD Ryzen processor (Zen architecture) the application exits with an error saying that the SSE3 feature is unavailable. The same application works on an Intel i7-7700K processor. The instruction sets of these processors are largely the same, yet the application does not work on the Ryzen processor. It seems that the "libirc" handler somehow detects the Ryzen processor's features incorrectly.
At the same time, when I build a DLL with the same compilation switches, the DLL loads correctly on the Ryzen processor.
I'm looking for a possible solution to this very unfortunate problem, which is making me consider a switch to LLVM.
- Tags:
- CC++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization
Hi, could you please share sample code with us that demonstrates the issue?
Can you use /QxSSE2 /QaxCORE-AVX2 ?
If so, what is the performance difference on CPUs that support SSE2 (but not AVX2)?
Jim Dempsey
The simple program:
#include <stdio.h>
int main()
{
    printf( "Hello, world!\n" );
}
does not work. The displayed console error is
"Please verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, FXSAVE, SSE, SSE2 and SSE3 instructions."
If I remove /QaxCORE-AVX2 the program works, so there appears to be an error in the Intel run-time library related to the auto-dispatch code.
/QxSSE2 performance on the Ryzen processor is pretty good in my floating-point-heavy applications. I've compared a Ryzen 3 1200 with an Intel i7-7700K, and the Ryzen is only 20% slower at a roughly 26% lower clock speed (3.1 GHz vs 4.2 GHz).
Does /QxSSE2 /QaxCORE-AVX2 work on the AMD system?
In other words, is the error related to SSE3 or to CPU identification?
From https://techreport.com/review/8327/amds-athlon-64-3800-venice-processor/
Support for SSE3 instructions — The Venice core can execute 11 of the 13 instructions that make up SSE3....The two SSE3 instructions that the Venice core doesn’t support have to do with thread synchronization for Hyper-Threading...
Note that the above is from a 2005 article, so I do not know whether those two instructions are still missing from current AMD CPUs.
Jim Dempsey
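One way to separate the two possibilities raised above (a missing SSE3 bit versus a vendor check) is to read CPUID directly on the affected machine. The following is a minimal sketch, not Intel's actual run-time check: the helper names (`CpuidRegs`, `query_cpuid`, `vendor_string`, `has_sse3`, `has_monitor`, `cpu_report`) are hypothetical. It assumes an x86 target and relies on the documented CPUID layout: leaf 0 returns the vendor string in EBX, EDX, ECX, and leaf 1 reports SSE3 in ECX bit 0 and MONITOR/MWAIT in ECX bit 3.

```cpp
#include <cstring>
#include <string>
#if defined(_MSC_VER)
#include <intrin.h>   // __cpuidex (MSVC, and Intel C++ on Windows)
#else
#include <cpuid.h>    // __get_cpuid_count (GCC and Clang)
#endif

// Raw register values returned by one CPUID call.
struct CpuidRegs { unsigned int eax, ebx, ecx, edx; };

// Run CPUID for the given leaf with subleaf 0.
inline CpuidRegs query_cpuid(unsigned int leaf) {
    CpuidRegs r = {0, 0, 0, 0};
#if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuidex(regs, static_cast<int>(leaf), 0);
    r.eax = static_cast<unsigned int>(regs[0]);
    r.ebx = static_cast<unsigned int>(regs[1]);
    r.ecx = static_cast<unsigned int>(regs[2]);
    r.edx = static_cast<unsigned int>(regs[3]);
#else
    __get_cpuid_count(leaf, 0, &r.eax, &r.ebx, &r.ecx, &r.edx);
#endif
    return r;
}

// Leaf 0 stores the 12-character vendor string in the order EBX, EDX, ECX.
inline std::string vendor_string(const CpuidRegs& leaf0) {
    char buf[12];
    std::memcpy(buf + 0, &leaf0.ebx, 4);
    std::memcpy(buf + 4, &leaf0.edx, 4);
    std::memcpy(buf + 8, &leaf0.ecx, 4);
    return std::string(buf, 12);
}

// Leaf 1, ECX bit 0: SSE3.
inline bool has_sse3(const CpuidRegs& leaf1) { return (leaf1.ecx & (1u << 0)) != 0; }

// Leaf 1, ECX bit 3: MONITOR/MWAIT.
inline bool has_monitor(const CpuidRegs& leaf1) { return (leaf1.ecx & (1u << 3)) != 0; }

// One-line summary for the machine this runs on.
inline std::string cpu_report() {
    const CpuidRegs leaf0 = query_cpuid(0);
    const CpuidRegs leaf1 = query_cpuid(1);
    return vendor_string(leaf0)
         + " SSE3=" + (has_sse3(leaf1) ? "yes" : "no")
         + " MONITOR=" + (has_monitor(leaf1) ? "yes" : "no");
}
```

Printing `cpu_report()` on the Ryzen machine would show whether the SSE3 bit is set there. If it is, the startup failure is more likely tied to how the CPU is identified than to the feature bit itself.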
/QxSSE2 /QaxCORE-AVX2 does not work either, so I think it's a CPU feature detection error. I'll reiterate that when a DLL is compiled with these switches, it loads and executes correctly, and no "invalid instruction" errors are generated.
If some of the SSE3 instructions are not supported on Ryzen, then the Intel compiler is making the correct, conservative, judgement.
If I recall correctly, compiling with the "-ffreestanding" (/Qfreestanding) option will eliminate the feature tests at program startup. The program should run correctly unless you actually execute an unsupported instruction....
Although the compiler manual says that code built with the "-x{code}" (/Qx{code}) options will run on compatible processors that support the same features, Intel has little incentive to make this work well. The "-x{code}" (/Qx{code}) options specify both an instruction set and a tuning target, so they may result in sub-optimal code even if the code works correctly.
Intel also "bundles" the instruction sets in a way that may make it difficult to get the code you want to run on an AMD processor. For example, the -axCORE-AVX2 option may generate instructions using AVX2, AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 "for Intel processors". There is no guarantee that the runtime test(s) will check for each of these independently -- a single missing instruction in any of these instruction sets might be enough to block any alternate code path. (This was certainly an issue with AMD processors that did not support SSSE3, but I don't know if it applies to current AMD processors....)
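To illustrate what checking each instruction set independently could look like, here is a small sketch that decodes the individual CPUID feature bits and picks a code path from them. It is not how the Intel run-time library works (the source does not say). The names `SimdFeatures`, `decode_features`, `CodePath` and `choose_path` are hypothetical, and the inputs are assumed to be the ECX and EDX values from CPUID leaf 1 and the EBX value from leaf 7, subleaf 0. A production check for AVX and AVX2 would also need to confirm operating-system support for the YMM state (OSXSAVE plus XGETBV), which is omitted here.

```cpp
// Per-feature flags decoded from raw CPUID register values.
struct SimdFeatures {
    bool sse, sse2, sse3, ssse3, sse41, sse42, avx, avx2;
};

// True if bit n of reg is set.
inline bool bit(unsigned int reg, unsigned int n) {
    return ((reg >> n) & 1u) != 0;
}

// leaf1_ecx, leaf1_edx: ECX and EDX from CPUID leaf 1.
// leaf7_ebx: EBX from CPUID leaf 7, subleaf 0.
inline SimdFeatures decode_features(unsigned int leaf1_ecx,
                                    unsigned int leaf1_edx,
                                    unsigned int leaf7_ebx) {
    SimdFeatures f;
    f.sse   = bit(leaf1_edx, 25);
    f.sse2  = bit(leaf1_edx, 26);
    f.sse3  = bit(leaf1_ecx, 0);
    f.ssse3 = bit(leaf1_ecx, 9);
    f.sse41 = bit(leaf1_ecx, 19);
    f.sse42 = bit(leaf1_ecx, 20);
    f.avx   = bit(leaf1_ecx, 28);
    f.avx2  = bit(leaf7_ebx, 5);
    return f;
}

// Hypothetical code paths an application might ship.
enum class CodePath { Baseline, Sse3, Avx2 };

// Each path is gated only on the features it actually uses, so an
// unrelated missing feature (SSSE3, say) does not block the other paths.
inline CodePath choose_path(const SimdFeatures& f) {
    if (f.avx && f.avx2) return CodePath::Avx2;
    if (f.sse3)          return CodePath::Sse3;
    return CodePath::Baseline;
}
```

With this layout, a processor that reports AVX2 but lacks one of the older bundled sets would still reach the AVX2 path, provided that path really does not use the missing instructions.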
Well, in the long run these "quirks" will make the Intel C++ compiler irrelevant for commercial deployment. /Qfreestanding does work, but it produces far inferior code compared to, e.g., LLVM.
What you may be able to do is generate two DLLs, one for each instruction set. Then configure your application NOT to auto-load your DLL. Instead, add a CPU identifier of your own that identifies the CPU (AMD or Intel), and then explicitly load the DLL you want.
The alternative is to write a single DLL that contains your CPU identifier, and to add library dispatch objects with the appropriate vtable dispatch for each CPU: one object for AMD and one for Intel. Place the pointer to the appropriate object in a global location.
Jim Dempsey
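The single-library alternative can be sketched roughly as follows. This is only an illustration under stated assumptions: the interface `Kernels`, the classes `KernelsIntel` and `KernelsAMD`, the function `select_kernels` and the global `g_kernels` are all hypothetical names, and the `sum` routine merely stands in for a compute-heavy function. In a real build each implementation would presumably live in its own translation unit compiled with its own switches. The vendor string is assumed to come from a CPU identifier you write yourself (for example, CPUID leaf 0).

```cpp
#include <cstddef>
#include <string>

// Interface whose vtable provides the per-CPU dispatch.
struct Kernels {
    virtual ~Kernels() {}
    virtual const char* name() const = 0;
    virtual double sum(const double* x, std::size_t n) const = 0;
};

// Stand-in for code built for Intel processors.
struct KernelsIntel : Kernels {
    const char* name() const override { return "intel"; }
    double sum(const double* x, std::size_t n) const override {
        double s = 0.0;
        for (std::size_t i = 0; i < n; ++i) s += x[i];
        return s;
    }
};

// Stand-in for code built for AMD processors.
struct KernelsAMD : Kernels {
    const char* name() const override { return "amd"; }
    double sum(const double* x, std::size_t n) const override {
        double s = 0.0;
        for (std::size_t i = 0; i < n; ++i) s += x[i];
        return s;
    }
};

// Global location holding the pointer to the selected dispatch object.
const Kernels* g_kernels = nullptr;

// Map a CPUID vendor string to a dispatch object. Anything that is not
// "GenuineIntel" falls back to the AMD object (an assumption of this sketch).
inline const Kernels* select_kernels(const std::string& vendor) {
    static const KernelsIntel intel{};
    static const KernelsAMD amd{};
    if (vendor == "GenuineIntel") return &intel;
    return &amd;
}
```

At startup the application would set `g_kernels = select_kernels(vendor)` once and route all hot calls through `g_kernels->sum(...)`, so the selection cost is paid a single time.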
I do not use DLLs for my console applications, so there is no option to use DLL versioning for me.
I also commercially ship plugin DLL software, and it works on Ryzen processors with the same compilation switches.
So, it's basically a bug in Intel run-time library with console applications not working on Ryzen processors.
>>So, it's basically a bug in Intel run-time library with console applications not working on Ryzen processors.
Or anti-competitive feature.
>>I do not use DLLs for my console applications
Well, you could wait until Intel fixes it, or replace your main with a stub that contains your CPU identifier and then calls either main_AMD(argc, argv) or main_Intel(argc, argv). Each .obj file (.lib) would be compiled with only /Qx... In other words, omit the /Qax...
So your code is larger, so what. It avoids the DLL, and it is likely smaller than the app plus two DLLs.
Note, in lieu of segregating at main, you could segregate only the very compute intensive portions of the application. QED
Jim Dempsey
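A rough sketch of the stub idea follows. The names main_AMD and main_Intel come from the suggestion above, while `EntryPoint` and `pick_entry` are hypothetical. The two entry points are placeholders defined in one file so the sketch compiles on its own; in the proposed setup each would be built separately with its own switches. The vendor string is assumed to come from a CPU identifier you supply (for example, CPUID leaf 0).

```cpp
#include <cstdio>
#include <string>

// Placeholder for the application body built for AMD processors.
int main_AMD(int argc, char** argv) {
    (void)argc; (void)argv;
    std::puts("running AMD build");
    return 0;
}

// Placeholder for the application body built for Intel processors.
int main_Intel(int argc, char** argv) {
    (void)argc; (void)argv;
    std::puts("running Intel build");
    return 0;
}

// Signature shared by both entry points.
typedef int (*EntryPoint)(int, char**);

// Choose the entry point from the CPUID vendor string. Unknown vendors
// take the AMD entry point (an assumption of this sketch).
inline EntryPoint pick_entry(const std::string& vendor) {
    return vendor == "GenuineIntel" ? &main_Intel : &main_AMD;
}
```

The real main would then shrink to obtaining the vendor string and returning `pick_entry(vendor)(argc, argv)`. The same pattern applies if only the compute-intensive routines are segregated instead of all of main.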
