Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Problem running compiled console application on AMD Ryzen processor

vaneev__aleksey
Beginner

I'm using Intel C++ Compiler 2019.4 to produce a console application with the following compilation switches: /O3 /Qunroll /Qunroll-aggressive /QxSSE3 /QaxCORE-AVX2.

The problem is that on an AMD Ryzen (Zen architecture) processor, the application exits with an error saying that the SSE3 feature is unavailable. The same application works on an Intel i7-7700K processor. The instruction sets of these two processors are largely the same, yet the application does not work on the Ryzen processor. It seems that the handler in "libirc" somehow detects the Ryzen processor's features incorrectly.

At the same time, when I build a DLL with the same compilation switches, the DLL loads correctly on the Ryzen processor.

I'm looking for a possible solution to this very unfortunate problem, which is making me consider switching to LLVM.

Varsha_M_Intel
Employee

Hi, could you please share sample code with us that demonstrates the issue?

jimdempseyatthecove
Honored Contributor III

Can you use /QxSSE2 /QaxCORE-AVX2?

If so, what is the performance difference on CPUs that support SSE2 (but not AVX2)?

Jim Dempsey

vaneev__aleksey
Beginner

The following simple program:

#include <stdio.h>

int main() {
    printf("Hello, world!\n");
}

does not work. The console displays this error:

"Please verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, FXSAVE, SSE, SSE2 and SSE3 instructions."

If I remove /QaxCORE-AVX2, the program works, so there appears to be an error in the Intel run-time library's auto-dispatch code.

vaneev__aleksey
Beginner

/QxSSE2 performance on the Ryzen processor is pretty good in my floating-point-heavy applications. I've compared a Ryzen 3 1200 with an Intel i7-7700K, and the Ryzen is only 20% slower at a roughly 26% lower clock speed (3.1 GHz vs 4.2 GHz).

jimdempseyatthecove
Honored Contributor III

Does /QxSSE2 /QaxCORE-AVX2 work on the AMD system?

IOW, is the error related to SSE3 or to the CPU identification?

From https://techreport.com/review/8327/amds-athlon-64-3800-venice-processor/

Support for SSE3 instructions — The Venice core can execute 11 of the 13 instructions that make up SSE3....The two SSE3 instructions that the Venice core doesn’t support have to do with thread synchronization for Hyper-Threading...

Note: the above is from a 2005 article, so I do not know whether those two instructions are still missing from current AMD CPUs.

Jim Dempsey

vaneev__aleksey
Beginner

/QxSSE2 /QaxCORE-AVX2 does not work either; I think it's a CPU feature detection error. I'll reiterate that when a DLL is compiled with these switches, it loads and executes correctly, and no "invalid instruction" errors are generated.

McCalpinJohn
Honored Contributor III

If some of the SSE3 instructions are not supported on Ryzen, then the Intel compiler is making the correct, conservative judgement.

If I recall correctly, compiling with the "-ffreestanding" (/Qfreestanding) option will eliminate the feature tests at program startup. The program should then run correctly unless you actually execute an unsupported instruction.

Although the compiler manual says that the "-x{code}" (/Qx{code}) options will run on compatible processors that support the same features, Intel has a negative incentive to make this work well. The "-x{code}" (/Qx{code}) options specify both an instruction set and a tuning target, so they may result in sub-optimal code even if the code works correctly.

Intel also "bundles" the instruction sets in a way that may make it difficult to get the code you want to run on an AMD processor. For example, the -axCORE-AVX2 option may generate instructions using AVX2, AVX, SSE4.2, SSE4.1, SSE3, SSE2, SSE, and SSSE3 "for Intel processors". There is no guarantee that the runtime test(s) will check for each of these independently; a single missing instruction in any of these instruction sets might be enough to block any alternate code path. (This was certainly an issue with AMD processors that did not support SSSE3, but I don't know if it applies to current AMD processors.)

vaneev__aleksey
Beginner

Well, in the long run these "quirks" will make the Intel C++ compiler irrelevant for commercial deployment. /Qfreestanding does work, but it produces far inferior code compared to, e.g., LLVM.

jimdempseyatthecove
Honored Contributor III

What you may be able to do is generate two DLLs, one for each instruction set. Then configure your application NOT to auto-load your DLL. Instead, add a CPU identifier that you write yourself, which identifies the CPU (AMD or Intel), and then explicitly load the DLL you want.

The alternative is to write a single DLL that contains your CPU identifier and add library dispatch objects with the appropriate vtable dispatch for each CPU: one object for AMD and one for Intel. Place the pointer to the appropriate object in a global location.

Jim Dempsey 

vaneev__aleksey
Beginner

I do not use DLLs for my console applications, so DLL versioning is not an option for me.

I also ship commercial plugin DLL software, and it works on Ryzen processors with the same compilation switches.

So it's basically a bug in the Intel run-time library: console applications do not work on Ryzen processors.

jimdempseyatthecove
Honored Contributor III

>>So it's basically a bug in the Intel run-time library: console applications do not work on Ryzen processors.

Or an anti-competitive feature.

>>I do not use DLLs for my console applications

Well, you could wait until Intel fixes it, or replace your main with a stub that contains your CPU identifier and then calls either main_AMD(argc, argv) or main_Intel(argc, argv). Compile each .obj file (or .lib) with only /Qx...; IOW, omit the /Qax...

So your code is larger; so what? It avoids the DLL, and it is likely smaller than the app plus two DLLs.

Note: in lieu of segregating at main, you could segregate only the very compute-intensive portions of the application. QED

Jim Dempsey
