
Royi · 02-18-2018

Hello,

Intel Compiler 18.1 can create code optimized for specific instruction-set features (SSE / AVX / AVX2 / AVX512 / etc.).
I was wondering: are those code paths for Intel CPUs only, or are they based on CPU features?

Moreover, do they take the OS into consideration (for example, AVX2 is supported only on Windows 7 SP1 and above)?

Thank You.

Viet_H_Intel · 02-22-2018

Hi,

It depends on which options you use. For example, if you use -xAVX, the code works only on Intel processors, but if you use -mavx, it will work on any CPU that supports AVX instructions.
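The difference described here is whether the runtime gate looks at the CPU vendor as well as the feature. Below is a hypothetical model of the two gates. The function names are invented and this is not the compiler's actual dispatcher code. "GenuineIntel" and "AuthenticAMD" are the vendor strings that CPUID reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the two runtime gates; not the compiler's code. */

/* -mavx: only the instruction set matters; the vendor is ignored. */
bool m_avx_path_allowed(const char *vendor, bool has_avx) {
    (void)vendor;
    return has_avx;
}

/* -xAVX: the CPU must support AVX and report the "GenuineIntel" vendor. */
bool x_avx_path_allowed(const char *vendor, bool has_avx) {
    assert(vendor != NULL);
    return has_avx && strcmp(vendor, "GenuineIntel") == 0;
}
```

With this model, an AMD CPU with AVX passes the -mavx gate but not the -xAVX gate.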

I don't think it depends on the OS, though.
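For context on the OS question: on x86, the OS has to enable saving of the wider YMM register state before AVX code can run safely. Software can observe this through CPUID and XGETBV, and this is why AVX needs an OS update such as Windows 7 SP1. The thread does not say whether the Intel dispatcher performs this check. The sketch below is only a hand-written check on raw register values. The macro and function names are illustrative, and reading the registers themselves is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX */
#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (UINT32_C(1) << 28) /* CPU implements AVX */
/* CPUID leaf 7 (sub-leaf 0), EBX */
#define CPUID7_EBX_AVX2    (UINT32_C(1) << 5)
/* XCR0 (read with XGETBV): bit 1 = XMM state, bit 2 = upper YMM state */
#define XCR0_XMM_YMM       UINT64_C(0x6)

/* True only if the CPU has AVX *and* the OS saves/restores YMM state.
   `xcr0` is meaningful only when OSXSAVE is set (XGETBV faults otherwise);
   pass 0 in that case. */
bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0) {
    if (!(cpuid1_ecx & CPUID1_ECX_AVX) || !(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false;
    return (xcr0 & XCR0_XMM_YMM) == XCR0_XMM_YMM;
}

/* AVX2 needs its own CPUID bit; the OS requirement is the same as for AVX. */
bool avx2_usable(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx, uint64_t xcr0) {
    return avx_usable(cpuid1_ecx, xcr0) && (cpuid7_ebx & CPUID7_EBX_AVX2) != 0;
}
```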

Regards,

Viet

Royi · 02-22-2018

OK,

So I can build the base code path without vendor discrimination, but if I add an optimized code path, it is Intel-only?

On Windows, if I use /arch:AVX (which I think matches -mavx), it will generate an AVX base code path that works on any computer with AVX.
/Qax will add a code path that is only available to Intel CPUs.

What about /Qx? Does it add a third code path?
Namely, if I use /arch:SSE3, /QaxAVX and /QxAVX2, will I have 3 code paths?
That is, 2 paths (AVX + AVX2) only for Intel CPUs and an SSE3 path for any SSE3 CPU?

Thank You.

TimP · 02-23-2018

I think you want /arch:SSE3 /QaxAVX /QaxAVX2. The advantage of asking for both AVX and AVX2 paths may be slim, as there is some overhead in adding paths (both the time taken repeatedly to choose paths and the code size expansion). The compiler should try to evaluate when adding an AVX2 path has a chance of improving performance. As noted above, the options with x in them involve a check on whether the CPU is an Intel CPU, as well as a check on whether the instruction set is available.
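The behavior described for `/arch:SSE3 /QaxAVX /QaxAVX2` can be summarized as a selection rule. The following is a minimal model that assumes invented feature-bit names. It ignores the per-function decision the compiler makes about whether an extra version is worthwhile, and it simply assumes SSE3 is present.

```c
#include <stdbool.h>

/* Hypothetical feature bits for this sketch. */
enum { FEAT_SSE3 = 1 << 0, FEAT_AVX = 1 << 1, FEAT_AVX2 = 1 << 2 };

typedef enum { PATH_SSE3_BASELINE, PATH_AVX, PATH_AVX2 } code_path;

/* Models "/arch:SSE3 /QaxAVX /QaxAVX2" as described above: the /arch
   baseline runs on any SSE3 CPU (SSE3 support is assumed, not checked),
   while each /Qax path also requires an Intel CPU. The most capable
   eligible path is chosen. */
code_path select_path(bool is_intel, unsigned features) {
    if (is_intel && (features & FEAT_AVX2))
        return PATH_AVX2;
    if (is_intel && (features & FEAT_AVX))
        return PATH_AVX;
    return PATH_SSE3_BASELINE; /* non-Intel CPUs always land here */
}
```

Because each extra path costs code size and a selection check, a real dispatcher would make this decision once and reuse the result.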

There may be some code where AMD processors may run better with SSE3 or AVX128 than with Intel optimized AVX256. Note that AVX2 not only uses instructions available only in AVX2, it also uses 256-bit loads and stores in positions where the data can't be aligned. I don't know whether it will make both 128-bit and 256-bit load/store paths when you ask for it this way. Sandy Bridge had a big penalty for 256-bit access to misaligned data (AVX target code is optimized for that); Ivy Bridge had a much smaller penalty, and AVX2 processors could see an advantage for 256-bit access on both aligned and unaligned data. Intel compilers use unaligned instructions even after inserting code to adjust alignment, as recent CPUs don't need aligned instructions for best performance.
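"Inserting code to adjust alignment" usually means peeling a few leading iterations into scalar code so that the main vector loop starts on an aligned address. The sketch below computes that peel count for a 32-byte (256-bit) boundary. It is a generic illustration, not the Intel compiler's generated code. As described above, even after peeling, the main loop may still use the unaligned instruction forms (e.g. `vmovups` rather than `vmovaps`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How many leading elements to handle with scalar code ("peeling") before
   `addr` reaches an `align`-byte boundary, e.g. align = 32 for 256-bit AVX.
   Assumes `align` is a power of two and a multiple of `elem_size`, and that
   `addr` is a multiple of `elem_size` (naturally aligned elements). */
size_t peel_count(uintptr_t addr, size_t elem_size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(elem_size != 0 && align % elem_size == 0);
    assert(addr % elem_size == 0);
    size_t misalign = (size_t)(addr & (uintptr_t)(align - 1));
    return misalign == 0 ? 0 : (align - misalign) / elem_size;
}
```

For example, a `float` array starting 4 bytes past a 32-byte boundary needs 7 scalar iterations before 256-bit accesses are aligned.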

Royi · 02-23-2018

Hi Tim,

I wasn't aware I could use /Qx<> multiple times.
So can I create as many code paths as I want, or is there a limit?

Thank You.

jimdempseyatthecove · 02-23-2018

>>So I can create as many code paths as I want or there is a limitation?

Hasn't your mother told you: Don't use too much of a good thing.

Restrain yourself to only those code paths required for your targeted platforms.

Jim Dempsey

Royi · 02-23-2018

jimdempseyatthecove wrote:

>>Restrain yourself to only those code paths required for your targeted platforms.

Being practical, I meant: can I use 3, for instance?

TimP · 02-23-2018

At one time I was informed the number of architecture paths was limited to 3. Anyway, it's important to choose carefully those which would be useful to your application.

Royi · 04-06-2018

I'm still misunderstanding something.

I understand that `-m` and `/arch` create a base code path that depends only on the features of the CPU.
But what's the difference between `Qax` and `Qx`?

Thank You.

McCalpinJohn · 04-06-2018

The "x" option defines the base architecture and specifies that the code will require an Intel processor supporting that base architecture. Only one "x" option can be provided.

The "ax" option defines optional additional architectures for the compiler to consider generating alternate code paths. These optional additional architectures are also limited to Intel processors.

An example that we use here is "-xCORE-AVX2 -axMIC-AVX512,CORE-AVX512". This generates a base version that will only run on Intel processors supporting the CORE-AVX2 architecture (Haswell or newer), and may (depending on the compiler's analysis) generate versions of functions that will run on Intel Xeon Phi x200 (Knights Landing) processors and/or versions that will run on Intel Xeon Scalable processors (Skylake Xeon).
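This example can be modeled as a decision for one function that the compiler chose to multi-version. The sketch below is hypothetical: the feature-bit names are invented, and in practice some functions will exist only as the CORE-AVX2 version. It relies on two facts. Knights Landing implements AVX-512ER and Skylake Xeon implements AVX-512BW, and both also support AVX2.

```c
#include <stdbool.h>

/* Hypothetical feature bits for this sketch. */
enum {
    F_AVX2     = 1 << 0,
    F_AVX512F  = 1 << 1,
    F_AVX512ER = 1 << 2, /* present on Knights Landing */
    F_AVX512BW = 1 << 3  /* present on Skylake Xeon */
};

typedef enum {
    WILL_NOT_RUN,      /* -xCORE-AVX2 baseline requirement not met */
    PATH_CORE_AVX2,    /* baseline version */
    PATH_MIC_AVX512,   /* optional -ax version for Xeon Phi x200 */
    PATH_CORE_AVX512   /* optional -ax version for Xeon Scalable */
} target_path;

/* Models "-xCORE-AVX2 -axMIC-AVX512,CORE-AVX512": the -x baseline gates
   the whole program, then the best matching -ax version is picked. */
target_path choose_target(bool is_intel, unsigned f) {
    if (!is_intel || !(f & F_AVX2))
        return WILL_NOT_RUN;
    if ((f & F_AVX512F) && (f & F_AVX512BW))
        return PATH_CORE_AVX512;
    if ((f & F_AVX512F) && (f & F_AVX512ER))
        return PATH_MIC_AVX512;
    return PATH_CORE_AVX2;
}
```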

Royi · 04-06-2018

Hi John,

I see your point.
If I replace `-x` with `-m`, then the only change is that the base code path won't require an Intel CPU, while the other code paths still will?

So one could use either `-x` or `-m` and add to that code paths with `-ax`.
Is that correct?

Is there a way to make the `-ax` code paths work on AMD as well (namely, based on features rather than on the manufacturer)?

Thank You.

McCalpinJohn · 04-09-2018

The compiler guide says that if you combine the "-march=[processor]" option with the "-ax" option, "the compiler will not generate Intel-specific instructions". This sounds like it is what you are looking for?

Royi · 06-01-2018

@John,

What I'm looking for is creating various code paths based on CPU features rather than on the CPU manufacturer.
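One way to get purely feature-based dispatch, outside the Intel compiler's /Qax mechanism, is to dispatch by hand. The following minimal sketch assumes GCC or Clang on x86. It relies on their `__builtin_cpu_supports` builtin and `target` attribute, and on other compilers or architectures only the generic version is built. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t (*sum_fn)(const int32_t *, size_t);

/* Generic version: runs on any CPU. */
static int64_t sum_generic(const int32_t *x, size_t n) {
    int64_t s = 0;
    for (size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same source compiled for AVX2; the compiler may auto-vectorize it. */
__attribute__((target("avx2")))
static int64_t sum_avx2(const int32_t *x, size_t n) {
    int64_t s = 0;
    for (size_t i = 0; i < n; ++i) s += x[i];
    return s;
}
#define HAVE_AVX2_VERSION 1
#endif

/* Choose based only on what the CPU reports; the vendor is never checked. */
static sum_fn pick_sum(void) {
#ifdef HAVE_AVX2_VERSION
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_generic;
}

int64_t sum_i32(const int32_t *x, size_t n) {
    static sum_fn impl = NULL; /* cached choice; not thread-safe as written */
    if (impl == NULL) impl = pick_sum();
    return impl(x, n);
}
```

An AMD CPU that reports AVX2 gets the AVX2 version here. GCC can also generate this kind of dispatcher automatically with `__attribute__((target_clones(...)))`.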

Royi · 12-27-2018

McCalpin, John wrote:
The compiler guide says that if you combine the "-march=[processor]" option with the "-ax" option, "the compiler will not generate Intel-specific instructions". This sounds like it is what you are looking for?

This is exactly what I'm trying to verify.
If I set /arch:SSE3 and /QaxAVX2, does it mean I will have 2 code paths: one for CPUs that don't have AVX2 (using SSE3 only) and one for those with AVX2, regardless of whether they are Intel or AMD CPUs?

I wish they separated Intel-specific code paths from feature-specific ones.
That way I could have 2 SSE3 code paths (generic and Intel-optimized) and 2 AVX2 code paths (generic and Intel-optimized).

McCalpinJohn · 01-02-2019

The documentation that I am looking at does not include "AVX2" as an option to the "ax" parameter. The documented option is "CORE-AVX2". My interpretation of the documentation is that the combination "/arch:SSE3" and "/QaxCORE-AVX2" will generate two code paths: a "baseline" code path using SSE3 and no Intel-specific instructions, and an alternate code path using AVX2 and no Intel-specific instructions. The documentation *seems* clear on this, but nothing beats testing.

Royi · 01-02-2019

John,

It's not that clear.
Here is the documentation:

If you specify both the -ax and -march options (Linux and macOS*), or the /Qax and /arch options (Windows), the compiler will not generate Intel-specific instructions.
The ax option tells the compiler to find opportunities to generate separate versions of functions that take advantage of features of the specified instruction features.
If the compiler finds such an opportunity, it first checks whether generating a feature-specific version of a function is likely to result in a performance gain. If this is the case, the compiler generates both a feature-specific version of a function and a baseline version of the function. At run time, one of the versions is chosen to execute, depending on the Intel® processor in use. In this way, the program can benefit from performance gains on more advanced Intel processors, while still working properly on older processors and non-Intel processors. A non-Intel processor always executes the baseline code path.

Have a look at the sentence "A non-Intel processor always executes the baseline code path.".
Like you, I understand there will be 2 code paths, and since I used -m or /arch:, neither of them will use Intel-only features.
But then what's the point of making non-Intel CPUs use the baseline while still building the AVX2 code path without Intel-specific features?

Intel should make it really simple: make Qax add a code path that doesn't require an Intel CPU in any case, and add a flag such as Qaxi that works on Intel CPUs only. Or something along those lines.
The Intel Compiler's additional code paths are a killer feature, but I wish they were more feature-oriented and less Intel-oriented.

But anyhow, even the baseline doesn't work on non-GenuineIntel CPUs; see https://software.intel.com/en-us/forums/intel-c-compiler/topic/801797

McCalpinJohn · 01-03-2019

Good catch. I don't use multi-target binaries very often, and don't believe that I have ever tried to use one on a non-Intel processor. (Partly because it is so hard to understand what is supposed to happen!)

jimdempseyatthecove · 01-03-2019

>>What I'm looking for is creating various code paths which are based on CPU features and not CPU manufacturer

The problem you have to overcome is that the instruction used to fetch the feature set (combined with the manufacturer ID) affects what gets loaded into the feature-set bitmask. Hopefully the CPU feature bitmask is loaded only once. You could call _may_i_use_cpu_feature(~0) to set up the feature bitmask, then use the debugger to step into _may_i_use_cpu_feature(~0) and find the location holding the bitmask (assuming it is not re-read using CPUID each time). Once it is located, you could set the desired bits in the bitmask. However, this is at your own risk, since the features you enable may not actually be supported by your CPU.
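This suggestion relies on the runtime keeping the detected features in a bitmask that is filled once and then only read. The sketch below shows that caching pattern with made-up names. It is not the Intel runtime's actual data structure, and `demo_probe` is a hypothetical stand-in for code that would execute CPUID and apply the vendor check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define FEAT_SSE3 (UINT64_C(1) << 0)
#define FEAT_AVX2 (UINT64_C(1) << 1)

/* Stand-in probe: pretends SSE3 is present and AVX2 is not.
   Counts its own calls so the caching can be observed. */
unsigned probe_calls = 0;
uint64_t demo_probe(void) {
    ++probe_calls;
    return FEAT_SSE3;
}

static uint64_t cached_features;
static bool     features_ready = false;

/* Answers from the cached mask; the probe runs only on the first query. */
bool may_use(uint64_t wanted) {
    if (!features_ready) {
        cached_features = demo_probe();
        features_ready = true;
    }
    return (cached_features & wanted) == wanted;
}
```

Overwriting `cached_features` after the first query (which is what patching the mask from a debugger amounts to) changes every later answer without re-probing the CPU. If an enabled feature is not really present, the program will hit an illegal-instruction fault.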

Jim Dempsey

Additional Optimized Code Path