Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Proper Setting of /Qax

Benedikt_R_
Beginner
1,424 Views

Hi 

Intel-Fortran-Compiler, Version 15.0.3.208

I'd like advice on setting the /Qax switch.

My program is a numerical PDE solver. Our customers are aware that they should have rather "powerful machines".

If at all possible:

  • I don't want to force our customers to buy special hardware with specific processor models.
  • I'd like to stick to one executable for everybody.

Currently I use /QaxCORE-AVX-I,AVX,SSE4.2,SSE4.1

I'd rather use something like "/Qax do the best you can - whatever it is" :-)

Greetings

Benedikt

8 Replies
TimP
Honored Contributor III

There's little point in asking the compiler to generate 4 variants when the difference within the AVX pair is negligible, and nearly so for the SSE4 pair.  So ask for /QaxAVX /arch:SSE4.1, if it is appropriate.  SSE4.1 was faster than SSE4.2 on some of the CPUs which supported both.

If you must support SSE3, it may be better to set that as the fall-back and not use SSE4.1 at all.  You'd have to evaluate this in more detail.

The compiler should attempt to evaluate whether there is an advantage in so many variants, so it may not have done as poorly as your choice of options might imply.  If the compiler threw up its hands and made only one version, that could arguably be considered a bug.

Steven_L_Intel1
Employee

First of all, the compiler does consider whether it's advantageous to generate a separate code path for a particular instruction set. If it doesn't think it is worthwhile, it doesn't do it.

Second, I think there is a limit of two optimized code paths plus one "generic" path. The generic path is SSE2 by default but you can change this with /arch.

Really, if you are going to get into this, you need to run tests to see whether there is a noticeable benefit that justifies the increase in code size. Tim's suggestion of /QaxAVX /arch:SSE4.1 is one I agree with, if you can guarantee that the program will never be run on a processor that doesn't support SSE4.1.

See also https://software.intel.com/en-us/articles/performance-tools-for-software-developers-intel-compiler-options-for-sse-generation-and-processor-specific-optimizations/

Benedikt_R_
Beginner

Steve & Tim

Thanks a lot to both of you for your helpful and concrete suggestions.

I will check whether I can expect my customers to have processors with SSE4.1!

Is the following statement true? If I compile for SSE4.1 and the customer's processor supports only SSE3, then the SSE4.1 instructions are replaced by fallbacks and the user gets no benefit from SSE?

Thanks, Benedikt

 

Steven_L_Intel1
Employee

It's not clear what you're asking.

If you compile with /arch:SSE4.1 and the customer's processor doesn't support that instruction set, then if the compiler chooses to use an SSE4.1 instruction and it gets executed, the program will die with an invalid instruction fault. Not nice.  If you use /QxSSE4.1, and the processor is not an Intel processor that supports SSE4.1, then the program will get an error at startup explaining the problem.

Maybe what you want is something like /QaxAVX,SSE4.1 and leave off /arch or perhaps set it to /arch:SSE3 (which should cover most anything a customer is likely to be using nowadays.) I will also point out that non-Intel processors will always take the "generic" path when you use /Qax.
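
To make the behaviour described in this reply easier to follow, here is a toy Python model of which code path a /Qax build would end up running. It is only a sketch based on the statements in this thread, not the compiler's actual dispatch logic: the function name, the vendor strings and the feature-set representation are all assumptions made for illustration, and a missing baseline is modelled as an immediate failure even though a real program would only die once an unsupported instruction is actually executed.

```python
# Toy model of /Qax dispatch as described in this thread.
# NOT the compiler's real dispatcher; all names here are hypothetical.

def select_code_path(vendor, cpu_features, qax_targets, baseline):
    """Return the instruction set of the code path that would run.

    vendor       -- CPU vendor string, e.g. "GenuineIntel"
    cpu_features -- set of instruction sets the CPU supports
    qax_targets  -- optimized paths requested with /Qax, best first
    baseline     -- instruction set of the generic path (set with /arch)
    """
    # The generic path is compiled for the baseline with no run-time check,
    # so a CPU lacking it would fault (simplified here to an exception).
    if baseline not in cpu_features:
        raise RuntimeError(f"CPU lacks {baseline}: invalid instruction fault")
    # Per the thread, non-Intel processors always take the generic path.
    if vendor != "GenuineIntel":
        return baseline
    # On Intel processors, take the best optimized path the CPU supports.
    for target in qax_targets:
        if target in cpu_features:
            return target
    return baseline


avx_cpu = {"SSE2", "SSE3", "SSE4.1", "SSE4.2", "AVX"}
print(select_code_path("GenuineIntel", avx_cpu, ["AVX", "SSE4.1"], "SSE3"))
print(select_code_path("AuthenticAMD", avx_cpu, ["AVX", "SSE4.1"], "SSE3"))
```

In this model the first call selects the AVX path and the second falls back to the SSE3 generic path, which is why the choice of /arch matters most for customers on non-Intel hardware.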

jirina
New Contributor I

I am in the same situation as Benedikt:

  • I would like to create one executable (CFD solver) for all customers.
  • I would like to make the most of what AVX and SSE offer.
  • Since I have no chance of getting hardware information from all customers, I need to make sure that the compiler settings are "safe" on any computer with a processor, Intel or non-Intel, that is no older than, say, 4 years.

Could you please confirm my understanding that /QaxAVX combined with /arch:SSE3 is the best I can do to satisfy all the conditions listed above?

Neither /Qax nor /arch is currently set for my application.

Thank you in advance for any answer. I am sorry I could not find answers to my questions in the extensive documentation; I got lost in the many options /Qax offers and cannot relate them to the processors in use these days.

Steven_L_Intel1
Employee

That would be a good combination for your purposes. https://software.intel.com/en-us/articles/performance-tools-for-software-developers-intel-compiler-options-for-sse-generation-and-processor-specific-optimizations/ is a worthwhile read.

jirina
New Contributor I

Thank you for confirming that /QaxAVX and /arch:SSE3 is a good combination. Also, thank you for the link to the interesting overview.

I can see that AVX is good for certain processors that I know our customers are using. At the same time, some of our customers are known to have bought Xeon machines recently. If I understand correctly, /Qax codes can be combined, so is it OK to combine e.g. AVX and CORE-AVX-I so that the compiler generates the corresponding code paths? I understand that combining codes leads to a bigger executable and possible performance penalties, so it might be better to create multiple executables rather than combining codes in one executable. Does this sound right and reasonable?

TimP
Honored Contributor III

As the preceding discussion indicated, there is probably nothing to be gained from CORE-AVX-I relative to AVX, and ideally there would be no significant increase in .exe size.  There are a few C++ intrinsics which run only on Ivy Bridge, but nothing comparable is directly accessible from Fortran.

You would have to test your own application and target platforms to see whether you can measure a difference in performance between a fat .exe which supports both SSE3 and AVX and the individual single path builds.
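
One way to run such a comparison is a small timing harness. The Python sketch below is only an illustration under stated assumptions: it measures wall-clock time of whole runs, and the helper names `median_runtime` and `compare_builds` are invented here rather than taken from any Intel tool.

```python
import statistics
import subprocess
import time


def median_runtime(command, repeats=5):
    """Run `command` `repeats` times; return the median wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True aborts the comparison if a build fails to run at all.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare_builds(builds, repeats=5):
    """Map each build label to the median runtime of its command line."""
    return {label: median_runtime(command, repeats)
            for label, command in builds.items()}
```

For example, `compare_builds({"fat": ["solver_fat.exe", "case.inp"], "avx": ["solver_avx.exe", "case.inp"]})` would time a fat build against a single-path AVX build on the same input; those executable and input file names are hypothetical. The median is used rather than the mean so that a single slow run (disk cache warm-up, background load) does not dominate the result.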

AVX2 might give a 5% improvement over AVX if your application is highly vectorized.  In a few cases, that improvement might be wiped out by combining AVX and AVX2 in a single .exe, or customers running earlier AVX platforms might notice a performance loss with the larger .exe.

The Parallel Studio Vectorization Advisor may be an easy way to see how many important vectorized loops are using AVX2 and to compare them against an AVX-only build.
