Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
29253 Discussions

Qax options for speed and numerical precision

davva
Beginner
1,297 Views

Hi!

I have experimented a bit with the /Qax options and found quite a big performance gain (50%). I am working on a scientific computation application in medical physics. Speed is very important, but so is numerical stability!

We require our customers to use Intel processors (P4 or higher), but in practice they have a wide variety of CPUs (even AMD). What compiler options are recommended if I want to use /Qax? I have tried /QaxN with nice results, but I only have a Pentium M processor to test on. Are there additional flags I should add for numerical stability (e.g. /fp:precise)?

Thanx

10 Replies
Steven_L_Intel1
Employee

If you use /Qax, then customers using AMD processors will get generic IA-32 code by default in version 10.1. In version 11, the default will be SSE2 code (Pentium 4 or AMD with SSE2). If you use /QxW, you'll get the equivalent of the 11.0 default. If you have newer Intel processors (Intel Core 2), try /QxW /QaxT. This will get you SSE2 for everyone and SSSE3 on Core 2 processors.
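
The dispatch behavior described above can be pictured with a toy Python model. This is a hypothetical sketch, not how ifort actually implements /Qax: the function name and the CPU feature sets are assumptions, and it encodes only the 10.1 rule described here (an Intel CPU that supports the alternate instruction set takes that path; every other CPU, AMD included, runs the baseline path, which is SSE2 when /QxW is given).

```python
# Toy model of /Qax-style runtime dispatch. Hypothetical: this is NOT how
# ifort implements it; it only encodes the ifort 10.1 rule described above.

def select_code_path(vendor, cpu_features, baseline, alternates):
    """Return the name of the code path a given CPU would execute.

    vendor       -- CPUID vendor string, e.g. "GenuineIntel" or "AuthenticAMD"
    cpu_features -- set of instruction-set extensions the CPU supports
    baseline     -- path every CPU falls back to ("SSE2" with /QxW,
                    otherwise generic IA-32 code)
    alternates   -- /Qax target paths, most specialized first
    """
    # Assumption (10.1 behavior): alternate paths are only used on Intel CPUs.
    if vendor == "GenuineIntel":
        for isa in alternates:
            if isa in cpu_features:
                return isa
    return baseline


# /QxW /QaxT: SSE2 baseline for everyone, SSSE3 path for Core 2.
# The feature sets are simplified assumptions for illustration.
machines = [
    ("Pentium 4", "GenuineIntel", {"SSE2"}),
    ("Core 2", "GenuineIntel", {"SSE2", "SSE3", "SSSE3"}),
    ("AMD with SSE2", "AuthenticAMD", {"SSE2", "SSE3"}),
]
for name, vendor, features in machines:
    print(name, "->", select_code_path(vendor, features, "SSE2", ["SSSE3"]))
```

Passing "generic IA-32" as the baseline instead of "SSE2" models /QaxT without /QxW.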

This may be enough for stability, though you can try /fp:precise to see what it does for you.

By the way, I see that your email address is showing. I suggest clicking on "Edit Profile" to the right to change your "Display Name".

TimP
Honored Contributor III

In my experience, the most important option for correctness is /assume:protect_parens. It may be used along with /fp:precise. /fp:precise sets /Qprec-div, /Qprec-sqrt and /Qftz-, as well as affecting vectorization.
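
Why these options matter for numerical stability can be shown with ordinary IEEE double-precision arithmetic. The Python sketch below illustrates the effect each option guards against (it shows the arithmetic only, not the code ifort generates): regrouping of a sum, division through a reciprocal, and flushing of subnormal results to zero.

```python
import sys

# 1. Floating-point addition is not associative, so regrouping changes results.
#    /assume:protect_parens asks the compiler to honor the parentheses as written.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # prints 0.6000000000000001 0.6 False

# 2. Dividing via a reciprocal, x * (1/y), rounds twice and can differ from the
#    correctly rounded quotient x / y. /Qprec-div asks for the more precise divide.
q_divide = 5.0 / 3.0
q_recip = 5.0 * (1.0 / 3.0)
print(q_divide, q_recip, q_divide == q_recip)

# 3. Results smaller than the smallest normal double are subnormal. Python keeps
#    them; with flush-to-zero (/Qftz) they would become 0.0, which /Qftz- avoids.
subnormal = sys.float_info.min / 2
print(subnormal, subnormal > 0.0)
```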

davva
Beginner

Thanks for the answers!

Our customers are running everything from Pentium 4 to the very latest.

I will try /QxW /QaxT!!

davidspurr
Beginner

... If you have newer Intel processors (Intel Core 2), try /QxW /QaxT. This will get you SSE2 for everyone and SSSE3 on Core 2 processors.

1. Re /QaxT: how does this compare with /QaxS in terms of performance gain on (A) Core 2 and (B) other processors (scientific modelling code, command-line program)?

2. Where is /QxW set in Visual Studio?

Under "Project / Properties / Configuration / Fortran / Optimisations / Require ... Extensions" the only options I get are: /QxK, N, B, P, T, O, S and None (VS2005 PPE version 8.0.50727.867 with SP1 & update KB932232).

Thanks

David

Steven_L_Intel1
Employee

/QaxS is for all Core 2 processors. /QaxT is for the 45nm "Penryn" family of processors. If you don't have the particular processor type, you get generic code. I am told that the extra instructions Penryn processors have can be useful in some scientific applications, but you should do your own comparisons to see what works best for your application.

/QxW is not selectable using the property page. You can add it on the Command Line page. This will be the default in version 11.

TimP
Honored Contributor III

Everyone has been confused by these options. /QaxS, for ifort 10.1, will take the x87 "generic" code path on Core 2, and vectorized code for Penryn. /QaxT has a vectorized code path for Core 2 (and Penryn). If you don't need the two code versions, you would prefer the single-path version, with no "generic" code, which you get by omitting the a.

In ifort 11, /QxS has an alternate name /QxSSE4.1, and /QxP may be spelled /QxSSE3.

I haven't been able to find any advantages for /QxT (/QxSSSE3) over /QxP, which provides vectorized code for all Intel SSE3 CPUs. /QxS can vectorize some cases which /QxT cannot, but few of them show a large advantage.

davidspurr
Beginner

/QaxS is for all Core 2 processors. /QaxT is for the 45nm "Penryn" family of processors. If you don't have the particular processor type, you get generic code. ...

The table at the start of http://en.wikipedia.org/wiki/Intel_Core_2 seems to imply that only laptop CPUs are Penryn?

But this Intel presentation indicates that there are desktop and server variants (Slide 7).

My desktop CPU is a QX9650. The Wikipedia article suggests it is a Yorkfield XE core, while the Intel presentation has it as a Penryn. I assume the latter is correct?

Hence I should use /QaxT for the desktop, but not for my T7400 laptop (which will just use "generic" code)?

But, I just found this document: "White Paper: Optimizing Applications with Intel C++ and Fortran Compilers for Windows*, Linux*, and Mac OS* X: Version 11.x".

Table 11 (Recommended Optimization Options for Specific Intel Processors) on p20, recommends:

/QaxT for Intel Core2 Extreme processor

/QaxS for "Intel 45nm Hi-k next generation Intel Core microarchitecture" (? which chips)

The presentation above (Slide 7) indicates that Penryn (and specifically the QX9650) uses "Hafnium-based high-k metal gate technology". Table 11 of the White Paper therefore recommends /QaxS (not /QaxT) for Penryn? Is that a conflict?

The White Paper also suggests that /QaxS is a higher level optimisation than /QaxT.

Will do some testing when I get time. Need to set up shortened runs first for testing the optimization variants.

davidspurr
Beginner
Quoting - tim18
...

/QxS can vectorize some cases which /QxT cannot, but few of them show a large advantage.

Tim replied while I was researching for my response above! Given the "but few of them show a large advantage" I may leave further checking till a (very) rainy day.

My residual concern is whether using /QaxS will be disadvantageous on my laptop (T7400), i.e. will the /QaxS optimisation just drop back one step to SSE3, or go right back to "generic"?

It would be nice not to have to build different code versions for each PC or laptop. I need something that will use the highest available optimisation; others using my code still have pre-Core 2 machines.

davidspurr
Beginner
Quoting - david_spurr

...

My residual concern is whether using /QaxS will be disadvantageous on my laptop (T7400); ie. will the /QaxS optimisation just drop back one step to SSE3, or go right back to "generic"?

...

Argh - no longer possible to edit posts?

Re-reading Tim's post, I see he already answered my question; it seems my "worst fear" prevails, i.e. /QaxS will drop back to "generic" on the T7400. Hence /QaxS is not a good bet except for code that will only run on Penryn :(

TimP
Honored Contributor III
Quoting - david_spurr

Argh - no longer possible to edit posts?

Re-reading Tim's post, I see he already answered my question; it seems my "worst fear" prevails, i.e. /QaxS will drop back to "generic" on the T7400. Hence /QaxS is not a good bet except for code that will only run on Penryn :(

You could combine P and S options so as to get full optimization across a wider range of CPU types, at the expense of larger code.

Yorkfield is one of several CPUs in the Penryn family, which all support SSE4.1 code. SSE3 and SSSE3 code options also work fine on Penryn, and would usually give the same code generation as SSE4.1.
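
The trade-off discussed in this thread can be tabulated with a toy Python model. Everything here is a hypothetical sketch: the names are made up, the CPU feature sets are simplified assumptions (the T7400 is a pre-Penryn Core 2 with SSSE3 but no SSE4.1; the QX9650 is a Penryn with SSE4.1), and it assumes, as described above, that a CPU supporting none of the targeted instruction sets falls back to generic code. It lists which single-path /Qx options each machine can run, and which path it would take with /QaxS alone versus P and S combined.

```python
# ifort 10.1 /Qx letters and the ifort 11 names given in this thread,
# ordered from oldest to newest instruction set.
X_OPTIONS = [("W", "SSE2"), ("P", "SSE3"), ("T", "SSSE3"), ("S", "SSE4.1")]

# Simplified feature sets (assumption) for the two CPUs discussed above.
CPUS = {
    "T7400 (Core 2, pre-Penryn)": {"SSE2", "SSE3", "SSSE3"},
    "QX9650 (Penryn/Yorkfield)": {"SSE2", "SSE3", "SSSE3", "SSE4.1"},
}


def runnable_x_options(features):
    """/Qx letters whose required instruction set the CPU supports."""
    return [letter for letter, isa in X_OPTIONS if isa in features]


def path_taken(features, alternates):
    """Newest /Qax alternate the CPU supports, else the generic path."""
    order = [letter for letter, _ in X_OPTIONS]
    isa_of = dict(X_OPTIONS)
    usable = [a for a in alternates if isa_of[a] in features]
    return max(usable, key=order.index) if usable else "generic"


for cpu, features in CPUS.items():
    print(cpu)
    print("  single-path /Qx options it can run:", runnable_x_options(features))
    print("  /QaxS alone ->", path_taken(features, ["S"]))
    print("  P and S combined ->", path_taken(features, ["P", "S"]))
```

Each extra alternate adds another copy of the optimized code, which is the "larger code" cost of combining options.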
