Intel® Fortran Compiler

Version 9.1.039 vs. Version 10.1.011

Ilie__Daniel
Beginner
1,310 views

Hello!

I am investigating a difference in run-time in my console program. Specifically, the executable built with Version 9.1.039 of Intel Visual Fortran (IVF) is faster than the one built with Version 10.1.011 of IVF. The run-time increases by approximately 20%.

I am running Visual Studio 2005 Standard with SP1 on Windows XP SP3.

I have not changed my compiler options. However, I apply them through Visual Studio, and the project property interface has changed between Versions 9.1 and 10.1 of IVF. The following are excerpts from the build logs (filenames are omitted):

For Version 9.1.039

ifort /nologo /Ob2 /fpp /Qopenmp_report:0 /Qpar_report:0 /Qvec_report:0 /warn:unused /assume:byterecl /iface:cvf /traceback /check:none /libs:dll /threads /c /QaxNPT /Qvc8 /Qlocation,link,"C:\Program Files\Microsoft Visual Studio 8\VC\bin"

For Version 10.1.011

ifort /nologo /Og /fpp /Qopenmp_report:0 /Qpar_report:0 /Qvec_report:0 /Qdiag-disable:cpu-dispatch /warn:unused /assume:byterecl /iface:cvf /module:"Release" /object:"Release" /traceback /check:none /libs:dll /threads /c /c /QaxNPT /Qvc8 /Qlocation,link,"C:\Program Files\Microsoft Visual Studio 8\VC\bin"

Note that the documentation says that /Og is on by default in both versions. Also /Ob2 is the default in Version 10.1.011.
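
To check that nothing else differs between the two command lines, the option lists can be compared mechanically. The following is only a minimal Python sketch: the two strings are copied from the build log excerpts above, with the /Qlocation option left out because it is the same in both, and the variable names are purely illustrative.

```python
# Option lists copied from the two build logs (the /Qlocation option is omitted).
v91_options = (
    '/nologo /Ob2 /fpp /Qopenmp_report:0 /Qpar_report:0 /Qvec_report:0 '
    '/warn:unused /assume:byterecl /iface:cvf /traceback /check:none '
    '/libs:dll /threads /c /QaxNPT /Qvc8'
)
v101_options = (
    '/nologo /Og /fpp /Qopenmp_report:0 /Qpar_report:0 /Qvec_report:0 '
    '/Qdiag-disable:cpu-dispatch /warn:unused /assume:byterecl /iface:cvf '
    '/module:"Release" /object:"Release" /traceback /check:none '
    '/libs:dll /threads /c /c /QaxNPT /Qvc8'
)

# None of these options contains a space, so a whitespace split is sufficient.
v91_set = set(v91_options.split())
v101_set = set(v101_options.split())

# Options that appear in only one of the two builds.
only_v91 = sorted(v91_set - v101_set)
only_v101 = sorted(v101_set - v91_set)

print("Only in 9.1.039: ", only_v91)
print("Only in 10.1.011:", only_v101)
```

This only compares the options that are spelled out in the logs. It says nothing about defaults that changed between compiler versions, which is the open question here.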

Am I missing something? Why are the two builds performing so differently?

I would appreciate any ideas and suggestions you might have.

Thanks and regards,

Daniel I.

8 replies
Steven_L_Intel1
Difficult to say without analyzing the actual application, but let me suggest that you try removing /QaxNPT and replacing it with the /Qx option for your processor, and see what you get. I think NPT is being treated as PT, since there is a limit of three paths (generic and two specific).
Ilie__Daniel
Beginner

Steve,

Thank you for answering so quickly.

I removed /QaxNPT and replaced it with /QxP (as I run my program on a Pentium D). The run-time was identical to what I had seen before, so it is still slower than with Version 9.1.039.

I have also tried Versions 10.1.025 and 10.0.027, though with the previous setting (/QaxNPT), and the resulting builds are slower than with Version 9.1.039 as well.
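
For anyone wanting to reproduce this kind of comparison, here is a rough Python sketch of a timing harness that runs a command several times and reports the median wall-clock time and the relative slowdown. The function names are hypothetical, and the stand-in command (the Python interpreter itself) is only there so that the sketch runs anywhere. In practice it would be replaced by the executables produced by each compiler version.

```python
import statistics
import subprocess
import sys
import time


def median_runtime(cmd, runs=5):
    """Run cmd `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def slowdown_percent(baseline, candidate):
    """Percentage by which candidate is slower than baseline (negative if faster)."""
    return (candidate - baseline) / baseline * 100.0


# Stand-in command (an assumption for illustration): replace each list with the
# path to the executable built by the compiler version you want to measure.
stand_in = [sys.executable, "-c", "pass"]
baseline = median_runtime(stand_in, runs=3)
candidate = median_runtime(stand_in, runs=3)
print(f"baseline {baseline:.3f} s, candidate {candidate:.3f} s, "
      f"slowdown {slowdown_percent(baseline, candidate):+.1f}%")
```

Using the median of several runs reduces the influence of one-off disturbances such as disk caching on the first run.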

As an aside: I was not aware that there was a limit of three paths. The online help does not mention any upper limit. According to the answer to Issue 416229 (Intel Premier Support), I can use N, P and T, but I may have misinterpreted it.

Daniel.

Steven_L_Intel1
There is indeed a three-path limit.

You could try adding /O3 to see if it helps. Otherwise, it would take a detailed analysis of your application to see why it has slowed down.

Let me suggest that you try the beta of version 11 and see what it does for you.
Ilie__Daniel
Beginner

Steve,

I have tried Version 11.0.039-beta and I got run-times identical to Version 9.1.039.

Could you tell me:

  1. When do you expect Version 11 (not beta) to be released?
  2. How far advanced is Version 11.0.039 with respect to the final version?

Best regards,

Daniel.

Steven_L_Intel1
I expect version 11 to be released towards the end of November. 11.0.039 is the last beta version and, while the released version will have some changes, they would tend to be mostly bug fixes.
Ilie__Daniel
Beginner

Steve,

Could you possibly create an Intel Premier Support issue for me and assign it to yourself?

We would like to comment further on the subject discussed above.

Thank you.

Daniel.

Steven_L_Intel1
Daniel,

I could do that but I don't know who you are, exactly. You can create such an issue and ask that it be assigned to me.
Kevin_D_Intel
Employee
Regarding the earlier statement "There is indeed a three-path limit.":
The original compiler architect of the cpu-dispatch (/Qax) feature indicates that there is, in theory, no limit on the number of paths that can be specified and therefore generated. The only limit is how many distinct paths the compiler chooses to generate. For example, if one asks for four different specializations, the compiler may decide that only three (or fewer) paths are needed. The number of paths generated is a function of the heuristics within the compiler, not a limitation of the feature. It can also change depending on the routine and the compiler.