IVF version 9 slower than version 8?

Peter_Simon · 08-06-2005

I have noticed that an important application seems to run slower when compiled with Version 9 (W_FC_C_9.0.020) than it did when compiled with the last released Version 8 compiler. Here are the benchmark timings in seconds:

Compiler version    Benchmark 2 (s)    Benchmark 3 (s)
IVF 8               957                2078
IVF 9               984                2142

Both benchmarks run about 3% slower under IVF 9 (2.8% and 3.1%, respectively).

For both compiler versions, the options used were: /nologo /O3 /Qxn.

All runs were performed on the same machine, an HP XW6000 workstation with dual P4 Xeon processors and 4 Gbytes of RAM, with no other applications running simultaneously. The output of the Intel Processor Frequency ID utility on this platform is:

Intel Processor Frequency ID Utility
Version: 7.0.20040526
Time Stamp: 2005/08/06 15:03:00
Number of processors in system: 2
Current processor: #1
Processor Name: Intel Xeon CPU 2.80GHz
Type: 0
Family: F
Model: 2
Stepping: 9
Revision: 22
L1 Trace Cache: 12 Kops
L1 Data Cache: 8 KB
L2 Cache: 512 KB
L3 Cache: None
Packaging: OOI
MMX: Yes
SIMD: Yes
SIMD2: Yes
SIMD3: No
NetBurst Microarchitecture: Yes
Hyper-Threading Technology: No
Expected Processor Frequency: 2.80 GHz
Reported Processor Frequency: 2.80 GHz
Expected System Bus Frequency: 533 MHz
Reported System Bus Frequency: 533 MHz
*************************************************************

I am at a loss as to why identical compiler options result in a consistently slower executable with the new compiler. Has anyone else noticed anything similar?

Thanks,

Peter

Steven_L_Intel1 · 08-06-2005

Our own benchmarks show the opposite, but that doesn't mean that your particular application couldn't slow down. Sometimes a new optimization helps a majority of programs but hurts a few.

If you'd like the matter investigated, please send a report to Intel Premier Support and attach the sources so that we can look into it.

Just checking - you are using /QxN and not /Qxn - right? I think it might matter. Does /Qipo help or hurt?

Peter_Simon · 08-06-2005

Thanks for the quick reply, Steve.

Yes, you are correct that the option I am using is /QxN, not /Qxn (a typo in my first posting). I am currently running with /Qipo and will report later today whether it helps or hurts.

My previous experience with Premier Support has not been very good, which is why I posted here first.

--Peter

TimP · 08-06-2005

9.0 may perform -Qip by default, whereas it was not the default in 8.1. If you have only one source file, I would think there would be no difference between -Qip and -Qipo. To diagnose your performance problem, it may be necessary to profile and find out where the extra time is spent.

Peter_Simon · 08-07-2005

I've run the benchmarks with the options suggested by Steve and Tim. The updated results are:

Compiler    Options             Benchmark 2 (s)    Benchmark 3 (s)
IVF 8       /O3 /QxN            957                2078
IVF 9       /O3 /QxN            984                2142
IVF 9       /O3 /QxN /Qipo      1015               2229
IVF 9       /O3 /QxN /Qip       987                2159

So it looks like for this application /Qipo significantly hurts the performance and /Qip has a very slight negative effect on performance.
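To put numbers on "about 3%", "significantly" and "very slight", here is a small Python sketch (not part of the original thread) that computes the percentage change of each run from the timings in the table above, first against IVF 8 and then against the plain IVF 9 /O3 /QxN build. The dictionary layout and the function names are illustrative only.

```python
# Benchmark timings in seconds, copied from the table above.
# Keys are (compiler, options); values are (benchmark 2, benchmark 3).
timings = {
    ("IVF 8", "/O3 /QxN"): (957, 2078),
    ("IVF 9", "/O3 /QxN"): (984, 2142),
    ("IVF 9", "/O3 /QxN /Qipo"): (1015, 2229),
    ("IVF 9", "/O3 /QxN /Qip"): (987, 2159),
}


def pct_change(t, baseline):
    """Percentage change of run time t relative to baseline (positive = slower)."""
    return 100.0 * (t - baseline) / baseline


def slowdowns(timings, baseline_key):
    """Return {key: (pct for benchmark 2, pct for benchmark 3)} versus baseline_key,
    rounded to one decimal place."""
    b2, b3 = timings[baseline_key]
    return {
        key: (round(pct_change(t2, b2), 1), round(pct_change(t3, b3), 1))
        for key, (t2, t3) in timings.items()
        if key != baseline_key
    }


for baseline in [("IVF 8", "/O3 /QxN"), ("IVF 9", "/O3 /QxN")]:
    print("Relative to", " ".join(baseline))
    for (compiler, options), (p2, p3) in slowdowns(timings, baseline).items():
        print(f"  {compiler} {options:<16} benchmark 2: {p2:+.1f}%  benchmark 3: {p3:+.1f}%")
```

Against IVF 8, the plain IVF 9 build is about 2.8% and 3.1% slower; against that IVF 9 build, /Qipo adds roughly 3 to 4% while /Qip adds under 1%.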

More about this application: this code is a reflector antenna shape optimization code purchased by my company from a commercial vendor. It consists of 24 source files containing dozens of modules and hundreds of routines. When I reported on using /Qipo above, it was actually specified for all but 2 of the source files; the program will not execute successfully if I specify /Qipo on those two files.

The timing difference between IVF 8 and IVF 9 is only about 3%, but it is repeatable and consistent. I had hoped for a performance improvement, not a slight hit, when upgrading the compiler. When I originally converted to IVF, I benchmarked the code extensively under IVF 8 to find the compiler options that gave the fastest run times. If the optimizations have changed, I'm afraid I will now have to repeat that effort.

--Peter

Steven_L_Intel1 · 08-07-2005

Please submit an issue to Intel Premier Support with a description of the problem and everything we need to build and test the application. We'll have our performance experts look at it.

From the results, I'm guessing that some inlining decisions are bad for your application.

Peter_Simon · 08-07-2005

I've submitted this as issue 319479. Thanks for the help.

--Peter

jim_dempsey · 08-08-2005

Peter,

During my use of IVF 8 I made an observation and reported an incident concerning SSE3 instructions. The problem related to alignment, as SSE3 instructions require 16-byte alignment. The problem usually showed up as a runtime error. In my opinion this was a linker problem.

When version 9 came out I examined the generated code to see whether the problem was fixed. What I found was that the erroneous code had been "fixed", and it appears that the fix was to remove the SSE3 instructions. If your application benefited from SSE3 optimizations, V8 would run faster than V9. You can verify this by adding /S to compile to an .ASM file and then inspecting the differences.

Jim Dempsey

Steven_L_Intel1 · 08-08-2005

/QxN doesn't generate SSE3 instructions anyway. (Only /QxP does.)

Intel_C_Intel · 08-08-2005

Jim,

First, I doubt the fix you refer to was really to naively remove SSE3. More likely, an alignment issue was resolved, after which the compiler decided that SSE3 would not be beneficial. Can you give some more details on the issue you reported?
Second, since Peter uses QxN, the use of SSE3 cannot be an issue at all.

Aart Bik
http://www.aartbik.com/

jim_dempsey · 08-10-2005

The problem with the SSE3 instructions related to movapd, e.g.:

movapd xmmword ptr [_MOD_ALL_mp_ALL+20h (5E7A38h)],xmm0

Here movapd is used as a mini block-move instruction (versus the computational SSE3 instructions).

As you can see from the snippet above, taken from a debug session, the target location is 16-byte aligned relative to the start of the module, but the module itself was not linked at a 16-byte aligned address (you can tell this from the hex address 5E7A38h). Either the compiler should have flagged the module as requiring 16-byte (or an integral multiple thereof) alignment, or the linker disregarded the alignment instructions. My workaround in V8 was to inspect the linker map file and then add padding variables when needed.
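The alignment arithmetic behind the 5E7A38h observation can be checked directly. MOVAPD (strictly speaking an SSE2 instruction) faults unless its memory operand is 16-byte aligned, i.e. unless the address is a multiple of 16. The Python sketch below, with hypothetical helper names, works backwards from the operand in the debug snippet to the module's base address and computes how many bytes of padding, of the kind added by hand in the workaround described above, would have restored alignment.

```python
# Address taken from the movapd operand in the debug snippet:
#   movapd xmmword ptr [_MOD_ALL_mp_ALL+20h (5E7A38h)],xmm0
target = 0x5E7A38        # effective address written by movapd
offset = 0x20            # offset of the variable inside the module's data
module_base = target - offset


def misalignment(addr, alignment=16):
    """Bytes by which addr lies past the previous alignment boundary (0 = aligned)."""
    return addr % alignment


def padding_needed(addr, alignment=16):
    """Bytes of padding that would move addr up to the next alignment boundary.

    Hypothetical helper; Python's % always returns a non-negative result here.
    """
    return -addr % alignment


print(hex(module_base))             # base address of the module's data
print(misalignment(offset))         # the offset itself is a multiple of 16
print(misalignment(target))         # but the final address is not
print(padding_needed(module_base))  # padding that would have fixed the base
```

The offset 20h is a multiple of 16, so the misalignment of the final address comes entirely from the module's base (5E7A18h), which sits 8 bytes past a 16-byte boundary.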

V9 seems to have "fixed" this problem by eliminating the movapd instructions, most likely by treating the alignment of the variable's address as unknown. It is possible that this "fix" (aka hack) is responsible for V9 running slightly slower than V8.

Steven_L_Intel1 · 08-10-2005

Are you not aware of the ALIGN keyword on the ATTRIBUTES and PSECT directives?

jim_dempsey · 08-11-2005

I did try the ALIGN attribute on the variables, but the alignment seemed to apply only to the offset within the segment in which the variable resided. The segment itself did not inherit the (worst-case) alignment requirement of its contents.

I have not tried aligning by using !DEC$ PSECT...

Note that when looking in the IVF documentation by way of the index under ALIGN (or alignment), there is no reference to PSECT. A search for "ALIGN PSECT" does find it, but then you have to know the magic keyword.

Looking at the PSECT entry to see what it does, it says:

common-name
Is the name of the common block.

The data in my application with the alignment problem is not in a common block. It is in a module. I think I tried using PSECT, but since there was no common block of that name (the module's mangled name), the compiler balked. The problem is that the module's data, although aligned within its data segment, seems to have no directive for aligning the segment in which it resides.

It would seem to me that if the programmer issued

module foo
REAL(8) :: var(12345)
!DEC$ ATTRIBUTES ALIGN: 16 :: var
end module foo

then the segment in which var resides should be aligned in a manner compatible with the alignment of var's offset.
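The behavior asked for here can be sketched with a toy model of section layout: each member is placed at an offset rounded up to its own alignment, and the section as a whole takes the strictest alignment among its members, so that offset alignment carries over to absolute addresses once the linker places the section on such a boundary. (In PE/COFF object files, this per-section requirement is recorded in the IMAGE_SCN_ALIGN_* section flags.) The Python code below is an illustrative model only, not how the Intel compiler or linker is implemented; the member list and function names are hypothetical.

```python
# Toy model (hypothetical), not the actual compiler/linker algorithm.


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def layout_section(members):
    """Lay out (name, size, align) members in order.

    Returns (offsets, section_alignment), where section_alignment is the
    strictest member alignment: what the section must carry so that every
    member's offset alignment also holds for its final address.
    """
    offsets, cursor, section_alignment = {}, 0, 1
    for name, size, align in members:
        cursor = align_up(cursor, align)
        offsets[name] = cursor
        cursor += size
        section_alignment = max(section_alignment, align)
    return offsets, section_alignment


def place(offsets, base):
    """Absolute addresses of the members when the section is loaded at base."""
    return {name: base + off for name, off in offsets.items()}


# Hypothetical module data: a 4-byte integer followed by var(12345) from foo.
members = [("n", 4, 4), ("var", 8 * 12345, 16)]
offsets, sect_align = layout_section(members)

# A base that only honors 8-byte alignment (as at 5E7A18h) ...
bad = place(offsets, 0x5E7A18)
# ... versus the same base rounded up to the section's own alignment.
good = place(offsets, align_up(0x5E7A18, sect_align))

print(offsets, sect_align)
print(bad["var"] % 16, good["var"] % 16)
```

In this model, var's offset (16) is correctly aligned in both cases; only when the section is placed on a boundary matching its strictest member does var's final address become a multiple of 16.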

Jim Dempsey