ICC Executable Performance Drop (9.1.049 vs 10.0.023)

gordan · 06-07-2007

Hi,

I just downloaded a copy of the v10 compiler, and performance seems to have dropped quite noticeably, by about 15% in fact. My test program (7000 iterations of relatively simple sine curve fitting) gives the following timings:

Compiled using v9.1.049: runs in 55 seconds
Compiled using v10.0.023: runs in 63 seconds

The compile stage output is reporting the same loops being vectorized.

Is there any reason why v10 would be slower?

The compiler options I am using are:
-fpic -O3 -fomit-frame-pointer -march=pentium3 -mcpu=pentium3 -msse -funroll-loops -mtune=pentium3 -fp-model fast=2 -rcd -xK -ipo -w1 -vec-report3

The machine in question is a Pentium 3, as the options indicate. Is there something new in the way v10 behaves? Is there another optimizer flag I have to add somewhere to get those 15% back?

Thanks.

JenniferJ · 06-08-2007

Did you see any loop that is vectorized with 9.1 but not with 10.0? Try with /Qparallel too.

gordan · 06-08-2007

No, this is what surprised me. The compiler output is similar, and in terms of vectorized loops, all the same loops get vectorized. But the code then runs about 15% slower.

I haven't had a chance to test this on more than one machine yet (only have a P3 handy). I'll try it on a Core 2 soon.

I don't think the program I am playing with at the moment would benefit from parallel options. The overhead of spawning threads would likely outweigh any benefits.
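
For anyone wanting to reproduce this kind of compiler-to-compiler comparison, a minimal timing sketch in C might look like the one below. It is not the original test program: `placeholder_kernel`, `seconds_for` and `slowdown_percent` are hypothetical names made up for illustration, and the kernel is a trivial stand-in for the sine curve fitting code. The idea is to build the same source with each compiler version, time the kernel in each build, and compute the relative slowdown from the two measurements.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the real workload (NOT the original
 * sine curve fitting code). volatile keeps the loop from being
 * optimized away entirely. */
static void placeholder_kernel(void)
{
    volatile double acc = 0.0;
    for (int i = 0; i < 1000000; ++i) {
        acc += (double)i * 0.5;
    }
}

/* CPU time, in seconds, consumed by one call of `kernel`. */
static double seconds_for(void (*kernel)(void))
{
    clock_t start = clock();
    kernel();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Relative slowdown of `candidate_s` versus `baseline_s`, in percent.
 * A positive result means the candidate build is slower. */
static double slowdown_percent(double baseline_s, double candidate_s)
{
    assert(baseline_s > 0.0);
    return (candidate_s / baseline_s - 1.0) * 100.0;
}
```

Calling `seconds_for(placeholder_kernel)` in each build gives the two timings. Applied to the figures quoted at the top of the thread, `slowdown_percent(55.0, 63.0)` comes out at roughly 14.5, which matches the "about 15%" estimate.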