Intel Fortran slower than Compaq Fortran?

matejt · 09-03-2007

Hello!

Until now I have used Compaq Visual Fortran, but I now need OpenMP, so I started using Intel Visual Fortran 9.1. I was able to write a parallel application with OpenMP, but I was surprised that it was a little slower than the single-threaded version compiled with Compaq Fortran.

I then removed the OpenMP calls in Intel Fortran and was surprised that the single-threaded executable was more than twice as slow as the one built with Compaq. In both cases I used O3 optimization. I then played with the optimization options in Intel Fortran, but without any success.

So my question is: is it true that optimized code built with Compaq is twice as fast as code built with Intel? Which options should I use to get a faster executable with Intel?

Best Regards, Matej

onkelhotte · 09-03-2007

I've seen similar behaviour in my code when I ported from CVF 6.6 to IVF 9.1 on my private AMD computer. CVF was 15% faster.

But now we have IVF 10.0 in our office, and the same program is 10% faster compared to CVF 6.6 on my office Pentium 4.

I will try it on my private computer when I get home from work, so I can see whether the compiler or the CPU architecture is responsible for the difference in execution speed.

So there is no general answer, because it depends on your code and your machine.

Markus

Steven_L_Intel1 · 09-03-2007

Adding OpenMP to an application will slow it down when it runs on a single thread. If you look at the test results at www.polyhedron.com you'll see that Intel Fortran significantly outperforms CVF on their tests. I have seen an occasional case where CVF does better, but these are very rare (not so rare a few years ago).
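For anyone wanting to check this on their own code, a minimal timing sketch follows. The program name, array sizes, and the kernel itself are made up for illustration and are not from the original poster's application. The idea is to build the same source with each compiler (and with OpenMP on and off) and compare the elapsed times of the hot loop only.

```fortran
program time_kernel
  implicit none
  ! Hypothetical sizes; adjust so the run takes at least a few seconds.
  integer, parameter :: n = 1000000
  integer, parameter :: nrep = 100
  real, allocatable :: x(:), y(:)
  integer :: i, rep
  integer :: t0, t1, rate
  real :: elapsed

  allocate(x(n), y(n))
  x = 1.0
  y = 0.0

  ! SYSTEM_CLOCK returns wall-clock ticks; rate is ticks per second.
  call system_clock(t0, rate)
  do rep = 1, nrep
    do i = 1, n
      y(i) = y(i) + 0.5 * x(i)
    end do
  end do
  call system_clock(t1)

  elapsed = real(t1 - t0) / real(rate)
  print *, 'elapsed seconds: ', elapsed
  ! Print a value derived from y so the compiler cannot discard the loop.
  print *, 'checksum: ', sum(y)
end program time_kernel
```

Wall-clock time is used here rather than CPU_TIME because CPU time is typically summed over all threads, which would hide any gain from OpenMP.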

You'll generally get best results selecting the "generate code for..." option appropriate for your computer's processor. /O3 optimization and /Qipo are other options that can further improve performance.

TimP · 09-03-2007

My most glaring cases where ifort doesn't match CVF performance involve loops with variable strides. For example, the compiler may not perform optimizations which would break if the loop index were incremented by 0, even though that would be a broken, nonsensical case anyway.
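A sketch of the kind of loop this may refer to is shown below. This is an interpretation, not code from the post: the routine name `accumulate_strided` and the sizes are hypothetical. The stride `inc` is known only at run time, so the compiler has to allow for `inc == 0`, in which case every iteration updates the same element and the iterations are no longer independent.

```fortran
program variable_stride
  implicit none
  ! Hypothetical sizes chosen so a holds every index the loop touches.
  integer, parameter :: n = 500
  integer, parameter :: maxinc = 2
  real :: a(1 + (n - 1) * maxinc), b(n)
  integer :: inc

  a = 0.0
  b = 1.0
  inc = 2          ! in real code this would come from input or a caller
  call accumulate_strided(a, b, n, inc)
  print *, sum(a)

contains

  subroutine accumulate_strided(a, b, n, inc)
    integer, intent(in) :: n, inc
    real, intent(inout) :: a(*)
    real, intent(in) :: b(n)
    integer :: i, j

    j = 1
    do i = 1, n
      ! With inc == 0 every iteration would read and write a(1),
      ! so the compiler cannot assume the updates are independent.
      a(j) = a(j) + b(i)
      j = j + inc
    end do
  end subroutine accumulate_strided

end program variable_stride
```

If the stride is in fact always the same nonzero constant, writing it as a literal or a named constant gives the compiler the information it would otherwise have to guard against.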

As Steve indicates, CVF was highly optimized for non-SSE CPUs of 10 years ago, while ifort is far better when using appropriate SSE flags for current CPUs, so there is seldom a possibility of "apples to apples" comparison.

If your application depends on -O3 to perform inter-loop optimizations, parallelizing those loops with OpenMP may be expected to prevent those optimizations. OpenMP is fairly literal ("do as I say, don't check"), necessarily so, to avoid unpleasant surprises. ifort 10 with -O3 -Qparallel may find some of those optimizations, but with -Qparallel it is even more important to check the usefulness of the transformations.
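To illustrate the point about inter-loop optimizations, here is a small sketch with made-up arrays and sizes. Loop fusion is used as one example of such an optimization; whether a given compiler version actually fuses the serial loops is an assumption, not something confirmed in this thread. Once each loop carries its own OpenMP directive, it is parallelized as written, so fusing by hand is one possible workaround.

```fortran
program omp_fusion_demo
  implicit none
  integer, parameter :: n = 1000   ! hypothetical size
  real :: a(n), b(n), c(n)
  integer :: i

  b = 1.0

  ! Serial form: two adjacent loops over the same range.
  ! An optimizer is free to merge these into a single pass.
  do i = 1, n
    a(i) = 2.0 * b(i)
  end do
  do i = 1, n
    c(i) = a(i) + b(i)
  end do

  ! OpenMP form: each directive is applied to its own loop as written,
  ! giving two separate parallel regions.
  !$omp parallel do
  do i = 1, n
    a(i) = 2.0 * b(i)
  end do
  !$omp end parallel do
  !$omp parallel do
  do i = 1, n
    c(i) = a(i) + b(i)
  end do
  !$omp end parallel do

  ! Hand-fused OpenMP form: one parallel region, one pass over the data.
  !$omp parallel do
  do i = 1, n
    a(i) = 2.0 * b(i)
    c(i) = a(i) + b(i)
  end do
  !$omp end parallel do

  print *, c(1), c(n)
end program omp_fusion_demo
```

Without OpenMP enabled at compile time the directives are plain comments, so the same source can be timed both ways.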