Re: Intel fortran 50% slower than g77?

Boris_B_ · 10-17-2006

I'm new to Intel Fortran, and after installing ifort (9.1.029) on my brand new 24" iMac (2.33GHz) I wanted to see how much faster it is than GNU Fortran (g77) on the following trivial program:

      do i=1,10**9
         x=i
         s=s+sin(x)
      end do
      print*, s
      end

To my horror I found the following results:
$ g77 1.f
$ time a.out
-46.4312897
real 1m3.011s
user 1m2.828s
sys 0m0.055s

$ ifort 1.f
1.f(1) : (col. 7) remark: LOOP WAS VECTORIZED.
$ time a.out
-46.41003
real 1m31.072s
user 1m30.998s
sys 0m0.042s

Intel is 50% SLOWER than gnu? Am I doing something wrong?

Steven_L_Intel1 · 10-18-2006

I just tried it on a somewhat faster system and got about 12 seconds, which is MUCH faster than your g77 time. But what you're really comparing is the performance of the sin intrinsic, yes? Is that what you're interested in?
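
One way to check how much of the run time belongs to the sin intrinsic rather than to the loop itself is to time the same loop with and without the call. The following is only a sketch under assumptions that are not in the thread: the program name, the variable names, and the reduced trip count of 10**8 are illustrative, and the cpu_time intrinsic is assumed to be available in both compilers.

```fortran
      program sincost
c     Hypothetical sketch: time the same loop with and without sin().
      integer i, n
      real x, s1, s2, t0, t1, t2
      parameter (n = 10**8)
      s1 = 0.0
      s2 = 0.0
      call cpu_time(t0)
c     Baseline: integer-to-real conversion and accumulation only.
      do i = 1, n
         x = i
         s1 = s1 + x
      end do
      call cpu_time(t1)
c     Same loop plus the sin intrinsic.
      do i = 1, n
         x = i
         s2 = s2 + sin(x)
      end do
      call cpu_time(t2)
c     Print both sums so that neither result goes unused.
      print *, 'sums:     ', s1, s2
      print *, 'loop only:', t1 - t0, ' s'
      print *, 'with sin: ', t2 - t1, ' s'
      end
```

If the second timing dominates the first, the comparison is mostly measuring each compiler's math library rather than its general code generation.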

Micah_Elliott · 10-18-2006

Relative performance is probably the most useful measure. I don't have a Mac handy, but on Linux I see a huge advantage for ifort. And the vectorizer gives another 2x boost when it kicks in (it appears to be active in your case, probably the Mac default).

$ g77 --version
GNU Fortran (GCC) 3.4.1
Copyright (C) 2004 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
or type the command `info -f g77 Copying'.
$ g77 -O2 t1.f -o g.x && time ./g.x
-46.4316406

real 1m44.915s
user 1m44.880s
sys 0m0.000s
$
$ ifort -V -O2 t1.f -o i.x && time ./i.x
Intel Fortran Compiler for Intel EM64T-based applications, Version 9.1 Build 20060925 Package ID: l_fc_c_9.1.039
Copyright (C) 1985-2006 Intel Corporation. All rights reserved.

Intel Fortran 9.1-6370
GNU ld version 2.14.90.0.4 20030523
-46.43410

real 0m30.866s
user 0m30.820s
sys 0m0.010s
$
$ ifort -xP -O2 t1.f -o i.x && time ./i.x
t1.f(1) : (col. 7) remark: LOOP WAS VECTORIZED.
-46.41003

real 0m15.150s
user 0m15.150s
sys 0m0.000s
$

<...some time later...>

Okay, I found a Mac (sans g77), and I do see the poor performance there (2+ minutes). I tried turning off the vectorizer and the performance looks more reasonable (45s). The vectorized version uniquely has a call to '_vmlsSin4.stub'.

Please open a defect ticket for this on Premier (https://premier.intel.com).

Thanks.

Boris_B_ · 10-18-2006

I'm not particularly interested in the sin intrinsic per se. What I'm interested in is getting a rough idea of the efficiency of the ifort compiler compared to the free g77 compiler when running on a Mac. I realise that using just the sin intrinsic makes for a very limited test, but nevertheless, the fact that g77 beats ifort by the HUGE margin of 50% indicates that there is something wrong with ifort when running on a Mac.

Boris_B_ · 10-18-2006

Micah Elliott's discovery that vectorisation may be the problem is very interesting. This led me to the following results on my iMac:

$ ifort -O0 1.f
$ time a.out
-48.43686
real 0m31.921s
user 0m31.806s
sys 0m0.020s

$ ifort 1.f
1.f(1) : (col. 7) remark: LOOP WAS VECTORIZED.
$ time a.out
-46.41003
real 1m30.764s
user 1m30.679s
sys 0m0.041s

In other words, switching off optimization entirely makes ifort run THREE times faster, amazing.

Micah_Elliott · 10-18-2006

> In other words, switching off optimization entirely makes ifort run THREE times faster, amazing.

You'd find a further significant speedup if you enabled optimization but disabled generation of the vmlsSin4 call (I can't get into the how-to here), which is where the problem appears to lie. This is simply a bug, not a general performance limitation, and I'll file it on your behalf if I don't see that you've filed something on Premier today.

Thanks for the useful test case.

Intel_C_Intel · 10-18-2006

Dear Boris,

An almost identical test case was already submitted to the compiler team and is under investigation by our library team. The issue seems MacOS specific. Just to be clear, vectorization, in general, and when combined with using our Short Vector Math Library, in particular, typically improves performance substantially where applicable. So, you simply stumbled on what is, hopefully, a short-lived glass-jaw.

If you are truly interested in comparing ifort performance with other compilers, I would suggest using a slightly larger performance test suite.
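
As a rough illustration of what a slightly larger test could look like, the sketch below times three unrelated kernels separately, so that a problem in one library routine cannot dominate the overall picture. Everything in it is an assumption made for illustration: the program name, the choice of kernels, the array size, and the repeat count are hypothetical and are not taken from any Intel test suite, and the cpu_time intrinsic is assumed to be available in both compilers.

```fortran
      program suite
c     Hypothetical sketch: time three small kernels separately.
      integer n, i, k
      parameter (n = 1000000)
      real a(n), b(n), s, t0, t1
c     Kernel 1: transcendental intrinsic (the thread's test case).
      s = 0.0
      call cpu_time(t0)
      do i = 1, n
         s = s + sin(real(i))
      end do
      call cpu_time(t1)
      print *, 'sin sum   ', t1 - t0, s
c     Kernel 2: streaming array update, repeated 100 times.
      do i = 1, n
         a(i) = 1.0
         b(i) = 2.0
      end do
      call cpu_time(t0)
      do k = 1, 100
         do i = 1, n
            a(i) = a(i) + 0.5*b(i)
         end do
      end do
      call cpu_time(t1)
      print *, 'axpy      ', t1 - t0, a(n)
c     Kernel 3: recurrence with a loop-carried dependence.
      s = 0.5
      call cpu_time(t0)
      do i = 1, n
         s = 3.9*s*(1.0 - s)
      end do
      call cpu_time(t1)
      print *, 'recurrence', t1 - t0, s
      end
```

The problem sizes are arbitrary and would need scaling up until each kernel runs long enough for the timer resolution not to matter. Each result is printed so that the compiler has to keep the computation.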

Aart Bik

http://www.aartbik.com/