I'm new to Intel Fortran. After installing ifort (9.1.029) on my brand-new 24" iMac (2.33GHz), I wanted to see how much faster it is than GNU Fortran (g77) on the following trivial program:
      do i=1,10**9
      x=i
      s=s+sin(x)
      end do
      print*, s
      end
To my horror I found the following results:
$ g77 1.f
$ time a.out
-46.4312897
real 1m3.011s
user 1m2.828s
sys 0m0.055s
$ ifort 1.f
1.f(1) : (col. 7) remark: LOOP WAS VECTORIZED.
$ time a.out
-46.41003
real 1m31.072s
user 1m30.998s
sys 0m0.042s
Intel is 50% SLOWER than GNU? Am I doing something wrong?
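As a quick check on the "50%" figure, the two `real` times reported above can be converted to seconds and compared. This is only a sketch in Python, not part of the original post; `to_seconds` is a helper name introduced here for illustration.

```python
def to_seconds(stamp):
    """Convert a `time` stamp such as '1m3.011s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return 60.0 * float(minutes) + float(seconds)


# Wall-clock ("real") times copied from the transcript above.
g77_real = to_seconds("1m3.011s")
ifort_real = to_seconds("1m31.072s")

# Ratio of run times: how many times longer the ifort build takes.
slowdown = ifort_real / g77_real
extra_percent = (slowdown - 1.0) * 100.0

print(f"g77:   {g77_real:.3f} s")
print(f"ifort: {ifort_real:.3f} s")
print(f"ifort takes {slowdown:.2f}x as long ({extra_percent:.1f}% more time)")
```

On these numbers the ifort build needs roughly 45% more wall-clock time than the g77 build, so "50%" is a fair round figure.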
6 Replies
I just tried it on a somewhat faster system and got a MUCH faster time than your g77 time (about 12 seconds). But what you're really comparing is the performance of the sin intrinsic, yes? Is that what you're interested in?
Relative performance is probably the most useful measure. I don't have a Mac handy, but on Linux I see a huge advantage for ifort. And the vectorizer gives another 2x boost when it kicks in (it appears to be active in your case, probably as the Mac default).
$ g77 --version
GNU Fortran (GCC) 3.4.1
Copyright (C) 2004 Free Software Foundation, Inc.
GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
or type the command `info -f g77 Copying'.
$ g77 -O2 t1.f -o g.x && time ./g.x
-46.4316406
real 1m44.915s
user 1m44.880s
sys 0m0.000s
$
$ ifort -V -O2 t1.f -o i.x && time ./i.x
Intel Fortran Compiler for Intel EM64T-based applications, Version 9.1 Build 20060925 Package ID: l_fc_c_9.1.039
Copyright (C) 1985-2006 Intel Corporation. All rights reserved.
Intel Fortran 9.1-6370
GNU ld version 2.14.90.0.4 20030523
-46.43410
real 0m30.866s
user 0m30.820s
sys 0m0.010s
$
$ ifort -xP -O2 t1.f -o i.x && time ./i.x
t1.f(1) : (col. 7) remark: LOOP WAS VECTORIZED.
-46.41003
real 0m15.150s
user 0m15.150s
sys 0m0.000s
$
<...some time later...>
Okay, I found a Mac (sans g77), and I do see the poor performance there (2+ minutes). I tried turning off the vectorizer and the performance looks more reasonable (45s). The vectorized version uniquely has a call to '_vmlsSin4.stub'.
Please open a defect ticket for this on Premier (https://premier.intel.com).
Thanks.
I'm not particularly interested in the sin intrinsic per se. What I'm interested in is getting a rough idea of the efficiency of the ifort compiler compared with the free g77 compiler when running on a Mac. I realise that using just the sin intrinsic makes for a very limited test, but nevertheless, the fact that g77 beats ifort by the HUGE margin of 50% indicates that there is something wrong with ifort when running on a Mac.
Micah Elliott's discovery that vectorisation may be the problem is very interesting. This led me to the following results on my iMac:
$ ifort -O0 1.f
$ time a.out
-48.43686
real 0m31.921s
user 0m31.806s
sys 0m0.020s
$ ifort 1.f
1.f(1) : (col. 7) remark: LOOP WAS VECTORIZED.
$ time a.out
-46.41003
real 1m30.764s
user 1m30.679s
sys 0m0.041s
In other words, switching off optimization entirely makes ifort run THREE times faster, amazing.
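The two runs above also print different sums (-48.43686 versus -46.41003). The thread does not explain this, but one plausible contributor is that the program accumulates in single precision, where the order of additions affects the rounded result. The Python sketch below emulates single-precision accumulation two ways: one term at a time, and in four interleaved partial sums. The four-lane layout is an assumption suggested only by the name `_vmlsSin4`; the helper names are hypothetical, and a much smaller loop count is used, so it will not reproduce the values printed above.

```python
import math
import struct


def f32(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def sum_sequential(n):
    """Accumulate sin(i) in emulated single precision, one term at a time."""
    s = 0.0
    for i in range(1, n + 1):
        x = f32(float(i))
        s = f32(s + f32(math.sin(x)))
    return s


def sum_four_lanes(n):
    """Accumulate in four interleaved partial sums (assumed SIMD-style layout)."""
    lanes = [0.0, 0.0, 0.0, 0.0]
    for i in range(1, n + 1):
        x = f32(float(i))
        k = (i - 1) % 4
        lanes[k] = f32(lanes[k] + f32(math.sin(x)))
    # Combine the partial sums at the end, again rounding to single precision.
    return f32(f32(lanes[0] + lanes[1]) + f32(lanes[2] + lanes[3]))


def sum_double(n):
    """Reference sum computed in double precision."""
    return math.fsum(math.sin(float(i)) for i in range(1, n + 1))


if __name__ == "__main__":
    n = 100_000  # far smaller than the 10**9 used in the thread
    print("sequential, single:", sum_sequential(n))
    print("four lanes, single:", sum_four_lanes(n))
    print("reference, double: ", sum_double(n))
```

Any gap between the two single-precision results here comes purely from the order of the additions. In the full 10**9 loop, `x=i` also cannot represent every integer above 2**24 in single precision, so differences in how each build evaluates sin for large arguments may matter as well.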
> In other words, switching off optimization entirely makes ifort run THREE times faster, amazing.
You'd find further significant speedup if you enabled optimization, but simply disabled generation of the vmlsSin4 call (I can't get into the how-to here), which is where the problem appears to lie. This is simply a bug, not a general performance limitation, which I'll file on your behalf if I don't see you've filed something on Premier today.
Thanks for the useful test case.
Dear Boris,
An almost identical test case was already submitted to the compiler team and is under investigation by our library team. The issue seems MacOS specific. Just to be clear, vectorization, in general, and when combined with using our Short Vector Math Library, in particular, typically improves performance substantially where applicable. So, you simply stumbled on what is, hopefully, a short-lived glass-jaw.
If you are truly interested in comparing ifort performance with other compilers, I would suggest using a slightly larger performance test suite.
Aart Bik
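For anyone following the suggestion of a larger test suite, a minimal timing harness might look like the Python sketch below. It is an illustration rather than anything posted in the thread: the function names are made up here, and the demo commands are stand-ins so that the script runs anywhere. In practice each entry would point at a benchmark executable built with one of the compilers being compared.

```python
import subprocess
import sys
import time


def best_wall_time(command, repeats=3):
    """Run `command` `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(builds, repeats=3):
    """Return {label: best time}; `builds` maps a label to a command list."""
    return {label: best_wall_time(cmd, repeats) for label, cmd in builds.items()}


if __name__ == "__main__":
    # Stand-in commands. Replace them with real benchmark binaries, for
    # example ["./bench_g77"] and ["./bench_ifort"] (hypothetical names).
    demo = {
        "noop": [sys.executable, "-c", "pass"],
        "small_loop": [sys.executable, "-c", "sum(range(100000))"],
    }
    for label, seconds in compare(demo).items():
        print(f"{label:12s} {seconds:.3f} s")
```

Taking the minimum over several runs reduces noise from other activity on the machine. Running several different kernels instead of a single sin loop makes it less likely that one library issue dominates the comparison.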
