Re: Intel Fortran Optimizations

fivos · 04-27-2009

Hi everyone,
I am using Intel Fortran Compiler 11 for a CFD algorithm, and I want to make it as fast as possible with the least impact on accuracy or stability. I have improved the algorithm as much as I could to eliminate bottlenecks, and I used OpenMP to parallelize the most computationally heavy do-loops. What I am looking for now is suggestions for compiler optimization flags. I have used the -fast flag, but the algorithm turned out to be somewhat unstable in certain cases. The -O3 flag, on the other hand, seems to work well. Apart from these, what else can I use to speed up the program?
The CPU on which the program will run is a quad-core Xeon E5405, and the operating system is 64-bit Linux.
I also tried the -xsse4.1 flag, since the Xeon E54xx series supports SSE4.1, but the compiler does not recognise it at all. To be precise, I get:
[foivos@hpc25 test]$ ifort -xsse4.1 -openmp -O3 -oSPHo.exe SPHfast.for
ifort: command line warning #10130: unknown extension 's' ignored in option '-x'
ifort: command line warning #10130: unknown extension 's' ignored in option '-x'
ifort: command line warning #10130: unknown extension 'e' ignored in option '-x'
ifort: command line warning #10130: unknown extension '4' ignored in option '-x'
ifort: command line warning #10130: unknown extension '.' ignored in option '-x'
ifort: command line warning #10130: unknown extension '1' ignored in option '-x'
etc... (compilation continues)
Will -xT be of any help in my case, given that it is intended for IA-32 applications only?

Any help, ideas or suggestions are appreciated.
Thanks in advance.

bubin · 04-27-2009

Well, there are a lot of options you can play with.

Short guide:
http://cache-www.intel.com/cd/00/00/22/23/222300_222300.pdf

Long guide:
http://cache-www.intel.com/cd/00/00/40/60/406091_406091.pdf

Generally, the most useful ones are -O2/-O3, -ipo (or -ip) and -xHost.

These might be worth trying, too: -ftz -fno-alias -fno-fnalias -align all -IPF-fp-relaxed -funroll-all-loops

Sergiy

Steven_L_Intel1 · 04-27-2009

Spell it as -xSSE4.1. Case matters. If you want to use the older version, -xS would be the equivalent. -xT is SSSE3.
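
To check which of these instruction sets a given Linux machine reports, a small script can read the flags line of /proc/cpuinfo. This is a hypothetical sketch: the helper names and the flag-to-option table are assumptions made for illustration, it relies on the Linux kernel naming SSE3 as "pni" and SSE4.1 as "sse4_1", and it only covers the -x spellings discussed in this thread. Alternatively, -xHost asks the compiler to target the build machine itself.

```python
# Hypothetical helper: suggest an ifort 11 -x option from the CPU flags that
# Linux reports in /proc/cpuinfo. The table below is an illustrative assumption.
from __future__ import annotations

from pathlib import Path

# Newest instruction set first; the first flag the CPU reports wins.
# The kernel reports SSE3 as "pni" and SSE4.1 as "sse4_1".
X_OPTIONS = [
    ("sse4_1", "-xSSE4.1"),
    ("ssse3", "-xSSSE3"),
    ("pni", "-xSSE3"),
]


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags listed on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def suggest_x_option(flags: set[str]) -> str | None:
    """Return the newest -x option whose instruction set appears in flags."""
    for flag, option in X_OPTIONS:
        if flag in flags:
            return option
    return None  # no match: keep the compiler's default target


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(suggest_x_option(cpu_flags(cpuinfo.read_text())))
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```

If the kernel lists sse4_1, as it should for an E54xx-series Xeon, the script prints -xSSE4.1; note the upper-case spelling, since case matters.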

TimP · 04-27-2009

Quoting - Steve Lionel (Intel)

Spell it as -xSSE4.1. Case matters. If you want to use the older version, -xS would be the equivalent. -xT is SSSE3.

-msse4, -msse4.1 and -msse3 worked when I tried them. I gather the difference between them and the -x versions is that the latter should print a message and quit when the program is run on an unrecognized CPU type. Please don't count on it, though.
I haven't seen a CFD code that depends on complex arithmetic, so there won't necessarily be an advantage in changing from the default (-msse2) to -msse3 or -xSSSE3. Depending on coding practices, SSE4.1 may give an advantage.
The choice of CPU architecture option is independent of whether you build for 32-bit or 64-bit. That said, people generally use 64-bit mode for CFD applications; only very small jobs would normally run faster in 32-bit mode.
Those interested in stability would normally set -prec-div -prec-sqrt -assume protect_parens, unless one of those options turns out to cost performance. This is partly a coding-practices question as well: for example, if the programmer habitually writes /2 rather than *0.5 or *(1/2.), -prec-div may cost performance.
-prec-div is quite literal about not allowing division to be replaced by multiplication by the reciprocal; it does not distinguish the cases where the result can't change from those where the substitution is risky. -no-prec-div is the default, presumably for historical reasons, as some past Intel CPUs didn't have competitive division performance.
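
The distinction between safe and risky substitutions can be seen with ordinary IEEE 754 double arithmetic. The sketch below is only a model of the x / d to x * (1/d) rewrite that -no-prec-div permits, not a reproduction of the code ifort emits; the function name is hypothetical. Dividing by 2 and multiplying by 0.5 always agree because 0.5 is exact, whereas 1/3 is rounded, so some quotients change in the last bit.

```python
# Sketch: what replacing x / d by x * (1.0 / d) does in IEEE 754 double
# precision. This models the substitution that -no-prec-div permits; it is
# not a reproduction of what ifort actually emits.
from typing import Iterable


def reciprocal_mismatches(divisor: float, xs: Iterable[float]) -> list[float]:
    """Return the x values for which x / divisor != x * (1.0 / divisor)."""
    recip = 1.0 / divisor  # rounded once; exact only when divisor is a power of two
    return [x for x in xs if x / divisor != x * recip]


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 101)]
    # 0.5 is exact, so /2 and *0.5 give identical results for these x.
    print("divisor 2:", reciprocal_mismatches(2.0, xs))
    # 1/3 is rounded, so some quotients change in the last bit.
    print("divisor 3:", reciprocal_mismatches(3.0, xs)[:5])
    print(5.0 / 3.0, 5.0 * (1.0 / 3.0))
```

The last line prints 1.6666666666666667 and 1.6666666666666665: a one-ulp difference that is harmless on its own but can accumulate in an iterative solver.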
-assume protect_parens requires the compiler to follow the Fortran standard's rule that parentheses in an expression must be honored. In correctly written code, this may improve performance.
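
Why the grouping matters for reproducibility can be shown with plain double-precision sums, because floating-point addition is not associative. Python evaluates both functions below exactly as written, so the second one merely stands in for a regrouping an optimizer might choose when parentheses are not protected; the function names and input values are illustrative assumptions.

```python
# Floating-point addition is not associative, so the grouping the programmer
# writes matters. These two functions stand for the source as written and a
# regrouping an optimizer might choose if parentheses are not protected.


def as_written(a: float, b: float, c: float) -> float:
    return (a + b) + c  # evaluation order fixed by the parentheses


def reassociated(a: float, b: float, c: float) -> float:
    return a + (b + c)  # same terms, different grouping


if __name__ == "__main__":
    # b + c rounds -1e16 + 1.0 back to -1e16, so the 1.0 is lost.
    print(as_written(1e16, -1e16, 1.0), reassociated(1e16, -1e16, 1.0))
```

This prints 1.0 and 0.0: the spacing between doubles near 1e16 is 2, so adding 1.0 to -1e16 first discards it entirely.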

fivos · 04-27-2009

I will keep these in mind. Thanks everyone for your time.