Hi everyone,
I am using Intel Fortran Compiler 11 for a CFD algorithm and I want to make it as fast as possible, with the least impact on accuracy or stability. I have improved the algorithm as much as I could in order to eliminate bottlenecks, and used OpenMP to parallelize the most computationally heavy do-loops. What I am looking for is suggestions for compiler optimization flags. I have used the -fast flag, but the algorithm turned out to be a bit unstable in certain cases. On the other hand, the -O3 flag seems to work well. Apart from these, what else can I use to speed up the program?
The CPU on which the program will run is a quad-core Xeon E5405, and the operating system is 64-bit Linux. I also tried using the -xsse4.1 flag, since the Xeon E54XX supports SSE4.1, but it is not recognised at all by the compiler. To be precise, I get:
[foivos@hpc25 test]$ ifort -xsse4.1 -openmp -O3 -oSPHo.exe SPHfast.for
ifort: command line warning #10130: unknown extension 's' ignored in option '-x'
ifort: command line warning #10130: unknown extension 's' ignored in option '-x'
ifort: command line warning #10130: unknown extension 'e' ignored in option '-x'
ifort: command line warning #10130: unknown extension '4' ignored in option '-x'
ifort: command line warning #10130: unknown extension '.' ignored in option '-x'
ifort: command line warning #10130: unknown extension '1' ignored in option '-x'
etc... (compilation continues)
Will -xT be of any help in my case, or is it intended for IA-32 applications only?
Any help, ideas, or suggestions are appreciated.
Thanks in advance.
4 Replies
Well, there are a lot of options you can play with.
Short guide:
http://cache-www.intel.com/cd/00/00/22/23/222300_222300.pdf
Long guide:
http://cache-www.intel.com/cd/00/00/40/60/406091_406091.pdf
Generally, the most useful ones are: -O2 -O3, -ipo (-ip), -xhost
These might be worth trying, too: -ftz -fno-alias -fno-fnalias -align all -IPF-fp-relaxed -funroll-all-loops
Sergiy
Spell it as -xSSE4.1. Case matters. If you want to use the older version, -xS would be the equivalent. -xT is SSSE3.
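For illustration, here is the command from the original post with only the option spelling changed. This is a sketch: the file names come from the question, and the fallback `echo` branch is an addition so the snippet still runs on a machine without `ifort` installed.

```shell
# Option names after -x are case-sensitive: -xSSE4.1, not -xsse4.1.
FLAGS="-xSSE4.1 -openmp -O3"
CMD="ifort $FLAGS -oSPHo.exe SPHfast.for"

# Compile only if the compiler and the source file are actually present;
# otherwise just show the command that would be run.
if command -v ifort >/dev/null 2>&1 && [ -f SPHfast.for ]; then
    $CMD
else
    echo "would run: $CMD"
fi
```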
Quoting - Steve Lionel (Intel)
Spell it as -xSSE4.1. Case matters. If you want to use the older version, -xS would be the equivalent. -xT is SSSE3.
I haven't seen a CFD code which depended on complex arithmetic, so there won't necessarily be an advantage in changing from default (-msse2) to -msse3 or -xSSSE3. Depending on coding practices, sse4.1 may have an advantage.
The choice of CPU architecture option is not tied to whether you run in 32-bit mode. On the other hand, people generally use 64-bit mode for CFD applications; only very small jobs could normally run faster in 32-bit mode.
Those interested in stability would normally set -prec-div -prec-sqrt -assume protect_parens, unless a performance loss can be associated with one of those options. This is partly a coding practices question as well. For example, if the programmer has the habit of writing /2 rather than *0.5 or *(1/2.), -prec-div may cost performance.
-prec-div is quite literal about not allowing division to be replaced by multiplication; it does not distinguish between cases where the result can't change and cases where the substitution is risky. -no-prec-div is the default, presumably for historical reasons, as some past Intel CPUs didn't have competitive division performance.
-assume protect_parens requires the compiler to follow the Fortran standard on parentheses. In correctly written code, this may improve performance.
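To see why these two options can matter for stability, here is a small sketch of the underlying floating-point effects. It uses awk's double-precision arithmetic as a stand-in; this is an assumption for illustration, not output from ifort, and which transformations the compiler actually applies depends on the code and options.

```shell
# Division vs. multiplication by the reciprocal, the kind of substitution
# that may be allowed when -prec-div is not set.
div=$(awk 'BEGIN { printf "%.17g", 49 / 49 }')
mul=$(awk 'BEGIN { r = 1 / 49; printf "%.17g", 49 * r }')

# Regrouping a sum, the kind of reassociation that may happen when
# parentheses are not honored (no -assume protect_parens).
left=$(awk 'BEGIN { printf "%.17g", (0.1 + 0.2) + 0.3 }')
right=$(awk 'BEGIN { printf "%.17g", 0.1 + (0.2 + 0.3) }')

echo "49/49         = $div"
echo "49*(1/49)     = $mul"
echo "(0.1+0.2)+0.3 = $left"
echo "0.1+(0.2+0.3) = $right"
```

The differences are in the last bit, but in an iterative solver they can accumulate, which is one plausible source of the instability seen with -fast.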
I will keep these in mind. Thanks everyone for your time.