Intel® Fortran Compiler

Compiler options to improve speed of a scientific calculating program

FortCpp
Beginner

I have recently been recompiling a Linux code on Windows with the Intel Fortran compiler (included in Intel Parallel Composer XE 2013). After two months of work (replacing some I/O functions and using Intel MKL), I think it works fine on Windows, and I have checked the results. But some performance issues remain.

In general, my question is: which options make a scientific program run faster on Windows? The program deals with large matrices and solves eigenvalue problems using conjugate gradient methods. I used /O3 and some other options, but it is still not as fast as I expected.

I benchmarked an Intel-compiled serial version of the code (/nologo /O3 /QaxSSE4.2 /QxSSE4.2 /Qunroll:1000000000 /Qvec-threshold:0 /heap-arrays512 /fpp /warn:interfaces /fp:source /names:lowercase /assume:underscore /ccdefault:none /check:none /libs:dll /threads) against a MinGW-compiled serial version (-pipe -O3 -funroll-loops -ffast-math). I compared the time consumed by each iteration: the Intel version takes 400 sec but the MinGW version takes 120 sec. That's not what I've seen on Linux, where the Intel-compiled code is generally much faster. According to the VTune results (I am still a beginner at using it), only 2 subroutines in the Intel build take most of the time, whereas the MinGW build takes far less time.
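
A per-iteration comparison like the one described can be reproduced with a small timing harness built once with each compiler. The following is only a sketch: `sample_step` and `seconds_per_iteration` are hypothetical names, not part of the code discussed in this thread, and the standard `clock()` call differs between platforms in resolution and in whether it reports CPU or wall time.

```c
#include <time.h>

/* Keeps the compiler from optimizing the sample work away. */
static volatile double sink;

/* Hypothetical stand-in for one iteration of the real solver. */
void sample_step(void)
{
    double s = 0.0;
    for (int i = 1; i <= 100000; i++)
        s += 1.0 / i;
    sink = s;
}

/* Average seconds per call of step() over iters calls, measured with clock(). */
double seconds_per_iteration(void (*step)(void), int iters)
{
    clock_t start = clock();
    for (int k = 0; k < iters; k++)
        step();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / iters;
}
```

Calling `seconds_per_iteration(sample_step, 100)` in both builds gives two numbers that can be compared directly, provided both run on the same machine with the same input.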

I did NOT change anything in the iteration part, since there is nothing system-related in it. I'd like to get the serial version fully sorted out before moving on to a parallel version.

Any ideas? 

9 Replies

Steven_L_Intel1
Employee
I notice that you have /QaxSSE4.2 and /QxSSE4.2 - the /Qax here is redundant. You used /fp:source, which will disable a lot of optimizations - your MinGW build does not use anything similar. Why did you add that? Also, why the /assume:underscore and /names:lowercase? Are you linking to a separate library that uses that convention? Let me suggest that you undo all the options you selected and just try /fast. This is shorthand for a collection of options that usually gives improved performance. Adding /parallel may also show a benefit.

FortCpp
Beginner
Ok. I'll remove those options and just try /fast. Actually, there is some C code involved, and I need /assume:underscore and /names:lowercase to make sure the libraries can be linked. With MinGW (gfortran) the code links without any issues of this kind. Maybe it uses lowercase names with a trailing underscore for functions by default. I'll post an update later.
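
For context on the naming convention mentioned here: gfortran by default emits external names in lowercase with one trailing underscore, while Intel Fortran on Windows defaults to uppercase with no underscore, so /names:lowercase and /assume:underscore make the Intel build look for the gfortran-style names. A C routine meant to be called from Fortran under that convention might look like the sketch below. The routine `scale_vec_` is a hypothetical example, not a function from the library in this thread.

```c
/* Fortran side (illustrative only):
 *     call scale_vec(x, n, a)
 * With lowercase names plus a trailing underscore, the linker looks for
 * the symbol "scale_vec_". Fortran passes arguments by reference by
 * default, so every parameter is a pointer on the C side.
 */
void scale_vec_(double *x, const int *n, const double *a)
{
    for (int i = 0; i < *n; i++)
        x[i] *= *a;
}
```

Without the two options, the Intel build would instead look for `SCALE_VEC`, which is why the link step fails when the C library only exports the lowercase, underscored name.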

FortCpp
Beginner
I removed /fp:source, /QaxSSE4.2 and /QxSSE4.2, added /fast under Property -> Additional Options, and rebuilt everything. It still seems slow. I then checked it with Amplifier, and it seems the slow part is in the library that is compiled from C. So, Steve, please take a look at my C options; maybe you can figure out why:
/Zi /nologo /W3 /O3 /Oi /Ot /GA /Qunroll:100000000 /D "__STDC__" /EHsc /RTC1 /MD /GS- /fp:fast /Zc:wchar_t /Zc:forScope /Qstd=c99 /Fp"x64\OctopusRelease\libgrid_c.pch" /Fa"x64\OctopusRelease\" /Fo"x64\OctopusRelease\" /Fd"x64\OctopusRelease\vc100.pdb"

FortCpp
Beginner
It seems this slows the code down:
[cpp]
for(j = 0; j < n; j++) {
  register VEC_TYPE wj = VEC_SCAL(w);
  ......
[/cpp]
And the definition of VEC_TYPE is:
[cpp]
#define VEC_TYPE __m256d
[/cpp]

Steven_L_Intel1
Employee
I really can't comment intelligently on the C options. Whose C compiler did you use?

FortCpp
Beginner
Intel compiler, the one in Parallel Composer 2013. Now I think the major problem is the __m256d thing. I am not sure what __m256d is, and there is little information about it. Though I can disable it by changing the macro in config.h, I wonder why it is that slow. Please move the post if you think it necessary. Or shall I start a new post on the C discussion board?
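
On what `__m256d` is: it is the AVX intrinsic type declared in `<immintrin.h>`, a 256-bit vector holding four doubles, so one instruction can operate on four array elements at once. The sketch below shows how such a type is typically used. The function `axpy` is a hypothetical illustration, and the thread does not show how `VEC_SCAL` is defined, so mapping it to a broadcast such as `_mm256_set1_pd` is only an assumption. The AVX path is compiled only when the compiler defines `__AVX__`; otherwise the plain loop handles everything.

```c
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: y[i] += w * x[i] for i in [0, n). */
void axpy(double w, const double *x, double *y, size_t n)
{
    size_t i = 0;
#if defined(__AVX__)
    /* Broadcast the scalar w into all four lanes of one vector. */
    __m256d wv = _mm256_set1_pd(w);
    for (; i + 4 <= n; i += 4) {
        __m256d xv = _mm256_loadu_pd(x + i);   /* unaligned load of 4 doubles */
        __m256d yv = _mm256_loadu_pd(y + i);
        yv = _mm256_add_pd(yv, _mm256_mul_pd(wv, xv));
        _mm256_storeu_pd(y + i, yv);           /* unaligned store of 4 doubles */
    }
#endif
    /* Scalar loop: handles the remainder, or everything without AVX. */
    for (; i < n; i++)
        y[i] += w * x[i];
}
```

Code written this way depends heavily on the optimizer keeping the vectors in registers, so it tends to be hit hard by debug-style settings; the next reply in this thread traces the slowdown to the runtime checks rather than to the type itself.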

FortCpp
Beginner
Hi Steve, I got some help from a friend. It turned out that I should disable the basic runtime checks (RTC) when compiling the C library. I had a debug build earlier and never unchecked the checks. It works fine now: 200% faster than before and 33% faster than the MinGW-compiled version. But I'd like to go back to the previous topic with a related question: given a debug build, what would be the correct way to make a release build based on it? Please give me some suggestions.

Steven_L_Intel1
Employee
Are you building in Visual Studio or the command line? If in Visual Studio, a Release configuration is already defined - you can select it. You will probably have to redo other option changes, such as adding /fast or /parallel. From the command line, just don't use the /debug or /Od switches and don't specify other run-time checks.

FortCpp
Beginner
Got it. Thanks, Steve.