I have successfully compiled a Fortran application to run on an openSUSE 11.1 x86_64 system with a Core i7 (920). The problem is that it does not run as fast under Linux as it does on the same machine under 64-bit Vista. I suspect that I am missing some vital compiler option, because the performance is virtually the same whether running in console mode (init 1) or in an X window. Following are examples of the compiler commands used under Visual Studio 8, first for code without OpenMP directives and then for code containing OpenMP directives:
ifort /nologo /Qopenmp_report:2 /warn:interfaces /assume:byterecl /module:x64\Release /object:x64\Release /libs:static /threads /c /Qvc8 /Qlocation,link,C:\Program Files (x86)\Microsoft Visual Studio 8\VC\bin C:\Therm\IntelExothermv60\Frhg.for
ifort /nologo /Qopenmp /Qopenmp_report:2 /warn:interfaces /assume:byterecl /module:x64\Release/ /object:x64\Release/ /libs:static /threads /c /Qvc8 /Qlocation,link,C:\Program Files (x86)\Microsoft Visual Studio 8\VC\bin C:\Therm\IntelExothermv60\scoef-lu2-sub.for
As close as I can get on the Linux platform is:
ifort -nologo -O2 -assume byterecl -module /home/bob/Documents/source -threads -c Frhg.for
ifort -nologo -openmp -O2 -assume byterecl -module /home/bob/Documents/source -threads -c scoef-lu2-sub.for
The only other difference I see in the build is the use of a MANIFEST file, post link, under Visual Studio.
Any help would be gratefully accepted.
-bob
4 Replies
Ignore the manifest on Windows. That's just a linker/OS thing. If you add -xSSE4.2 you should see a speedup on Linux, and /QxSSE4.2 on Windows. -O3 might help too on both.
I'm not sure what to suggest as far as the OS differences go, really. Sometimes we see a difference, but it isn't large. Perhaps your application does a lot of I/O? How much of a difference are we talking about?
Quoting - Steve Lionel (Intel)
Ignore the manifest on Windows. That's just a linker/OS thing. If you add -xSSE4.2 you should see a speedup on Linux, and /QxSSE4.2 on Windows. -O3 might help too on both.
I'm not sure what to suggest as far as the OS differences go, really. Sometimes we see a difference, but it isn't large. Perhaps your application does a lot of I/O? How much of a difference are we talking about?
Thanks for the prompt response.
The application which I am working with is a reservoir simulator, basically a large iterative solver for sparse matrices.
The test case is a relatively small problem: 50,000 grid cells with 200k equations in 200k unknowns, simulating performance over 9 years with monthly output. It writes approximately 350 MB of mixed binary and ASCII output.
Under Windows Vista 64, running an image created by version 9.04 of the Professional Fortran compiler, the execution time is 1 hour 54 minutes.
The same code, compiled as specified above with version 11.0/083 for the Intel 64 platform, took between 2 hours 15 minutes and 2 hours 21 minutes, even with -xSSE4.2 and -O3. (With -xSSE4.2 and -O3 the execution time was 2 hr 21 min, longer than the 2 hr 17 min with -xSSE4.2 and -O2.)
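To put those timings on a common scale, here is a small sketch that converts them to minutes and prints the relative slowdown against the Windows run. The helper name slowdown_pct is made up for illustration; the timings are the ones quoted above.

```shell
#!/bin/sh
# Relative slowdown of a run versus a baseline, both given in minutes.
# slowdown_pct is a hypothetical helper, not part of any tool mentioned here.
slowdown_pct() {
    # $1 = baseline minutes, $2 = measured minutes
    awk -v base="$1" -v run="$2" \
        'BEGIN { printf "%.1f\n", (run - base) / base * 100 }'
}

baseline=$((1 * 60 + 54))   # Vista 64, compiler 9.04: 1 h 54 min
linux_o2=$((2 * 60 + 17))   # Linux, 11.0/083, -xSSE4.2 -O2: 2 h 17 min
linux_o3=$((2 * 60 + 21))   # Linux, 11.0/083, -xSSE4.2 -O3: 2 h 21 min

echo "-O2: $(slowdown_pct "$baseline" "$linux_o2")% slower than baseline"
echo "-O3: $(slowdown_pct "$baseline" "$linux_o3")% slower than baseline"
```

That works out to roughly 20% slower with -O2 and roughly 24% slower with -O3.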
I was really hoping for an improvement on the Linux based machine, but that does not appear to be the case.
The main problem here is that with every version of the Fortran compiler newer than 9.04, we have noticed that the resulting executables run much slower, possibly even slower than what I see on the Linux platform.
So even though 9.04 has no compiler options specifically for the Core i7, it appears to do a better job of optimizing at a lower level, so that the executable performs better on the new hardware.
-regards
bob
That's surprising, but I suppose it's possible for certain applications. I will comment that when we do compare applications on Windows and Linux, Windows tends to be a tiny bit faster. I don't know why.
Quoting - Steve Lionel (Intel)
That's surprising, but I suppose it's possible for certain applications. I will comment that when we do compare applications on Windows and Linux, Windows tends to be a tiny bit faster. I don't know why.
Well, I am not ready to let this go, being a Linux advocate. First, let's make sure you are comparing the same versions of the compilers. On Windows, you say you use '9.04': do you mean 9.0.040 or 9.1.040? You cannot compare an old compiler on Windows to a new compiler on Linux. Let's get the versions consistent here:
http://software.intel.com/en-us/articles/older-version-product/
Keep in mind that the Windows version numbers do not match those on either Linux or Mac OS. But you can get compilers from the same general release date, shown by ifort -V, or within the same major version, such as the last 9.1.xxx on Linux vs. the last 9.1.xxx on Windows. Within a major version like 9.1, don't compare an early 9.1 on one platform to the most recent 9.1 on another. Apples to apples is the goal.
Next, Linux has a variety of file systems that it can use. What is your underlying filesystem type (ext3 or something else)? Many of these are journaled file systems, which adds commit overhead to I/O operations. Great for database folks, not so good for performance. So if your code is doing a lot of small-record or formatted writes, that may be where the difference lies. Use the 'mount' command to see what you have on Linux, then look that file system type up on Wikipedia.
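As a rough sketch of that filesystem check, the snippet below prints the filesystem type backing a given directory, which saves picking the right line out of the full 'mount' output. The helper name fs_type is made up for illustration, and it assumes a df that accepts -P and -T together (GNU coreutils does).

```shell
#!/bin/sh
# Print the filesystem type backing a directory (default: current directory).
# fs_type is a hypothetical helper; it assumes df supports -P (one line per
# filesystem) together with -T (show type), as GNU coreutils df does.
fs_type() {
    df -P -T "${1:-.}" | awk 'NR == 2 { print $2 }'
}

# Run this from the directory the simulator writes its output to.
echo "output directory is on: $(fs_type .)"
```

Pass the output directory as the argument if it differs from the current directory, for example fs_type /home/bob/Documents/source.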
Next, instead of conjecturing where the performance difference lies, on Linux you can profile the code using gprof. Try these compiler options:
ifort -nologo -O2 -assume byterecl -module /home/bob/Documents/source -threads -c Frhg.for -g -p
ifort -nologo -openmp -O2 -assume byterecl -module /home/bob/Documents/source -threads -c scoef-lu2-sub.for -g -p
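I don't see your final link line in your note. If these 2 files comprise your application, then link with the same profiling options, plus -openmp because one of the objects was compiled with it:
ifort -openmp -g -p -o foo.exe Frhg.o scoef-lu2-sub.o
Run the code: ./foo.exe
Look for the gmon.out file that is created, then run: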
gprof foo.exe
This is a function-level profiler, so it will show which functions (procedures) are taking the most time. One caveat: gprof is not able to deal with contained procedures, so it is not really perfect for modern Fortran codes.
As Steve said, in our testing there is very little (negligible) difference between Linux and Windows. Keep the compilers in the same major version, as close in release date as possible, and make sure disk I/O is not dominating the code. If the code is compute bound, there really should be no difference.
ron
