This is interesting and disheartening at the same time ...
I am a longtime VF-10.1 user who just tested out migration to XE 2011 assuming that "recent/upgraded is always better". I have a large "number-cruncher" code that I have developed over many years using VF-10.1 and other legacy Fortran compilers. I am now trying out the most recent VF XE 2011 compiler and have found that for exactly the same code and problem (all compiler settings were also similar, with the "fast code" option), the new XE 2011 compiled code was a good 50% slower! Any comments, experiences, observations, and suggestions will be greatly appreciated. Again, my qualifiers for this speed test are as follows (a minimal timing sketch appears after the list):
(1) a large-scale computational code
(2) exactly the same code and test case
(3) identical Windows workstations (single-core 3.6 GHz Xeon processor, Windows 7, SCSI HD)
(4) VF-10.1 with Visual Studio 2005 integration and VF XE 2011 with Visual Studio 2011 integration
(5) almost identical compiler and optimization settings, maximizing execution speed
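For reference, a timing comparison of this kind is usually based on the wall-clock time of the main computation. A minimal sketch of such a wrapper (run_solver here is only a placeholder standing in for the actual number-cruncher) is:

   program time_solver
      implicit none
      integer :: count_start, count_end, count_rate
      real    :: t_elapsed

      call system_clock(count_start, count_rate)
      call run_solver()                 ! stand-in for the real computation
      call system_clock(count_end)

      t_elapsed = real(count_end - count_start) / real(count_rate)
      print '(a, f10.3, a)', ' Elapsed wall-clock time: ', t_elapsed, ' s'

   contains

      subroutine run_solver()
         ! Placeholder workload; replace with the actual solver call.
         integer :: i
         real    :: s
         s = 0.0
         do i = 1, 100000000
            s = s + sqrt(real(i))
         end do
         print *, 'checksum =', s
      end subroutine run_solver

   end program time_solver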
Dr. Bilal A. Bhutta
Are you willing to provide us the test program? We'll be glad to take a look.
To eliminate system differences:
Copy each built application, plus required DLLs, to separate folders on a thumb drive.
On each system, run both versions of the application.
If the difference in performance persists, then it is a compiler issue.
If not, then it is a platform issue.
If it is a compiler issue...
Set the option switches to produce the most verbose optimization reports.
See if the reports are different (e.g., vectorization off in one of the builds). This may point to an option difference.
Additionally, one of the defaults may now be different between the compiler versions (the ReadMe may indicate the changes in defaults). Look for SSE differences (precise vs. fast vs. strict, etc.).
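As a hedged, concrete example of those last two points with the Windows ifort driver (report-level syntax and defaults differ somewhat between versions 10.1 and 12.0, so check each version's documentation; mysolver.f90 is a placeholder file name):

   rem Request a verbose optimization report from the optimizer
   ifort /O3 /Qopt-report:3 mysolver.f90

   rem Pin down the floating-point model explicitly instead of relying on defaults
   ifort /O3 /fp:fast    mysolver.f90
   ifort /O3 /fp:precise mysolver.f90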
Jim Dempsey
I'm wondering if anyone has contact information for Polyhedron Software to find the folks in charge of the benchmarks.
Their Fortran execution-time benchmarks now include the Intel XE 2011 compiler (version 12), and it is still a top performer.
It would be interesting to see how the Intel compiler's results have changed from version to version.
Benchmarks are at:
http://www.polyhedron.com/pb05-win32-f90bench_p40html
brian
Jim,
The two platforms are in fact identical twins, other than the VF compiler and Visual Studio integration. All previous tests with the same executable have shown practically identical times. So, other than the compilation, there are indeed no other differences that I can think of. As far as the compiler options are concerned, I checked through option-by-option to make sure that what could be matched was indeed matched. Actually, I even tried more optimized versions of the XE 2011 builds (highest optimization, auto-parallelization, etc.), but got minuscule improvements. Has anybody reported any similar one-to-one speed comparisons?
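For what it is worth, a hedged sketch of the kind of "maximum speed" command line being matched between the two builds (mysolver.f90 is a placeholder; the availability and exact behaviour of each option should be confirmed in each version's documentation):

   rem Built once with VF-10.1 and once with XE 2011 using nominally equivalent options
   ifort /O3 /Qipo /Qparallel /fp:fast mysolver.f90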
Bilal
(bhutta@aerotechnologies.com)
Bilal, please use Intel Premier Support to submit your test program and data.
The Polyhedron tests have shown a steady improvement overall. Sometimes an individual test will be a bit slower in a new release, but catch up again later. A 50% slowdown is unheard of and we would never release the product if any of the hundreds of performance test programs we use showed that level of degradation. But there's always the possibility that your program hits things just the wrong way, so we'd like to see it.
This whole thread involves following false scents. The programs are buggy, as compiling with, e.g., /fpe0 /traceback /check:all /warn:all and running will show. There are numerous instances of misspelled variable names, usage of uninitialized variables and errors in input data.
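For example, a full-checking diagnostic build along those lines (shown here with the Windows colon syntax /fpe:0; mysolver.f90 is a placeholder) would be:

   rem Trap floating-point exceptions, run all runtime checks, enable all warnings
   ifort /Od /fpe:0 /traceback /check:all /warn:all mysolver.f90

The misspelled-name class of bug is also avoided at the source level with IMPLICIT NONE; without it, a misspelled name silently becomes a new, uninitialized variable:

   subroutine flux(rgas, t, p, rho)
      implicit none               ! without this, a typo such as "rgs" for "rgas"
                                  ! would silently create an uninitialized variable
      real, intent(in)  :: rgas, t, p
      real, intent(out) :: rho
      rho = p / (rgas * t)        ! ideal-gas density from p = rho * rgas * t
   end subroutine flux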
The consequences of such errors, in terms of the Fortran standard, are "undefined". This term covers excessive run-times, premature crashes, wiping your hard-disk to pristine condition, starting a nuclear war, catching a dreaded disease, and everything else.
Because of these reasons, the title of the thread could, perhaps, be changed to something less misleading.
The examples were not meant to be streamlined, perfect codes, but rather some realistic applied engineering cases we have to deal with. Some undefined variables are statically set to zero by the compiler and are understood to be so. If you find corrections or suggestions that do meaningfully change the runtime comparison, please do post them and I will follow up.
Please list all the compiler options used to build the cases you're testing. You can find these in Visual Studio under Fortran > Command Line. Note that uninitialized variables are not set to zero by the compiler, except when certain options are used.
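In case it is useful, the options usually involved here are /Qsave, which gives local variables static storage, and /Qzero, which zero-initializes certain saved variables that have no explicit initializer (hedged; see the compiler documentation for the exact semantics in your version). Neither is on by default, e.g.:

   rem Legacy-style build: static locals plus zero initialization of saved,
   rem not-yet-initialized variables (this is NOT the default behaviour)
   ifort /O2 /Qsave /Qzero mysolver.f90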
I'm sorry, I cannot agree with you that a calculation made with the gas constant R set to zero by using the compiler's default-set-to-zero is a "realistic applied engineering code". Rather, I take the position that, in effect, you killed the messenger who brought you unwelcome news. Furthermore, unless the results from the program are the same whether or not zero-initialization is used (w.r.t. all variables for which zero initialization is not physically meaningful), it is inappropriate to proceed to record execution times.
In any non-trivial program, the execution path usually depends on the input data, whether these data are read from files or are themselves the results of another calculation. With code that involves using uninitialized variables, therefore, the execution time becomes a random quantity, since the garbage values left over from other programs are unpredictable and cause the execution path to become unpredictable as well.
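A toy illustration of that point, assuming a simple fixed-point iteration (all names here are made up for the sketch): the number of iterations, and therefore the measured run time, is governed entirely by the starting value, so a garbage starting value yields a garbage timing.

   program iter_demo
      implicit none
      real    :: x, x0
      integer :: n

      x0 = 1.0e6         ! If x0 were accidentally left uninitialized, its garbage
                         ! value would decide how many iterations run, and the
                         ! measured time would vary from build to build.
      x  = x0
      n  = 0
      do while (abs(x - 2.0) > 1.0e-6 .and. n < 1000)
         x = 0.5 * (x + 2.0)       ! converges to the fixed point x = 2
         n = n + 1
      end do
      print *, 'iterations =', n
   end program iter_demo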
It is possible that you may have a too-big-to-post-here code that exposes pathological behavior by the new release of the Intel compiler. However, you have yet to establish that as fact. Your stripped-down example is buggy and achieves nothing positive towards that end.
Unless you present a reasonably sized bug-free example that displays the problems claimed, we can do nothing here towards pinpointing the reasons for the claimed slow-down.
Steve, these cases were quickly put together by extracting portions from larger setup(s). I used default compiler settings in these cases for the x64 Release and Debug environments. The parent code(s) do use the static-memory option (subroutine memory is retained upon reentry) and the option that all variables are initialized to zero.
Thanks,
Bilal