Runtime speed, Win32/IVF

benh · 06-06-2005

Hi!

Do any of you who have migrated from CVF to IVF have a feel for how the performance of the executables IVF produces compares with those CVF produces?

After about half a year of real-life experience, it appears to me that no matter which optimizations and compiler switches I try, I never get the performance that the same sources have when compiled with CVF... :-(

I know this contradicts what benchmarks such as /www.polyhedron.co.uk/compare/win32/f90bench_p4.html indicate, but it is what we see almost consistently. FWIW, the application in question contains mixed C++/Fortran source, and since it has been maintained for years under CVF and grew out of even older F77 code, I guess it cannot take advantage of compiler-supported features the way newer F90/F95 applications written from scratch can.

One example of the timing difference (a relatively short run, but one that has been repeated with many different settings):

CVF: about 30.4 s
IVF: 43.1 to 44.0 s with most settings
IVF: about 79 s with array bounds checking enabled

(All tests used a "release" build with no debugging.)

I think this performance "penalty" is quite significant, so I'd like to hear what experiences others have had.

-+-Ben-+-

anthonyrichards · 06-06-2005

Is it possible that IVF 'inflates' the code compared to CVF, so that on average more instructions have to be executed?

Compare the sizes of the executables. If one has more instructions to execute, then it will take proportionately more time, all other things being equal.

If your program has one or just a few 'hot spots', compare the code the two compilers generate for them.

benh · 06-06-2005

While I know the size of the EXE can have this effect, I don't think it is the primary issue here. The CVF binary is about 1.6 MB, while the various IVF builds range from 1.5 to 4.6 MB. (See the other thread I posted about binary sizes, if that's of interest.)

-+-Ben-+-

Steven_L_Intel1 · 06-06-2005

Have you sent a sample program to Intel Premier Support? We're very interested in seeing such cases. Early on they weren't too hard to find, but recently they have become very unusual, especially with a difference that large. We would very much like to understand and solve it, so please help us help you and send us a test case.

benh · 06-06-2005

No, we didn't send a sample program. I guess one would need to spend some time finding out exactly which part(s) of the program trigger or contribute to the poor behavior of IVF (assuming that is in fact what happens) and reduce that to as small a sample as possible for submission. (Not to mention confidentiality issues with parts of the code...)

If I can boil it down to something simpler that illustrates the difference in performance, I'll make sure it gets submitted, though. I just wanted to know whether others have come across this problem too, or whether our case is exceptional.

Nevertheless, in case it gives any clues: in some places there is heavy interaction between C++ and Fortran code (Fortran calls C++, which may call Fortran again, and vice versa). Many arrays are multi-dimensional, some deferred-shape and allocatable, and most are still in COMMON blocks (legacy...). We have found that enabling SSE, recursive code generation, omitting frame pointers and similar switches give no measurable gain in performance. Oh yes, and the Fortran part is compiled as a static library and linked into the C++ Windows application. No idea whether any of this rings a bell on your end; I'm just shooting blindfolded here. One never knows... ;-)

-+-Ben-+-

Steven_L_Intel1 · 06-06-2005

You don't need to reduce it, if it's difficult to do so. We'll take the whole thing and do the analysis ourselves.

The only area I know of where CVF sometimes did better was in certain loop transformations, but recent versions of IVF have pretty much eliminated that issue. If we missed something, we would very much like to see it so that we can fix it.

Calling C, COMMON blocks, etc. should be no different from CVF in terms of performance.