Visual Fortran Composer XE 2011 not working as Intel Compiler Version 9 did

cholcom4 · ‎05-03-2012

I have written a Fortran program that has worked repeatedly with the Intel Compiler Version 9. When I compile the same files in the Microsoft Visual Studio 2008 shell using Intel Visual Fortran Composer XE 2011, it no longer works properly. Composer XE 2011 compiles the files with the same command, "ifort test.f /exe:test.exe", but when I run the program it stops with a NaN (not a number) error, whereas the executable built with Version 9 runs without the error. I also noticed that the file size is 628 KB when compiled with Version 9 but 770 KB when compiled with Composer XE. Is there a way to make Composer XE optimize the same way Version 9 does so the program will still function properly? Is there a Windows switch setting to compile as if it were Version 9?

Steven_L_Intel1 · ‎05-03-2012

There is no "pretend I am version 9" switch. There are many, many possible reasons for the differences, ranging from errors in your code to unstable algorithms to an error in the compiler. If you'll attach a test program with everything needed to build and reproduce the problem, we'll be glad to take a look.

cholcom4 · ‎05-04-2012

The problem seems to cause a stop and an EPSXFU error in the code below. Is there something in this code that XE 2011 doesn't like but Version 9 does?

      FUNCTION EPSXFU(ENTU,CR)
C-
      DIMENSION E(30)
C-
      A=ENTU
      B=A*CR
      E0=EXP(-A)
      EPSXFU=1.-E0
      IF(B.EQ.0.) RETURN
      E(1)=A*E0+(A/B)*EXP(-A)*(EXP(-B)-1.)
      E(2)=A/2.*(2.*E(1)-A*E0)+A*A/2.*EXP(-(A+B))
      ZETA=E0+E(1)+E(2)
      FN=2
      DO 1000 N=3,30
      EN=N
      FNM1=FN
      FN=FN*EN
      E(N)=A/EN*(2.*E(N-1)-A/(EN-1.)*E(N-2))+A**EN*B**(EN-2.)*
     & EXP(-(A+B))/FN/FNM1
      ZETA=ZETA+E(N)
      EPSXFU=1.-ZETA
      IF(ABS(E(N)).LT.1.E-5) RETURN
 1000 CONTINUE
      STOP 'EPSXFU'
      END

Steven_L_Intel1 · ‎05-04-2012

Please provide a complete program. My guess is that you are seeing the difference between use of the X87 floating point instructions, default in version 9, and the SSE2 instructions, default in later versions. With X87 you can get operations done in higher-than-declared precision, leading to unpredictable results. Given that your function is comparing against 1E-5, it's possible that doing everything in the declared single precision gives a result outside this range. You should be able to figure this out using the debugger or even print statements.
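A minimal sketch of the effect described above, assuming input values of 0.7 for ENTU and 0.5 for CR (placeholders, not the poster's data). It evaluates the E(1) expression from EPSXFU once in single precision and once in double precision, so the gap between "declared precision" and "higher-than-declared precision" can be inspected directly. The program and variable names are made up for this illustration.

```fortran
program precision_demo
  implicit none
  integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
  real(sp) :: a4, b4, r4
  real(dp) :: a8, b8, r8

  ! Assumed inputs (placeholders): ENTU = 0.7, CR = 0.5
  a4 = 0.7_sp
  b4 = a4 * 0.5_sp

  ! Start the double-precision run from exactly the same values
  a8 = real(a4, dp)
  b8 = real(b4, dp)

  ! Same expression as E(1) in EPSXFU, in each precision
  r4 = a4*exp(-a4) + (a4/b4)*exp(-a4)*(exp(-b4) - 1.0_sp)
  r8 = a8*exp(-a8) + (a8/b8)*exp(-a8)*(exp(-b8) - 1.0_dp)

  print *, 'single :', r4
  print *, 'double :', r8
  print *, 'diff   :', real(r4, dp) - r8
end program precision_demo
```

If the difference is comparable to the 1E-5 exit tolerance after it has propagated through the recurrence, the two builds can take different paths through the loop.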

TimP · ‎05-04-2012

There are several opportunities for source transformation optimization here where an aggressive compiler could stub its toes. It might or might not try to make use of the commonality between exp(-a)*exp(-b) and exp(-(a+b)), the common factor a in multiple expressions, the possibilities for eliminating several divisions, and so on. It's safer to write such things out explicitly so as to give the compiler less latitude.
The most obvious thing, to reinforce what Steve said, if you are comparing ifort 9 and 10 32-bit, is the change of default option from x87 with implied double precision expression evaluation to default SSE without implied double precision. This will have an impact when you don't write the expressions carefully.
However, introduction of debug or print is likely to change how the compiler goes about optimization of your expressions.
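As a rough illustration of "giving the compiler less latitude", the sketch below computes each exponential once, stores it in a named temporary, and reuses it in the E(1) and E(2) expressions from EPSXFU. The temporaries eb and eab, the program name, and the input values are assumptions made for this example. A compiler may still keep intermediate values in registers, so this narrows the room for reassociation rather than removing it.

```fortran
program explicit_temps
  implicit none
  real :: a, b, e0, eb, eab, e1, e2

  ! Assumed inputs (placeholders): ENTU = 0.7, CR = 0.5
  a = 0.7
  b = a * 0.5

  ! Evaluate each exponential exactly once and name the result
  e0  = exp(-a)
  eb  = exp(-b)
  eab = exp(-(a + b))

  ! E(1) and E(2) written in terms of the stored temporaries
  e1 = a*e0 + (a/b)*e0*(eb - 1.0)
  e2 = a/2.0*(2.0*e1 - a*e0) + a*a/2.0*eab

  print *, 'e1 =', e1
  print *, 'e2 =', e2
end program explicit_temps
```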

cholcom4 · ‎05-04-2012

Sorry, I cannot provide the complete code since it belongs to a company. Can you instead advise on things I can try on my own?

Steven_L_Intel1 · ‎05-04-2012

Try adding /arch:ia32 and see what you get. It is likely to be closer to what version 9 did.

cholcom4 · ‎05-04-2012

The result is the same. I am going to try and compile it inside the studio instead of at the command prompt to see if it will work that way. If you have any other ideas please let me know.

John_Campbell · ‎05-05-2012

It may not be relevant to your problem, but the routine EPSXFU could have numerical overflow with a 4-byte real. I think that is why the loop stops at 30. I'd suggest that you change to double precision local variables (REAL*8), which might have occurred automatically for parts of the calculation in Ver 9.
I'd also replace "A**EN*B**(EN-2.)" with "A**N*B**(N-2)" or even "(A*B)**N/(B*B)", where the power is tied to the DO loop index, as I prefer integer exponents for powers.
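A small sketch comparing the two forms of the power term, assuming sample values for A and B (placeholders). It prints the real-exponent and integer-exponent results side by side for a few loop indices. Apart from any rounding difference, the integer form also stays valid if a base is negative, which a real exponent does not allow in Fortran.

```fortran
program power_demo
  implicit none
  real    :: a, b, en, p_real, p_int
  integer :: n

  ! Assumed sample values (placeholders)
  a = 0.7
  b = 0.35

  do n = 3, 6
    en = n
    p_real = a**en * b**(en - 2.)   ! real exponents, as in the original routine
    p_int  = a**n  * b**(n - 2)     ! integer exponents, tied to the loop index
    print *, n, p_real, p_int, p_real - p_int
  end do
end program power_demo
```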

TimP · ‎05-06-2012

Compiling in VS turns on interface checking, which may be valuable. You should be turning on the compiler's stricter checking anyway, in case that forces you into fixing some of the sloppier features of this source code.

John_Campbell · ‎05-06-2012

I tried to clean up the routine by accumulating part of the calculation and improving the precision. I ended up with slightly different results, I think due to different register settings. While the difference is only academic, it can change the exit test in some cases.
Not all changes are needed, but the use of e2/t2 and real*8 accumulators probably does address the initial problem.

real*4 epsxfu, entu, cr, x
real*8 epsxfu_8, y
external epsxfu, epsxfu_8
!
entu = .7
cr = .5
x = epsxfu (entu, cr)
write (*,*) 'epsxfu =',x
y = epsxfu_8 (entu, cr)
write (*,*) 'epsxfu_8 =',y
end

real*4 FUNCTION EPSXFU (ENTU,CR)
!
real*4 entu, cr
real*8 a, b, e0, e(30), zeta, en, fn, fnm1
real*8 t1, t2, t(30)
integer*4 n
!
A = ENTU
B = A*CR
E0 = EXP (-A)
!
IF (B.EQ.0.) then
EPSXFU = 1. - E0
RETURN
end if
!
E(1) = A*E0 + (A/B)*EXP(-A)*(EXP(-B)-1.)
E(2) = A/2.*(2.*E(1)-A*E0) + A*A/2.*EXP(-(A+B))
!
ZETA=E0+E(1)+E(2)
FN=2
!
t2 = A*A/2.* EXP(-(A+B))
t(1) = e(1)
t(2) = A/2.*(2.*E(1)-A*E0) + t2
!
DO N=3,30
!
EN = N
FNM1 = FN
FN = FN*EN
E(N) = A/EN*(2.*E(N-1)-A/(EN-1.)*E(N-2)) &
+ A**EN*B**(EN-2.)* EXP(-(A+B))/FN/FNM1
!
ZETA = ZETA+E(N)
!
t1 = a/dble(n) * ( 2. * t(n-1) - a/dble(n-1) * t(n-2))
t2 = t2 * a*b /dble(n*(n-1))
t(n) = t1 + t2
!
write (*,*) n, e(n), t(n), t2 !, abs(e(n)-t(n))
!
IF (ABS(E(N)).LT.1.d-50) then
EPSXFU=1.-ZETA
RETURN
end if
end do
!
STOP 'EPSXFU'
end

real*8 FUNCTION EPSXFU_8 (ENTU,CR)
!
real*4 entu, cr
real*8 a, b, e0, e1, e2, e(0:30), zeta
real*8, parameter :: one = 1
real*8, parameter :: two = 2
integer*4 n
!
A = ENTU
B = A*CR
E0 = EXP (-A)
!
IF (B.EQ.0.) then
EPSXFU_8 = one - E0
RETURN
end if
!
e2 = A/B * EXP(-(A+B))
E(0) = E0
E(1) = A*(E0 - E0/B) + e2
!
ZETA = E(0) + E(1)
!
DO N=2,30
!
e1 = a/dble(n) * ( two * e(n-1) - a/dble(n-1) * e(n-2))
e2 = e2 * a*b / dble(n*(n-1))
!
e(n) = e1 + e2
!
ZETA = ZETA + E(N)
write (*,*) n, e(n), e2
!
IF (ABS(E(N)).LT.1.d-50) then
EPSXFU_8 = one-ZETA
RETURN
end if
end do

STOP 'EPSXFU_8'
END

John_Campbell · ‎05-08-2012

epsxfus.f95

This is an interesting problem in round-off, as the precision of the accumulated result does have a significant effect on the outcome.

Although my estimates of ENTU and CR were a guess, the way I have structured the iteration shows an interesting result. For most iterations E2 ~= -E1, so if the accumulated results are stored in 32-bit or 64-bit variables, or 64-bit or 80-bit registers, it does have a significant effect on the round-off that results.

I am not sure of recent processor architectures, but going from x87 to SSE may have reduced the precision of the internal calculation. Also if the temporary variables are stored as 32-bit reals, 64-bit reals or retained in registers, this will change the result. Different compilers will manage this in different ways. Typically, this management is tailored to optimise performance and not precision.

My suggested change of E(N) = e1 + e2 could be interpreted by different compilers and processors in different ways to produce different "results", depending on the method of assessment. It can even force a reduced precision, by storing the values of e1 and e2, rather than retaining the accumulated calculation in higher precision registers.

The change from x87 to SSE has resulted in a lot of work for software developers, when verifying results of changes to programs by using historical benchmark data sets. Small rounding changes can result in apparently different reported results, such as structural elements passing or failing a design rule.

While the changes are typically not significant, a simple text file comparison of results fails, requiring more sophisticated tests of significance.

In the example of this post, while the differences are probably not significant, the change from x87 accumulation has probably resulted in an apparent difference.

(And we have never discussed the accuracy of the estimates for ENTU or CR.)

John

cholcom4 · ‎05-09-2012

I am happy to say that the program now works after trying a different method to optimize it inside VS. The key seems to be related to Inline Function Expansion, which was disabled by default ("/Ob0"). Using VS is better than the command line, and I will use VS from now on. Once I changed Inline Function Expansion to "Any Suitable", the program functioned as it did before in version 9. I am not entirely sure why that seems to fix the issue. If you have any idea why that is, I would be interested to know.

cholcom4 · ‎05-10-2012

Well, I was happy too soon, because the problem is now back. The change seems to have helped, but it did not remove the error altogether. I just compiled the program again on my old PC using version 9.0, and again there are no issues using version 9.0. Any advice on how I can get Composer XE to compile like version 9 to remove this error for good? Or perhaps a way to make version 9 install on Windows Vista?

John_Campbell · ‎05-11-2012

I'd suggest that you put some tests on the values of ENTU and CR, as they might not be initialised in the calling routine. You could try a compiler option to initialise all variables to zero to test this possibility.
You do not know if it is Version 9 or this version that has the error.
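One way such tests might look, as a sketch only: the helper inputs_ok below is hypothetical, and the accepted ranges (ENTU non-negative, CR between 0 and 1) are assumptions that should be replaced with whatever is physically meaningful for the real program. A check like this can only flag garbage that happens to fall outside the range; it does not prove that a variable was initialised.

```fortran
module input_checks
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
  implicit none
contains
  ! Hypothetical helper: .true. when both inputs are finite and in range
  logical function inputs_ok(entu, cr)
    real, intent(in) :: entu, cr
    ! Reject NaN and infinity first
    inputs_ok = ieee_is_finite(entu) .and. ieee_is_finite(cr)
    ! Assumed valid ranges; adjust to the real problem
    if (inputs_ok) then
      inputs_ok = (entu >= 0.0) .and. (cr >= 0.0) .and. (cr <= 1.0)
    end if
  end function inputs_ok
end module input_checks

program check_demo
  use input_checks, only: inputs_ok
  implicit none
  real :: entu, cr

  ! Placeholder values standing in for what the caller would pass
  entu = 0.7
  cr   = 0.5

  if (.not. inputs_ok(entu, cr)) then
    write (*,*) 'Suspicious inputs: ENTU =', entu, ' CR =', cr
    stop 'bad EPSXFU input'
  end if
  write (*,*) 'Inputs look plausible:', entu, cr
end program check_demo
```

Calling a check like this at the top of EPSXFU would show whether the NaN originates inside the routine or arrives with its arguments.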

Steven_L_Intel1 · ‎05-11-2012

I ran the program (provided privately) through Static Analysis - a feature of Intel Fortran Studio XE, and it found several uninitialized variables. These could affect results.

In H1.f90, variable COND1 is passed in a call to routine CCTRH. This is associated with dummy argument COND. COND is then used in an IF test at line 182 in S7.f90. COND1 is never assigned a value in the caller, so the results are unpredictable.

Similarly, dummy arguments IPR and AXI in this same routine, passed as IPR1 and AXI1 from S7.f90, are uninitialized.

At line 23 in S1.f90, LOGICAL variable DBG is passed in a call to SPINE, where the corresponding dummy argument, also named DBG, is undeclared so is implicitly REAL(4). DBG is never used in SPINE, so this is harmless, but it is still an error.

Please fix these issues and try again.
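A sketch of one way to let the compiler catch this class of error, using made-up names (spine_mod, spine_demo, caller_demo) rather than the actual routines. Placing the routine in a module gives the caller an explicit interface, so a LOGICAL actual argument matched to a REAL dummy is rejected at compile time, and IMPLICIT NONE forces every dummy argument to be declared. The value assigned to cond1 is a placeholder; the point is that it is assigned before the call.

```fortran
module spine_mod
  implicit none
contains
  ! Hypothetical stand-in for a routine such as SPINE or CCTRH
  subroutine spine_demo(dbg, cond)
    logical, intent(in) :: dbg    ! declared explicitly, so never implicitly REAL
    real,    intent(in) :: cond
    if (dbg) write (*,*) 'cond =', cond
  end subroutine spine_demo
end module spine_mod

program caller_demo
  use spine_mod, only: spine_demo
  implicit none
  logical :: dbg
  real    :: cond1

  dbg   = .true.
  cond1 = 0.0          ! placeholder; assign a real value before the call
  call spine_demo(dbg, cond1)
end program caller_demo
```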