Error using -real-size 64 without -g

Jean_Johner · ‎02-09-2010

Hello,

I am new to this forum.

I have extracted a small test case of 90 lines (included below) that gives a wrong result when compiled with

ifort -real-size 64

on an x86-64 machine running Red Hat Linux (and probably also on Windows XP).

The integral computed by a Gauss quadrature routine (from the CERN library) comes out as zero instead of the correct value (in the original program, the function to be integrated was not a constant as it is in the present test case).

The result is correct using ifort alone (32-bit precision) or ifort -g -real-size 64.

There is no problem with other compilers (GNU gfortran or Portland Group f77) on the same processor. The code also runs correctly on an Alpha machine.

I would like to solve this problem because some parts of my code require 128-bit precision (not in the test case) and ifort is the only compiler (to my knowledge) that supports such a precision.

I guess an experienced user will immediately see what is wrong (just 90 lines to read).

Thank you for your help.

Jean Johner

[plain]c HELIOS Program : Thermal equilibrium of a thermonuclear plasma
*deck main
      external fipralpha0
c
c Prints the values of function fipralpha0 to be integrated   
      rhoin=0.
      rhofi=1.
      nbrho=11
      rhost=(rhofi-rhoin)/(nbrho-1)
      do norho=1,nbrho
      rho=rhoin+(norho-1)*rhost
      ripralpha0=fipralpha0(rho)
      print*,"rho=",rho," ripralpha0=",ripralpha0
      enddo ! norho
c Computes the integral
      gauss0=gauss(fipralpha0,0.,1.,1.e-6)
      print*,"gauss0=",gauss0
      stop
      end
*deck fipralpha0
      function fipralpha0(rho)
      data fhe/3.e-2/,tped/5./
      fipralpha0=fcalpha(fhe)*tped**0.25
c      fipralpha0=fcalpha(fhe)*sqrt(sqrt(tped))
c exponent 0.25 gives a zero integral
c sqrt(sqrt(tped)) gives a correct integral
c exponent 1.75 gives a zero integral
c exponent 0.75 gives a zero integral
c exponent 0.5 gives a correct integral
c exponent 1.5 gives a correct integral
c suppressing "fcalpha(fhe)" gives a correct result
c replacing "fcalpha(fhe)" by "(1.-2.*fhe)**2" gives a correct result
      end
*deck fcalpha
      function fcalpha(x)
      fcalpha=(1.-2.*x)**2
      end
*deck gauss
      FUNCTION GAUSS(F,A,B,EPS)
* Revision 1.1.1.1  1996/04/01 15:02:13  mclareni
* Mathlib gen
      DIMENSION W(12),X(12)
      data cst/0.005/
      DATA X( 1) /9.6028985649753623D-1/, W( 1) /1.0122853629037626D-1/
      DATA X( 2) /7.9666647741362674D-1/, W( 2) /2.2238103445337447D-1/
      DATA X( 3) /5.2553240991632899D-1/, W( 3) /3.1370664587788729D-1/
      DATA X( 4) /1.8343464249564980D-1/, W( 4) /3.6268378337836198D-1/
      DATA X( 5) /9.8940093499164993D-1/, W( 5) /2.7152459411754095D-2/
      DATA X( 6) /9.4457502307323258D-1/, W( 6) /6.2253523938647893D-2/
      DATA X( 7) /8.6563120238783174D-1/, W( 7) /9.5158511682492785D-2/
      DATA X( 8) /7.5540440835500303D-1/, W( 8) /1.2462897125553387D-1/
      DATA X( 9) /6.1787624440264375D-1/, W( 9) /1.4959598881657673D-1/
      DATA X(10) /4.5801677765722739D-1/, W(10) /1.6915651939500254D-1/
      DATA X(11) /2.8160355077925891D-1/, W(11) /1.8260341504492359D-1/
      DATA X(12) /9.5012509837637440D-2/, W(12) /1.8945061045506850D-1/
      H=0.
      IF(B .EQ. A) GO TO 99
      CONST=CST/ABS(B-A)
      BB=A
    1 AA=BB
      BB=B
    2 C1=0.5*(BB+AA)
c      print*,"aa=",aa," bb=",bb
      C2=0.5*(BB-AA)
      S8=0
      DO I = 1,4
      U=C2*X(I)
      S8=S8+W(I)*(F(C1+U)+F(C1-U))
      enddo
c      print*," c2*s8=",c2*s8
      S16=0
      DO I = 5,12
      U=C2*X(I)
      S16=S16+W(I)*(F(C1+U)+F(C1-U))
      enddo
      S16=C2*S16
c      print*," s16=",s16
      IF(ABS(S16-C2*S8) .LE. EPS*(1.+ABS(S16))) THEN
       H=H+S16
       IF(BB .NE. B) GO TO 1
      ELSE
       BB=C1
       IF(1.+CONST*ABS(C2) .NE. 1.) GO TO 2
       H=0
       PRINT*,"FUNCTION GAUSS, TOO HIGH ACCURACY REQUIRED"
       GOTO 99
      ENDIF
   99 GAUSS=H
      RETURN
      END[/plain]

Ron_Green · ‎02-09-2010

I don't see an error with the 11.1 compiler:

[rwgreen@dpd22 71856]$ ifort -O2 -fp-model precise -o repro repro.F -real-size 64
[rwgreen@dpd22 71856]$ ./repro
rho= 0.000000000000000E+000 ripralpha0= 1.32129018308707
rho= 0.100000000000000 ripralpha0= 1.32129018308707
rho= 0.200000000000000 ripralpha0= 1.32129018308707
rho= 0.300000000000000 ripralpha0= 1.32129018308707
rho= 0.400000000000000 ripralpha0= 1.32129018308707
rho= 0.500000000000000 ripralpha0= 1.32129018308707
rho= 0.600000000000000 ripralpha0= 1.32129018308707
rho= 0.700000000000000 ripralpha0= 1.32129018308707
rho= 0.800000000000000 ripralpha0= 1.32129018308707
rho= 0.900000000000000 ripralpha0= 1.32129018308707
rho= 1.00000000000000 ripralpha0= 1.32129018308707
gauss0= 1.32129018308707

I would assume you want to use -fp-model precise, since you are clearly concerned with accuracy rather than absolute performance.

Martyn_C_Intel · ‎02-09-2010

The reason that -g changes the behavior is that it changes the default optimization level from -O2 to -O0. The problem seems related to the inlining of function GAUSS into the main program at -O2. After the full sequence of inlining, fcalpha into fipralpha0 into gauss into main, the compiler decides to vectorize the reduction loop "DO I=5,12", which it is unable to vectorize without the full inlining sequence, or with -real-size 32.

We will investigate further how the error comes about in the inlined, vectorized loop, and let you know what we find. I can't see anything obviously wrong with your code. In the meantime, there are several ways to work around the problem. One would be to insert a directive

!DIR$ ATTRIBUTES NOINLINE :: GAUSS

into the main program (and into any other routine that calls the function GAUSS).

Another would be to insert compiler directives

!DIR$ NOVECTOR

immediately before the two loops in GAUSS at

DO I = 1,4 and DO I = 5,12

Or, finally, to compile at a reduced optimization level, for example by adding the compiler switches

-fno-inline -no-ip
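To make the placement of the two directives concrete, here is a minimal sketch. It is not the original code: SUMW is a hypothetical stand-in for GAUSS, and the weights and the summed expression are made up for illustration. The directive lines start with `!` in column 1, so compilers other than ifort should treat them as ordinary comments.

```fortran
!     Sketch only: SUMW is a hypothetical stand-in for GAUSS.
      program workaround_demo
      implicit none
      double precision sumw
      external sumw
!     Workaround 1: ask ifort not to inline SUMW into this caller.
!DIR$ ATTRIBUTES NOINLINE :: sumw
      print *, 'sum =', sumw(2.0d0)
      end program workaround_demo

      function sumw(c)
      implicit none
      double precision sumw, c, w(12), s
      integer i
!     Made-up weights, only so that the loops have something to sum.
      do i = 1, 12
         w(i) = 0.25d0*i
      enddo
      s = 0.0d0
!     Workaround 2: ask ifort not to vectorize the loop that follows.
!DIR$ NOVECTOR
      do i = 1, 4
         s = s + w(i)*c**0.25d0
      enddo
!DIR$ NOVECTOR
      do i = 5, 12
         s = s + w(i)*c**0.25d0
      enddo
      sumw = s
      end function sumw
```

Both workarounds appear together here only to show where each one goes; according to the description above, either one on its own should be enough. The NOVECTOR directive applies to the single loop that immediately follows it, which is why it is repeated before the second loop.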

Jean_Johner · ‎02-10-2010

Thank you very much Martyn,

I did not understand the details but clearly !DIR$ NOVECTOR added before the loops in GAUSS does fix the problem.

Perhaps you have noticed in my comments that a small change in how the expression is written (sqrt(sqrt(tped)) instead of tped**0.25) cures the error.
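For reference, the two spellings are mathematically the same quantity, which a small check can illustrate. This is a sketch with a hypothetical program name, using the value tped = 5 from the test case; it only compares the two expressions and does not reproduce the compiler problem.

```fortran
!     Sketch only: compares the two ways of writing the fourth root.
      program pow_vs_sqrt
      implicit none
      double precision tped, p, q
      tped = 5.0d0
!     Form used in the failing test case.
      p = tped**0.25d0
!     Form reported to give a correct integral.
      q = sqrt(sqrt(tped))
      print *, 'tped**0.25       =', p
      print *, 'sqrt(sqrt(tped)) =', q
      print *, 'difference       =', abs(p - q)
      end program pow_vs_sqrt
```

Any difference between the two printed values should be at the level of rounding error, so the change in the integral cannot come from the arithmetic itself.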

It would be nice to have the problem corrected, since the same problematic pattern could occur elsewhere in the code.

Please keep me informed.

Jean Johner

Jean_Johner · ‎02-10-2010

Dear Ronald,

Thank you for your interest.

With ifort -O2 -fp-model precise -real-size 64, the error does not occur.

It seems that Martyn Corden has been able to reproduce the problem using ifort -real-size 64 alone.

Best regards.

Jean Johner

Jean_Johner · ‎02-10-2010

Some additional information.

Adding

implicit real*8 (a-h,o-z)

in the main program and the subprograms listed above, changing constants to double precision (e.g. 3.e-2 -> 3.d-2) everywhere, and compiling with bare "ifort" results in the same error.

This shows that the problem is linked not to the implementation of the -real-size 64 option but to 64-bit precision itself.
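As a sketch of what this change looks like, here are the two small functions from the test case in that explicit double-precision form. The exact spelling of each converted constant (for example 5.d0 and 0.25d0) is an assumption, and the short driver program DP_VARIANT is hypothetical, added only so that the fragment compiles and runs on its own.

```fortran
!     Sketch only: explicit real*8 variant of two test-case functions.
      program dp_variant
      implicit real*8 (a-h,o-z)
!     Hypothetical driver; the real test case calls GAUSS instead.
      print *, 'fipralpha0 =', fipralpha0(0.5d0)
      end

      function fipralpha0(rho)
      implicit real*8 (a-h,o-z)
!     Constants written as double precision literals (assumed spelling).
      data fhe/3.d-2/,tped/5.d0/
      fipralpha0=fcalpha(fhe)*tped**0.25d0
      end

      function fcalpha(x)
      implicit real*8 (a-h,o-z)
      fcalpha=(1.d0-2.d0*x)**2
      end
```

Compiled without any precision switch, this should print a value close to 1.3213, in line with the ripralpha0 values shown earlier in the thread.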

Best regards.

TimP · ‎02-10-2010

I don't see any failure either, when using the current compiler version, even when I remove the normal ifort.cfg from the installation and set risky options. If this requires a specific older version, please so indicate in such a report.

Jean_Johner · ‎02-10-2010

Hello Tim,

On my Linux server, the ifort binary sits in the following folder:

/applications/intel/Compiler/11.1/056/bin/intel64

The ifort.cfg in this folder contains only a comment line.

Perhaps Martyn Corden (Intel) could give the version of the compiler he used to reproduce the problem.

Best regards.

TimP · ‎02-10-2010

I always put the standards compliance/compatibility options in ifort.cfg, but it made no difference with ifort 11.1/064.

-assume protect_parens,minus0,byterecl,buffered_io -prec-div -prec-sqrt

Jean_Johner · ‎02-10-2010

Dear Tim,

Do you mean that with the 11.1/064 version and an empty ifort.cfg, compilation with "ifort -real-size 64" gives the correct result?

This would mean that the problem has been fixed since version 056. Good news!

What about the version and ifort.cfg used by Martyn Corden?

Yours sincerely.

Jean Johner

Martyn_C_Intel · ‎02-10-2010

I was able to reproduce the problem in any 11.1 compiler, but not in older compilers.

It's not seen when you use -fp-model precise, because this prevents the use of fast, vectorizable math functions that are (very slightly) less accurate than the usual ones; hence, the loops with the problem don't get vectorized. The switch -no-fast-transcendentals would have the same effect.

We regret this problem. I expect it to be fixed in the next major compiler version; the developers will have to determine whether it can be fixed in an 11.1 update. In the meantime, please keep using one of these workarounds.

As an aside (as a longtime user of CERN software), I was amused to see the old "*deck" control cards in the source. I can't believe that CERN still uses Patchy for source management, but its ghost lives on...

Jean_Johner · ‎02-10-2010

Good job Martyn.

I am also an old user of CERN sources. I kept this *deck separation between functions and subroutines because I find it more convenient to type /k name when searching a subroutine with vi rather than trying to remember if it is /e name (for a subroutine) or /n name (for a function).

Of course, Patchy itself is no longer in use.

Best regards.

Jean Johner

Martyn_C_Intel · ‎04-23-2010

Hi Jean,
I have confirmed that this issue has been fixed in the 11.1 compiler update 6 (l_cprof_p_11.1.072). This update is available for download at https://registrationcenter.intel.com .

Regards,
Martyn