fpe0

ximoteo · ‎01-08-2011

I've added the flag -fpe0 to trap floating-point exceptions such as division by zero,
but I've noticed something strange when changing the ifort version.

This is the program that generates the strange behavior:

[fortran]program fpe_test
  real*4::a,b,c
  a=1.e5
  b=0.e0
  c=a/b
  print *,c
end program fpe_test[/fortran]

compiled with
ifort fpe_test.f90 -o test -check all -warn all -traceback -fpe0
on two different machines:

machine (1) with Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.0 Build 20081105 Package ID: l_cprof_p_11.0.074
gives output
forrtl: error (73): floating divide by zero
as expected, while

on machine (2) with Intel Fortran Compiler Professional for applications running on IA-32, Version 11.1 Build 20100414 Package ID: l_cprof_p_11.1.072
gives
Infinity
which is totally unexpected, since I'm using -fpe0 <---------

Please explain what's happening here...

UPDATE

I've tested the code on a 3rd machine.

Intel Fortran Compiler Professional for applications running on IA-32, Version 11.1 Build 20090827
Package ID: l_cprof_p_11.1.056
the output is:
forrtl: error (73): floating divide by zero

(suspect: machines 1 and 3 use intel processors, machine 2 uses AMD)
any idea?

mecej4 · ‎01-09-2011

It happens once in a while that the executable that is run is not the same as the one produced by the most recent compilation, because of how PATH is set. It is also possible that you may have two versions of the source code, one of which is edited and the other compiled!

Make sure that on the second machine you delete the executable file ("test" is not a good choice for a name, since it clashes with the standard test(1) utility program) and recompile. Then use "which a.out" to make sure that the path is set such that the new executable is the one that will be run. Then run it.

NOTE TO INTEL:

The syntax highlighter has a bug that is shown up in this thread. The line feed between lines 5 and 6 gets removed when one chooses "view plain" (at least on Firefox 3.6.13 on Suse 11.3X64).

ximoteo · ‎01-10-2011

Thanx mecej, but I'm not sure that the problem resides there, because:
(i) This code was written ad hoc to reproduce the error; it gives this result from the very first run.
(ii) The name "test" is an example here.
(iii) The code has not been edited.
(iv) PATH is ok.

At the moment I have a suspicion: this code works as expected on Intel processors (machines 1 and 3), but it gives strange results on AMD (machine 2).

Ron_Green · ‎01-10-2011

I can't reproduce this, but my AMD systems are 64bit. I did try on an Opteron with 11.1.072 but again, on 64 bit. I do get the runtime error. I have a few comments.

1) if you are going to use -traceback, you need also -g to get symbolic information. Use them as a pair.
2) In these tests, it's best to disable optimizations using -O0 explicitly. For example, in your trivial case:

a=1.e5
b=0.e0
c=a/b

the compiler knows at compile time the value of A and B, and thus could pre-compute A/B as infinity and change the assignment to c thusly:

c=<infinity>

Perfectly valid optimization. Saves a division. I'm not saying this is what is happening, since I can't replicate what you're seeing on my 64bit Opteron. Optimizations vary by processor, so if you have a really old AMD 32bit processor the compiler may be doing the optimization above. Divisions are pretty fast on modern processors, so there the optimizer is presumably leaving the code as-is. On older processors, division is more expensive. Remove the uncertainty by using -O0.

But I would try -O0, because your compile line is giving -O2: you didn't use -g or explicitly set an -O level.

ron

ximoteo · ‎01-12-2011

Thanx Ronald,
(i) it seemed a good suggestion, but also with -O0 the output is still Infinity.

(ii) To completely rule out the optimization I've replaced the line

[fortran]b=0.e0[/fortran]

with

[fortran]read(*,*) b[/fortran]

In this way the compiler cannot optimize away the division (I've also added b=1.e0 before the read statement to be sure).
When the program reads zero from the keyboard (0, 0.0, 0.e0 and 0.d0), the output is still Infinity.
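
For reference, the modified test described above would look roughly like this. It is a sketch reconstructed from the description (the edited source itself was not posted), so the exact placement of the two new lines is an assumption:

```fortran
program fpe_test
  implicit none
  real*4 :: a, b, c
  a = 1.e5
  b = 1.e0        ! nonzero initial value, overwritten by the read below
  read (*,*) b    ! enter 0 at run time, so a/b cannot be pre-computed by the compiler
  c = a / b       ! with -fpe0 this is expected to abort with a divide-by-zero error
  print *, c
end program fpe_test
```

Because b is only known at run time, an output of Infinity here cannot be explained by compile-time constant folding.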

OS: ubuntu 9.10 i386
This is the CPU info; the processor is quite old. Hope this helps.

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 36
model name : AMD Turion 64 Mobile Technology ML-34
stepping : 2
cpu MHz : 800.000
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up pni lahf_lm
bogomips : 1600.25
clflush size : 64
power management: ts fid vid ttp tm stc

Ron_Green · ‎01-17-2011

I don't have any ideas on this one, and I don't have a Turion to test with.

I would suggest compiling with option -dryrun for both the Turion and Intel cases and diff'ing the results.

ron

TimP · ‎01-17-2011

Original Turion didn't work with SSE2 compilation options. Turion X2 corrected that, but I still wouldn't be surprised to encounter execution issues, even differences from Opteron.

ximoteo · ‎01-18-2011

Thanx Tim.
But sse2 is listed among the flags of my CPU... Am I missing something?

t.

TimP · ‎01-18-2011

If you have Turion X2, I'm sure it supports all SSE2 instructions, but I'm not confident that exceptions behave identically to AMD or Intel desktop CPUs. Original Turion (before X2) supported many but not all SSE2 instructions, probably enough that linux would have reported sse2 in the flags. With recent Intel compilers, you would be able to run on original Turion only with the 32-bit ia32 option, so I guess you've already presented indirect evidence that you have the X2.

ximoteo · ‎01-18-2011

Thanx, now some questions:
1. Is there any tool to check whether the SSE2 set is complete or not? (since I cannot trust cpuinfo on linux!!!)
2. Does the fpe0 option use ONLY SSE2 instructions? In other words, can I tell the compiler that my CPU has no SSE2 (for example with a proper compiler flag)? Would fpe0 work that way?

thank you again,
t.

Steven_L_Intel1 · ‎01-18-2011

-arch ia32

is what you want.

ximoteo · ‎01-18-2011

Thank you for the flag, Steve,
but is there any tool to check whether the sse2 has all the instructions???

t.

UPDATE:
I've tried -arch ia32, it works great!
Thank you to everybody!

Steven_L_Intel1 · ‎01-19-2011

I don't know what you mean by "check if the sse2 has all the instructions". If a processor claims it supports SSE2, then it should support all of them. There is a bit in one of the CPUID flags that indicates SSE2 support - there are various tools available such as CPU-Z that will display this for you. I don't know the particulars for this specific AMD CPU, but it may be that it claims SSE2 support without actually implementing all of the instructions. As has been said, newer AMD CPUs do support all of the SSE2 instructions.

What -arch ia32 does is tell the compiler not to generate any SSE instructions and to assume "Pentium II" level of instruction support. This can change floating point results, as the "X87" floating-point instructions sometimes provide more than the declared precision, and it will be slower on newer CPUs than if you used SSE, but it will allow the application to run on any Intel-compatible CPU made in the last 15 years or so.
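
Independently of -fpe0, the Fortran 2003 intrinsic IEEE modules offer a way to check at run time whether a divide by zero is trapped or merely flagged on a given machine. The following is a minimal sketch, assuming the compiler and target support the `ieee_exceptions` and `ieee_arithmetic` modules (not verified here for the 11.x compilers or the Turion in this thread). The program name `fpe_probe` and the variable names are made up for illustration:

```fortran
program fpe_probe
  use, intrinsic :: ieee_exceptions
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
  implicit none
  real :: a, b, c
  logical :: halting, raised

  a = 1.e5
  read (*,*) b    ! enter 0 to provoke the division by zero

  ! Ask whether a divide by zero is currently set to halt the program.
  call ieee_get_halting_mode(ieee_divide_by_zero, halting)
  print *, 'halting on divide by zero: ', halting

  ! Clear any stale flag, do the division, then inspect the flag.
  call ieee_set_flag(ieee_divide_by_zero, .false.)
  c = a / b
  call ieee_get_flag(ieee_divide_by_zero, raised)

  print *, 'divide-by-zero flag raised: ', raised
  print *, 'result is finite: ', ieee_is_finite(c)
end program fpe_probe
```

If halting is in effect, the program should stop at the division and never reach the last two prints; if it prints a non-finite result instead, the exception was not trapped. Comparing the output with and without -arch ia32 on the affected machine would show whether the SSE code path is the one failing to trap.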