I've added the flag -fpe0 to control the exceptions like division by zero,
but I've noticed something strange when changing the ifort version.
This is the program that produces the strange behavior:
[fortran]program fpe_test
real*4::a,b,c
a=1.e5
b=0.e0
c=a/b
print *,c
end program fpe_test[/fortran]
compiled with
ifort fpe_test.f90 -o test -check all -warn all -traceback -fpe0
on two different machines:
machine (1) with Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.0 Build 20081105 Package ID: l_cprof_p_11.0.074
gives output
forrtl: error (73): floating divide by zero
as expected, while
machine (2) with Intel Fortran Compiler Professional for applications running on IA-32, Version 11.1 Build 20100414 Package ID: l_cprof_p_11.1.072
gives
Infinity
which is totally unexpected, since I'm using -fpe0 <---------
Please explain to me what is happening here...
UPDATE
I've tested the code on a 3rd machine.
Intel Fortran Compiler Professional for applications running on IA-32, Version 11.1 Build 20090827
Package ID: l_cprof_p_11.1.056
the output is:
forrtl: error (73): floating divide by zero
(suspicion: machines 1 and 3 use Intel processors, machine 2 uses AMD)
Any ideas?
It happens once in a while that the executable that is run is not the
same as the one produced by the most recent compilation, because of how
PATH is set. It is also possible that you may have two versions of the
source code, one of which is edited and the other compiled!
Make sure that on the second machine you delete the executable file and recompile ("test" is not a good choice of name, since it clashes with the standard test(1) utility program). Then use "which a.out" to make sure that the path is set such that the new executable is the one that will be run. Then run it.
NOTE TO INTEL:
The syntax highlighter has a bug that shows up in this thread. The line feed between lines 5 and 6 gets removed when one chooses "view plain" (at least on Firefox 3.6.13 on Suse 11.3 x64).
Thanx mecej, but I'm not sure that the problem lies there, because:
(i) This code was written ad hoc to reproduce the error; it gives this result from the first run.
(ii) The name "test" is just an example here.
(iii) The code has not been edited.
(iv) PATH is OK.
At the moment I have a suspicion: this code works as expected on Intel processors (machines 1 and 3), but it gives strange results on AMD (machine 2).
I can't reproduce this, but my AMD systems are 64bit. I did try on an Opteron with 11.1.072 but again, on 64 bit. I do get the runtime error. I have a few comments.
1) if you are going to use -traceback, you need also -g to get symbolic information. Use them as a pair.
2) In these tests, it's best to disable optimizations using -O0 explicitly. For example, in your trivial case:
a=1.e5
b=0.e0
c=a/b
the compiler knows the values of A and B at compile time, and thus could pre-compute A/B as infinity and change the assignment to c thusly:
c=<Infinity>
Perfectly valid optimization. Saves a division. I'm not saying this is what is happening, since I can't replicate what you're seeing on my 64-bit Opteron. Optimizations vary by processor, so if you have a really old 32-bit AMD processor, the compiler may be doing the optimization above. Divisions are pretty fast on modern processors, so there the optimizer evidently leaves the code as-is. On older processors, division is more expensive. Remove the uncertainty by using -O0.
I would try -O0, since your compile line gives you -O2: you didn't use -g or explicitly set an -O level.
ron
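As an illustration of the constant-folding point above, here is a minimal sketch of one way to stop the compiler from pre-computing the quotient. It assumes the compiler honors the Fortran 2003 VOLATILE attribute, and the program name fpe_volatile is made up for this example. It is not a fix for the problem in this thread, only a way to take compile-time folding out of the picture.

```fortran
program fpe_volatile
  implicit none
  ! VOLATILE asks the compiler to reload b from memory on every use,
  ! so a/b should not be replaced by a precomputed constant.
  real, volatile :: b
  real :: a, c

  a = 1.e5
  b = 0.e0
  c = a / b        ! the division is expected to happen at run time
  print *, c
end program fpe_volatile
```

If this version still prints Infinity when built with -fpe0, constant folding is presumably not the cause.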
Thanx Ronald,
(i) It seemed a good suggestion, but even with -O0 the output is still Infinity.
(ii) To completely rule out the optimization I've replaced the line
[fortran]b=0.e0[/fortran]
with
[fortran]read(*,*) b[/fortran]
In this way the compiler cannot optimize away the division (I've also added b=1.e0 before the read statement to be sure).
When the program reads zero from the keyboard (0, 0.0, 0.e0 and 0.d0) the output is still Infinity.
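For reference, the modified test program described above would look roughly like this. It is a reconstruction from the description in this post, not the exact file that was compiled.

```fortran
program fpe_test
real*4::a,b,c
a=1.e5
b=1.e0          ! initial nonzero value, overwritten by the read below
read(*,*) b     ! type 0 here; the compiler cannot know b at compile time
c=a/b
print *,c
end program fpe_test
```

Built with the same command line as before plus -O0, and given 0 as input, this is the version that still prints Infinity on the Turion machine.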
OS: Ubuntu 9.10 i386
This is the CPU info; the CPU is quite old. Hope this helps.
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 36
model name : AMD Turion 64 Mobile Technology ML-34
stepping : 2
cpu MHz : 800.000
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up pni lahf_lm
bogomips : 1600.25
clflush size : 64
power management: ts fid vid ttp tm stc
I don't have any ideas on this one, and I don't have a Turion to test with.
I would suggest compiling with option -dryrun for both the Turion and Intel cases and diff'ing the results.
ron
Original Turion didn't work with SSE2 compilation options. Turion X2 corrected that, but I still wouldn't be surprised to encounter execution issues, even differences from Opteron.
Thanx Tim.
but sse2 is listed within the flags of my CPU... Am I missing something?
t.
If you have a Turion X2, I'm sure it supports all SSE2 instructions, but I'm not confident that exceptions behave identically to those on AMD or Intel desktop CPUs. The original Turion (before X2) supported many but not all SSE2 instructions, probably enough that Linux would have reported sse2 in the flags. With recent Intel compilers, you would be able to run on the original Turion only with the 32-bit ia32 option, so I guess you've already presented indirect evidence that you have the X2.
thanx, now some questions:
1. Is there any tool to check whether the SSE2 set is complete or not? (since I cannot trust cpuinfo on Linux!!!)
2. Does the fpe0 option use ONLY SSE2 instructions? In other words, can I tell the compiler that my CPU has no SSE2 (for example with a suitable compiler flag)? Would fpe0 then work?
thank you again,
t.
-arch ia32
is what you want.
thank you for the flag, Steve,
but is there any tool to check whether SSE2 has all the instructions???
t.
UPDATE:
I've tried -arch ia32, and it works great!
Thank you, everybody!
I don't know what you mean by "check if the sse2 has all the instructions". If a processor claims it supports SSE2, then it should support all of them. There is a bit in one of the CPUID flags that indicates SSE2 support; there are various tools available, such as CPU-Z, that will display this for you. I don't know the particulars for this specific AMD CPU, but it may be that it claims SSE2 support without actually implementing all of the instructions. As has been said, newer AMD CPUs do support all of the SSE2 instructions.
What -arch ia32 does is tell the compiler not to generate any SSE instructions and to assume "Pentium II" level of instruction support. This can change floating point results, as the "X87" floating instructions tend to sometimes provide more than declared precision, and it will be slower on newer CPUs than if you used SSE, but it will allow the application to run on any Intel-compatible CPU made in the last 15 years or so.
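If relying on the hardware trap is a concern, one alternative is to test the divide-by-zero flag explicitly through the standard Fortran 2003 IEEE modules. The following is only a sketch under assumptions: it presumes the compiler in use provides the intrinsic ieee_exceptions module, the program name fpe_flag_check is made up, and it has not been tried on the Turion discussed in this thread. Depending on optimization settings the compiler may also need to be told to keep floating-point operations in order around the flag calls.

```fortran
program fpe_flag_check
  use, intrinsic :: ieee_exceptions
  implicit none
  real :: a, b, c
  logical :: raised

  a = 1.e5
  read (*, *) b    ! read at run time so the division cannot be folded

  ! Turn trapping off (where supported) so the flag can be inspected
  ! instead of the program being aborted by the trap.
  if (ieee_support_halting(ieee_divide_by_zero)) then
     call ieee_set_halting_mode(ieee_divide_by_zero, .false.)
  end if
  call ieee_set_flag(ieee_divide_by_zero, .false.)   ! clear any stale flag

  c = a / b

  call ieee_get_flag(ieee_divide_by_zero, raised)
  if (raised) then
     print *, 'floating divide by zero detected'
     stop 1
  end if
  print *, c
end program fpe_flag_check
```

This checks the status flag after the operation rather than depending on -fpe0 to trap, so it may help to tell apart "the exception was never raised" from "the exception was raised but not trapped".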