Intel® Fortran Compiler

MPI, ifort, fpe0

useybold
Beginner
Hello,

I am working with the Intel Fortran and C++ compilers, version 8.1, and MPICH, version 1.2.6, on an Intel Pentium IV machine. My parallelized programs crash when very small numbers are calculated and the code has been compiled with the option "fpe0". The problem seems to be related to the fact that real numbers are kept in the floating-point processor for use in a subsequent calculation, so that higher accuracy is possible (cf. the option "[no]fltconsistency").

I have prepared two simple Fortran programs "serial.f" and "parallel.f" (see listings at the bottom) in order to illustrate the issue. Program "serial.f" is written for serial execution; program "parallel.f" is the parallelized version of "serial.f".

Here is a description of how I compile program "serial.f" and the output that I obtain when running it.

ifort serial.f
a = 2.0000000E-07
a**6 = 6.4000103E-41
a**20 = 0.0000000E+00

ifort -fpe0 serial.f
a = 2.0000000E-07
a**6 = 6.4000103E-41
a**20 = 0.0000000E+00

ifort -O0 serial.f
a = 2.0000000E-07
a**6 = 6.4000103E-41
a**20 = 0.0000000E+00

ifort -O0 -fpe0 serial.f
a = 2.0000000E-07
a**6 = 0.0000000E+00
a**20 = 0.0000000E+00


-> In this sample program, the option "fpe0" does not have any visible effect when used together with optimization ("ifort -fpe0 serial.f"), as real numbers seem to be treated with higher accuracy. If "fpe0" is used without optimization, a**6 is adjusted to 0.0000000E+00.


If I apply similar compiler options to the parallel version of this sample program ("parallel.f") and run the program on one or several processors (e.g. "mpirun -np 1 a.out"), I get the following results:

mpif90 parallel.f
a = 2.0000000E-07
a**6 = 6.4000103E-41
a**20 = 0.0000000E+00

mpif90 -fpe0 parallel.f
a = 2.0000000E-07
a**6 = 6.4000103E-41
a**20 = 0.0000000E+00

mpif90 -O0 parallel.f
a = 2.0000000E-07
a**6 = 6.4000103E-41
a**20 = 0.0000000E+00

mpif90 -O0 -fpe0 parallel.f
a = 2.0000000E-07
p0_: p4_error: interrupt SIGFPE: 8

-> Apparently the program crashes when "fpe0" comes into effect. (In my "real world" application this also happens if "fpe0" is used together with optimization.)

I tried building the MPI library with different compile options (with and without "fpe0") and with different compilers (gcc and icc), but the program still crashed.

Since I would still like to use "fpe0" together with MPI, I would be very thankful for any hints on whether this is possible at all and, if so, how it can be achieved.

Thank you in advance.

Udo


PS.:
The option "mpiversion" on the executable leads to the following output:
MPICH 1.2.6release first patches of $Date: 2004/08/04 11:10:38$., ADI version 2.00 - transport ch_p4
Configured with --prefix=/usr/local/mpich-1.2.6

uname -a:
Linux itsm-numerik2 2.6.5-7.75-smp #1 SMP Mon Jun 14 10:44:37 UTC 2004 i686 i686 i386 GNU/Linux

The config.log files (mpi compilation) can be found at
ftp://ftp.itsm.uni-stuttgart.de/pub/intel_mpi

----------------------------------------------------------------------
program serial.f

PROGRAM serial

IMPLICIT NONE

REAL :: a, b

a = 2.e-7

WRITE(*,*) "a = ", a

b = a**6
WRITE(*,*) "a**6 = ", a**6
WRITE(*,*) "a**20 = ", a**20

END


----------------------------------------------------------------------
program parallel.f

PROGRAM parallel

IMPLICIT NONE

INCLUDE "mpif.h"

INTEGER ierror, myid
REAL :: a, b


CALL MPI_INIT(ierror)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierror)

a = 2.e-7

WRITE(*,*) "a =", a

c----------------------------------------------------------------------c
c-- Either of the following commands causes a crash --c
c-- if the program was compiled with "-O0 -fpe0". --c
c-- --c
c-- Error message: --c
c-- p0_: p4_error: interrupt SIGFPE: 8 --c
c----------------------------------------------------------------------c

b = a**6
WRITE(*,*) "a**6 = ", a**6
WRITE(*,*) "a**20 = ", a**20

CALL MPI_FINALIZE(ierror)

END
5 Replies
TimP
Honored Contributor III
Why not declare a as double precision, REAL(selected_real_kind(15,307)), if you are depending on a range greater than that of default real?
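For example (just a sketch; the kind parameter name dp is arbitrary and not from your code):

PROGRAM dbl_demo
IMPLICIT NONE
INTEGER, PARAMETER :: dp = SELECTED_REAL_KIND(15, 307)
REAL(dp) :: a, b
a = 2.e-7_dp
b = a**6              ! about 6.4e-41, well inside the double precision range
WRITE(*,*) "a**6 = ", b
END PROGRAM dbl_demo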

MPICH is a bit fragile. If you are looking for reliable results, you might want to avoid odd corner-case combinations of options and value ranges. Anything other than SSE code with gradual underflow disabled might count as an odd choice nowadays. If you are researching ways to break it, you should look up the implications of what you are doing.
useybold
Beginner
I do not think that declaring "a" to be double precision solves the problem. I still get crashes, only now for smaller values, e.g. for a**200.

Actually, I do not need a range greater than default real in my application. Rather, I intended to use "-fpe0" in order to enforce default real for consistency and compatibility reasons (especially for output to files).


Now I discussed the issue again with a colleague of mine. He explains the problem as follows:

Exception handling is deactivated by default. Therefore, if a value is calculated which is smaller than the smallest default real number and I do not use "-fpe0", no exception signal will be issued. The internal representation of the result can vary:

1) If the result is smaller than the smallest value that can be expressed in double precision, it will be set to 0.E0.

2) Otherwise, the result will either be expressed as a double precision number (such as 6.4000103E-41 in the above example) or 0.E0. Which case will happen depends on whether the value can be stored in the floating-point processor or not. Thus the result can depend, e.g., on whether optimization is used or not.

In any case the calculation will have a "reasonable" result (very small double precision number or 0.E0) and the program will continue.


However, if I use the option "-fpe0" (without optimization) in the serial version, two things will happen if a number smaller than the smallest default real is calculated:

1) An exception signal will be issued.

2) The exception handler from the compiler will always set the result to zero, i.e., the result will not depend on whether optimization is used or not.

This seems to be the reason for the crash of the parallel version. Apparently the exception signal is raised in some MPI function (which is (partly?) written in C), and MPI crashes because no exception handler is available.

As a consequence of this there seem to be two options for me:

1) Not to use "-fpe0". I could not find any details on this topic in the manual, so I can only deduce from my tests that this option is only active if no optimization is used; i.e., if optimization is activated, it makes no difference whether I use "-fpe0" or not (at least in this simple example).
Further, it seems that MPI is able to cope with real values outside of the default range, such as 6.4000103E-41. From this perspective it should be safe to omit "-fpe0".
However, if I use binary files for the input/output routines and read/write them on different types of machines, it may be necessary to compile at least these input/output routines with "-O0 -fpe0".

2) An exception handler could be used to ensure that those parts of MPI that use C do not crash if they receive the underflow exception signal. However, we do not know how to write/integrate this exception handler.
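The closest thing I could imagine (an untested sketch, and it assumes a compiler that already supports the Fortran 2003 intrinsic module IEEE_EXCEPTIONS, which I am not sure version 8.1 does) would be to switch off halting on underflow at run time, before the critical calculation:

PROGRAM no_underflow_halt
! Untested sketch: disable halting on underflow only, leaving any other
! traps enabled by "-fpe0" (overflow, divide-by-zero, invalid) untouched.
USE, INTRINSIC :: IEEE_EXCEPTIONS
IMPLICIT NONE
REAL :: a, b
CALL IEEE_SET_HALTING_MODE(IEEE_UNDERFLOW, .FALSE.)
a = 2.e-7
b = a**6                       ! underflows, but should no longer trap
WRITE(*,*) "a**6 = ", b
END PROGRAM no_underflow_halt

Whether this would also help with the signal raised inside the C parts of MPI, I cannot say.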

Do you agree with the explanation and the ways to handle the problem? Do you know of any exception handler that could solve the problem, or do you know of any other options (e.g. Intel C++ compiler settings)?


Thanks.

Udo
TimP
Honored Contributor III
OK, I'll comment on a few points.
In your example, the 6.4e-41 you generated is a sub-normal (earlier called denormal) single precision value. As you can see from the printed value, it has reduced precision. It is not promoted to double precision, except in the x87 extended precision intermediate register representation. By the time the run-time library WRITE functions see it, it has been stored as a single precision sub-normal. If you don't have IEEE gradual underflow enabled, sub-normals are "flushed to zero."
Single precision sub-normals fall roughly in the range 1e-45 .lt. x .lt. 1e-38, and double precision sub-normals in the range 1e-324 .lt. d .lt. 1e-308. More exactly, TINY(x)*EPSILON(x) .le. x .lt. TINY(x).
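You can check these limits on your own system with the intrinsic inquiry functions, e.g.:

PROGRAM ranges
IMPLICIT NONE
REAL :: x
DOUBLE PRECISION :: d
! Smallest normal value and approximate smallest sub-normal for each kind.
! Note that with abrupt underflow enabled, the products below may themselves
! be flushed to zero.
WRITE(*,*) "single: TINY =", TINY(x), " TINY*EPSILON =", TINY(x)*EPSILON(x)
WRITE(*,*) "double: TINY =", TINY(d), " TINY*EPSILON =", TINY(d)*EPSILON(d)
END PROGRAM ranges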
Many people consider gradual underflow to be inconsistent, although the IEEE standards require it. In mixed x87 and SSE code, setting -ftz (which applies only to the SSE code) clearly causes inconsistency, although many readers will disagree with me. Your demonstration appears to show that -fpe0 also causes inconsistencies with x87 code. Your -O0 code probably generates more opportunities for -fpe0 to trap underflow.
If your aim is to avoid generating sub-normals, and you don't need to trap on underflow, you probably want SSE2 code with abrupt underflow (-xW -ftz), without setting -fpe0.
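In other words, build the whole application along these lines (the exact spelling of the switches for your installation may differ slightly):

mpif90 -xW -ftz parallel.f

and leave -fpe0 out.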
If you do want these values below 1e-38 to be handled accurately, rather than being subject to underflow, you must change your code to double precision.
Your colleague may be right about conflicts between traps generated by -fpe0 and the handling of traps in the MPI library. I don't know whether anyone has tackled the issues raised by attempting to trap floating point exceptions in an MPI application. That conflicts with the aim of expediting performance by parallel execution. That's why the IEEE standards were written to provide options for dealing with underflow with or (by default) without exception traps.
If you want to raise the issue of dealing with exceptions in parallel applications, we have the Intel Parallel Architectures forum. MPI does involve threading, so this would be on topic there.
Somenath_Jalal
Beginner
Very nice thread. I am a little late maybe (as this thread is 7 years old!), but selected_real_kind(15,307) is not helping me.
I have to find the non-zero matrix elements of a large sparse matrix, which is then used by a Davidson subroutine for diagonalization. The numbers of non-zero elements counted by ifort and gfortran do not match: for the smallest dimension of the sparse Hamiltonian I have, ifort counts 152 non-zero elements whereas gfortran counts 144. This leads to a big error (5%) in my final result for the minimum eigenvalue.
So how should I define the zero/epsilon of my problem/code so that it is compiler/machine independent?
Thanks a lot.
TimP
Honored Contributor III
Among the possible reasons for the difference between ifort and gfortran is that ifort normally runs in abrupt-underflow (-ftz) mode, while gfortran usually reserves that mode for compilation with -ffast-math. If you don't care to resolve such differences through compilation options, or by setting the same mode in both cases at the beginning of your program, or if such a resolution doesn't solve the problem, a comparison like abs(x) < tiny(x) might cover the difference between a zero and a sub-normal. The Intel Sandy Bridge architecture is designed to handle most sub-normals efficiently, but to my knowledge that hasn't changed practices such as the setting of compiler defaults.
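As a sketch of what I mean (the threshold here is just TINY(); a physically motivated tolerance for your Hamiltonian may be more appropriate):

PROGRAM count_nonzero
IMPLICIT NONE
INTEGER, PARAMETER :: dp = SELECTED_REAL_KIND(15, 307)
REAL(dp) :: h(3,3)
INTEGER :: nnz
h = 0.0_dp
h(1,1) = TINY(1.0_dp) / 1.0e4_dp   ! sub-normal; some compilers flush it to zero
h(2,2) = 2.5_dp
! Treat anything below TINY() as zero, so the count does not depend on
! whether a particular compiler flushes sub-normals.
nnz = COUNT(ABS(h) >= TINY(1.0_dp))
WRITE(*,*) "non-zero elements:", nnz
END PROGRAM count_nonzero

Either way, this should give the same count (here, 1) regardless of the underflow mode.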