Please provide a "reproducer"

Lingzi_P_ · ‎09-07-2016

The following code raises a floating-point overflow exception in __svml_powf4_h9 when compiled with -O2 or -O3:

do i = 1,nkd2p1
   x2 = abs ((i-1)*delk/kright)
   x2 = -apar*x2**bex
   wrkr(i*2-1) = x2
enddo

With the FPE check disabled, the program runs fine and gives the correct result. We have seen this kind of FPE caused by vectorization from time to time. Explicitly declaring "!DIR$ NOVECTOR" does the trick, but it has an impact on performance.
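For readers who have not used the directive, here is a minimal sketch of where "!DIR$ NOVECTOR" goes relative to the loop. The declarations, the array size, and the numeric values are all assumptions made up for illustration, since the original post does not show them.

```fortran
program novector_sketch
  implicit none
  ! All declarations and values below are assumptions for illustration;
  ! the original post does not show how these variables are declared.
  integer, parameter :: nkd2p1 = 8
  real :: delk, kright, apar, bex, x2
  real :: wrkr(2*nkd2p1)
  integer :: i

  delk = 0.1
  kright = 1.0
  apar = 0.5
  bex = 2.0
  wrkr = 0.0

  ! Intel-specific directive: it applies to the DO loop that immediately
  ! follows. Compilers that do not recognize it treat it as a comment.
  !DIR$ NOVECTOR
  do i = 1, nkd2p1
     x2 = abs((i-1)*delk/kright)
     x2 = -apar*x2**bex
     wrkr(i*2-1) = x2
  end do

  ! Print the odd-indexed elements that the loop filled in.
  print *, wrkr(1:2*nkd2p1:2)
end program novector_sketch
```

The directive only affects the one loop it precedes, so the rest of the file can still be vectorized.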

The loop looks perfectly suitable for vectorization, and the numerical values in the result are nowhere near overflow. Why would an overflow happen?

The machine code at the point of the crash:

0x00000000012349f0 <+784>: vaddpd %xmm3,%xmm2,%xmm5
0x00000000012349f4 <+788>: vpaddq %xmm7,%xmm4,%xmm1
0x00000000012349f8 <+792>: vpaddq %xmm8,%xmm5,%xmm2
=> 0x00000000012349fd <+797>: vcvtpd2ps %xmm1,%xmm3
0x0000000001234a01 <+801>: vcvtpd2ps %xmm2,%xmm4
0x0000000001234a05 <+805>: vmovlhps %xmm4,%xmm3,%xmm1
0x0000000001234a09 <+809>: test %eax,%eax
0x0000000001234a0b <+811>: jne 0x1234a4f <__svml_powf4_h9+879>
0x0000000001234a0d <+813>: vmovups 0x30(%rsp),%xmm8
0x0000000001234a13 <+819>: vmovaps %xmm1,%xmm0

(gdb) p $xmm3
$1 = {v4_float = {1.42776291e+31, 0.708184719, -6.739982e+24, 0.690881014}, v2_double = {0.00032494041370586407, 0.00025734792206721777}, v16_int8 = {-126, 53, 52, 115, -104, 75, 53, 63, -27, 103, -78, -24, -108, -35, 48, 63}, v8_int16 = {13698, 29492, 19352, 16181, 26597, -5966,
-8812, 16176}, v4_int32 = {1932801410, 1060457368, -390961179, 1060167060}, v2_int64 = {4554629716295038338, 4553382854900475877}, uint128 = 0x3f30dd94e8b267e53f354b9873343582}

mecej4 · ‎09-07-2016

There are many missing particulars, but it strikes me that you may have a mix of floats and doubles. If an expression is evaluated as a double, and is then converted to float, with the vcvtpd2ps instruction that you flagged, the double precision value may exceed the largest representable value in single precision.

You may be able to reorganize the source code and check the types of variables involved to avoid having an intermediate result that causes overflow when the final result is well within the range of single precision.
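To illustrate the general point (this is not a reproduction of the poster's problem), the sketch below uses hypothetical values a, b and c whose final product fits in single precision while one intermediate product does not. It only compares magnitudes against huge(1.0) and never performs the out-of-range conversion itself.

```fortran
program intermediate_range
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical values chosen only to illustrate the point.
  real :: a, b, c, final_sp
  real(dp) :: inter, final_dp

  a = 1.0e30
  b = 1.0e20
  c = 1.0e-25

  inter    = real(a, dp) * real(b, dp)   ! about 1e50: fine as a double
  final_dp = inter * real(c, dp)         ! about 1e25: within single range
  final_sp = real(final_dp)              ! safe narrowing conversion

  print *, 'largest single value        :', huge(1.0)
  print *, 'intermediate (double)       :', inter
  print *, 'intermediate fits in single :', abs(inter) <= real(huge(1.0), dp)
  print *, 'final result (single)       :', final_sp
  print *, 'final fits in single        :', abs(final_dp) <= real(huge(1.0), dp)
end program intermediate_range
```

If the intermediate were narrowed to single precision before the multiplication by c, that conversion would overflow even though the final answer is representable.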

Lingzi_P_ · ‎09-07-2016

Hi, thanks for the swift reply.

I thought I had checked everything and that all values were within the range of single precision, but that might not be the case. The code now runs fine with vectorization after the following change:

> x2 = x2**bex
> x2 = -apar*x2

So it looks like the compiler will try to keep intermediate results as doubles even though the local variables are declared as single precision?

mecej4 · ‎09-07-2016

Lingzi P. wrote:

So it looks like the compiler will try to keep intermediate results as doubles even though the local variables are declared as single precision?

Intermediate results may reside entirely in registers, and have no memory footprint at all.

The Fortran standard imposes some constraints on the precision of mixed-mode and mixed-precision expressions.

Lingzi_P_ · ‎09-08-2016

Sorry, I made a mistake yesterday. The code below still doesn't work; it crashes with exactly the same error.

I guess the compiler is simply smart enough to ignore the change. Any ideas?

Lingzi P. wrote:

> x2 = x2**bex
> x2 = -apar*x2

So it looks like the compiler will try to keep intermediate results as doubles even though the local variables are declared as single precision?

mecej4 · ‎09-08-2016

Please provide a "reproducer": complete source code, data files (if needed) and instructions to compile, link and run in order to reproduce the error that you encountered.

Lingzi_P_ · ‎09-08-2016

A "reproducer" with a makefile is attached.

I have tried both ifort 14.0.1 and 15.0.3, with the same error.

Please note that we call feenableexcept (from ut_fpmode) at the beginning to enable all floating-point exceptions, as the system cannot afford to have them disabled.

With the feenableexcept call commented out, the job runs fine and gives the correct result.
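For anyone without the attachment, the difference between a trapped and an untrapped overflow can be sketched with the standard ieee_exceptions module. This is a swapped-in illustration rather than the feenableexcept call that ut_fpmode makes, and the variable names are hypothetical. Whether the flag is reported reliably can depend on the compiler's floating-point options, so treat it as a sketch only.

```fortran
program fpe_flag_sketch
  use, intrinsic :: ieee_exceptions
  implicit none
  ! volatile keeps the compiler from folding the multiplication at compile time.
  real, volatile :: x
  real :: y
  logical :: raised

  ! Keep overflow non-trapping so the program continues past it.
  if (ieee_support_halting(ieee_overflow)) then
     call ieee_set_halting_mode(ieee_overflow, .false.)
  end if
  call ieee_set_flag(ieee_overflow, .false.)   ! start with a clear flag

  x = huge(1.0)
  y = x * 2.0                                  ! exceeds the single-precision range

  call ieee_get_flag(ieee_overflow, raised)
  print *, 'result              :', y
  print *, 'overflow flag raised:', raised

  ! Setting the halting mode to .true. before the multiplication would
  ! instead stop the program at that operation, which is the trapping
  ! behaviour described in the post above.
end program fpe_flag_sketch
```

With trapping disabled, an overflow only sets a status flag and execution continues, which matches the observation that the job completes when the exceptions are not enabled.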

Thanks,