TRANSFER function seems to cause underflow when /arch:ia32 and /fpe:1 are set

John_Leonard · 07-27-2011

We use the TRANSFER function to move integers into a real array and then later pull them back out as integers. Normally we get back out what we put in, but with the switches /debug:full /arch:ia32 /fpe:1 it seems that the TRANSFER of an integer into the real causes an underflow, which then sets it to 0 due to the /fpe:1 switch.

We don't see this happen with any other /arch: value. We are attempting to use the /arch:ia32 switch to build a non-processor specific version of our code.

Here's a simple test showing the problem, built once with /arch:ia32 and once with /arch:pn1:

Thanks
John

D:\jdl>type transfer_test.f90
PROGRAM transfer_test

integer nx,i
real xxx

nx = 10
xxx = TRANSFER(0,xxx)
i = TRANSFER(xxx,i)
write (6,*) i
xxx = TRANSFER(10,xxx)
i = TRANSFER(xxx,i)
write (6,*) i
xxx = TRANSFER(nx,xxx)
i = TRANSFER(xxx,i)
write (6,*) i

STOP
END

D:\jdl>ifort /debug:full /arch:ia32 /fpe:1 transfer_test.f90
Intel Visual Fortran Compiler Professional for applications running on IA-32, Version 11.1 Build 20101201 Package
ID: w_cprof_p_11.1.072
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

-out:transfer_test.exe
-debug
-pdb:transfer_test.pdb
-subsystem:console
transfer_test.obj

D:\jdl>transfer_test.exe
0
0
0

D:\jdl>ifort /debug:full /arch:pn1 /fpe:1 transfer_test.f90
Intel Visual Fortran Compiler Professional for applications running on IA-32, Version 11.1 Build 20101201 Package
ID: w_cprof_p_11.1.072
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.

-out:transfer_test.exe
-debug
-pdb:transfer_test.pdb
-subsystem:console
transfer_test.obj

D:\jdl>transfer_test.exe
0
10
10

D:\jdl>

Steven_L_Intel1 · 07-27-2011

Yep - I can believe it. When you use /arch:IA32, the compiler uses the x87 FLD and FSTP instructions to move the result of the TRANSFER into the variable. Since an integer 10 looks like a denormalized value, it gets flushed to zero with /fpe:1.

/arch:pn1 is effectively /arch:SSE2 in the 11.1 compiler, and it uses MOVSS instructions that don't have this effect. Generally, using reals to store non-real data leaves you open to the data changing. Another possible change: if the value "looks like" a signaling NaN, the FLD will change it to a quiet NaN, flipping a bit.

In other words, don't do this.
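
To see why these particular bit patterns are fragile, here is a minimal C sketch (C only because it makes the raw bits easy to inspect; bits_to_float and classify_bits are hypothetical helper names, not part of any compiler or library). It decodes the exponent and fraction fields of a 32-bit pattern the way an IEEE-754 single would be read, assuming float is a 32-bit IEEE-754 single:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a 32-bit integer as a single-precision real,
   roughly what TRANSFER(n, 0.0) does. Assumes float is IEEE-754 binary32. */
float bits_to_float(int32_t n) {
    float f;
    assert(sizeof f == sizeof n);
    memcpy(&f, &n, sizeof f);
    return f;
}

/* Hypothetical helper: name the IEEE-754 class of a 32-bit pattern by
   looking at its exponent (bits 23..30) and fraction (bits 0..22). */
const char *classify_bits(int32_t n) {
    uint32_t u = (uint32_t)n;
    uint32_t exponent = (u >> 23) & 0xFFu;
    uint32_t fraction = u & 0x007FFFFFu;

    if (exponent == 0)
        return fraction ? "denormal" : "zero";
    if (exponent == 0xFFu) {
        if (fraction == 0)
            return "infinity";
        /* Top fraction bit set means quiet; clear means signaling. */
        return (fraction & 0x00400000u) ? "quiet NaN" : "signaling NaN";
    }
    return "normal";
}
```

With these, classify_bits(10) reports a denormal, classify_bits(0x7F800001) a signaling NaN, and classify_bits(-1) a quiet NaN. Every positive integer below 2**23 has a zero exponent field and so falls in the denormal range, which is why the value 10 in the test program is at risk.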

jimdempseyatthecove · 07-27-2011

If your REAL(4) numbers are positive integers .OR. if you want rounded values of positive reals, then consider:

real(4), parameter :: Bias = 2**23
integer(4), parameter :: MantissaMask= Z'007FFFFF'
...
iArray(i) = IAND(TRANSFER((Array(i) + Bias), i),MantissaMask)

Array(i) = TRANSFER(IOR(iArray(i), TRANSFER(Bias, i)), Bias) - Bias

I haven't checked on the code generation. The optimizer should be able to reduce the first statement to a load, add, AND, store, and it may be vectorizable provided IAND and TRANSFER are recognized as vectorizable in this case. It should be easy to write an SSE3 C helper routine to do this 4 floats at a time. The second statement should be a load, OR, subtract, store.

Handling signed numbers and truncation vs. rounding can be easily added.

Jim Dempsey
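
For readers who want to check the arithmetic, here is a minimal C transliteration of the two Fortran statements above (a sketch only, not a drop-in replacement; real_to_int_biased and int_to_real_biased are hypothetical names). It assumes IEEE-754 singles, the default round-to-nearest mode, and values in the range 0 to 2**23-1:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MANTISSA_MASK 0x007FFFFF   /* low 23 bits of a single */

/* Real -> integer: after adding 2**23 the spacing between representable
   singles is exactly 1, so the rounded integer sits in the mantissa bits. */
int32_t real_to_int_biased(float x) {
    const float bias = 8388608.0f;          /* 2**23 */
    float biased = x + bias;
    int32_t bits;
    assert(sizeof bits == sizeof biased);
    memcpy(&bits, &biased, sizeof bits);
    return bits & MANTISSA_MASK;
}

/* Integer -> real: OR the integer into the mantissa of 2**23, then
   subtract the bias again as a real. */
float int_to_real_biased(int32_t n) {
    const float bias = 8388608.0f;          /* 2**23 */
    int32_t bits;
    float biased;
    memcpy(&bits, &bias, sizeof bits);      /* bit pattern of 2**23 */
    bits |= n;
    memcpy(&biased, &bits, sizeof biased);
    return biased - bias;
}
```

Here real_to_int_biased(10.0f) yields 10 and int_to_real_biased(10) yields 10.0. The intermediate real is always a normal number of at least 2**23, so no denormal bit pattern is ever handled as a real.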

John_Leonard · 07-27-2011

Actually, I think we're just using TRANSFER like we used to use equivalence or map structures. I don't know why we're using a general storage area defined as real but I'm sure there's a very good reason!

However, we're seeing some other odd behavior with the combination of /arch:ia32 and /fpe:1. If we use /fpe:3 all seems well.

Regards,
John

jimdempseyatthecove · 07-28-2011

>>If we use /fpe:3 all seems well.

Be careful: what seems well to you now may blow up for the next person later.

If at a later date your successor adds code to manipulate these integer bit patterns in a real array (as reals), then these numbers will be considered denormalized FP values when the integer is positive and less than 2**23, may be treated as SNaN or QNaN when negative, or may land on some other reserved FP value for other bit patterns. And if you are not going to manipulate these numbers (other than binary write), try to remove the storage into a REAL array.

The code I presented earlier (adding a Bias of 2**23 at conversion from integer to real, for positive integer numbers in the range 0 to 2**23-1) will permit you to manipulate the numbers as reals without bunging up the value, but does require removing the bias on conversion from real back to integer.

If you have a large array for conversion, then I suggest you write a C/C++ function to perform the conversion since you can assure that SSE instructions are used. Something like this in your loop:

_mm_storeu_si128(
    (__m128i*)&rArray[i], // output array address (i = loop index, stepping by 4)
    _mm_add_epi32(
        _mm_loadu_si128((const __m128i*)&iArray[i]), // load 4 ints from the input array
        BiasAs4int32)); // 4-up bit pattern of 2**23

The above can be reduced to 3 instructions to convert 4 ints to floats.

To convert the other way (real to integer) you could use a subtract or an AND.

Jim Dempsey
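
A fuller version of that loop might look like the sketch below. The function names and the BIAS_BITS macro are hypothetical, and the code assumes every input lies in 0 to 2**23-1. The SSE2 path is only compiled when the compiler defines __SSE2__ (GCC and Clang do so when targeting SSE2); otherwise the scalar loop handles everything:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define BIAS_BITS 0x4B000000       /* bit pattern of 2**23 as a single */
#define MANTISSA_MASK 0x007FFFFF   /* low 23 bits of a single */

/* Store each integer n (0 .. 2**23-1) as the real value 2**23 + n by
   adding n to the bit pattern of 2**23 with integer arithmetic only. */
void ints_to_biased_reals(const int32_t *in, float *out, size_t count) {
    size_t i = 0;
    assert(sizeof(float) == sizeof(int32_t));
#if defined(__SSE2__)
    const __m128i bias = _mm_set1_epi32(BIAS_BITS);
    for (; i + 4 <= count; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(in + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(v, bias));
    }
#endif
    /* Scalar tail (or the whole array when SSE2 is unavailable). */
    for (; i < count; ++i) {
        int32_t bits = in[i] + BIAS_BITS;
        memcpy(&out[i], &bits, sizeof bits);
    }
}

/* Recover the integers by masking off everything but the mantissa. */
void biased_reals_to_ints(const float *in, int32_t *out, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        int32_t bits;
        memcpy(&bits, &in[i], sizeof bits);
        out[i] = bits & MANTISSA_MASK;
    }
}
```

Because the stored reals are ordinary normal numbers, they should survive being loaded and stored through either x87 or SSE registers; the cost is that the bias has to be removed (by subtracting as a real, or by masking as above) before the integer value is used.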

John_Leonard · 07-28-2011

Jim, thanks for the feedback. After looking more closely at TRANSFER and how it behaves with different parameters I don't think we're using it correctly for what we are intending to do. To that I have to agree with Steve's advice - "Don't do that!", as we really don't want these int -> real -> int conversions happening and we can see the unpredictable side effects depending on compile options.

We need to fix our code.

-John