Single-precision dummy argument promoted to double precision has more precision than expected

de-wei-yin · 06-12-2008

Consider the following code, compiled with -O2 or -O3, with or without -Darray.

When compiled with -Darray (making x an array in main), the first number in lines 5, 7, and 9 of the output has more precision than it should: the double-precision x_dp should hold the value of the single-precision dummy argument x_sp, padded with zeroes after the promotion.

So is ifort keeping the result from random_number() in the extended-precision floating-point register and allowing the first x_dp=real(x_sp,dp) statement to pick up that value? Or is the subroutine add() being in-lined? I'm guessing in-lining because x_dp has the expected value promoted from x_sp if the main program and subroutine are compiled into two separate object files.

If the main program and subroutine are kept in the same file but compiled without -Darray, so that x is a scalar in main, then x_dp has the expected value of the promoted x_sp in subroutine add() when x_dp is printed the first time. Why does passing x(i) rather than a plain scalar x result in different values being picked up by x_dp in the subroutine?

I am using ifort 10.1.015 on an ia-32 running RHEL 5.2beta.

! compile with ifort -fpp -Darray -O2 -g test.f90
subroutine add(x_sp,s)
implicit none
integer,parameter::sp=selected_real_kind(6,37)
integer,parameter::dp=selected_real_kind(15,307)
real(sp),intent(in)::x_sp
real(dp),intent(inout)::s
real(dp)::x_dp
! type conversion
x_dp=real(x_sp,dp)
print"(1p,2e40.32)",x_dp,x_sp
! do type conversion again
x_dp=real(x_sp,dp)
print"(1p,2e40.32)",x_dp,x_sp
! next line doesn't really matter
s=s+x_dp
return
end subroutine add

program main
implicit none
integer,parameter::sp=selected_real_kind(6,37)
integer,parameter::dp=selected_real_kind(15,307)
integer,parameter::n=5
#ifdef array
real(sp),dimension(1:n)::x
#else
real(sp)::x
#endif
real(dp)::a=0.0_dp
integer::i
print*
do i=1,n,1
#ifdef array
call random_number(x(i))
call add(x(i),a)
#else
call random_number(x)
call add(x,a)
#endif
print*
end do
stop
end program main

TimP · 06-12-2008

Your question seems to rest on a false premise.

Yes, the binary double-precision value must be the same as the single-precision value from which it is promoted, with binary zeroes appended. You seem to think the same must be true of the converted decimal values displayed by print. That would be so only when the conversion from binary to decimal is exact, not for random values.
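The zero-padding invariant can be checked on the bits themselves rather than on printed decimals. Below is a minimal sketch, assuming real(sp) and real(dp) are IEEE binary32 and binary64 (true for ifort and gfortran on x86, but not required by the Fortran standard); the module and function names are hypothetical.

```fortran
! Sketch: verify that real(x_sp, dp) only appends zero bits.
! Assumes IEEE binary32/binary64 layouts for real(sp)/real(dp).
module promotion_check
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: sp = selected_real_kind(6, 37)
  integer, parameter :: dp = selected_real_kind(15, 307)
  ! binary64 stores 52 mantissa bits and binary32 stores 23, so an exact
  ! promotion must leave the low 52 - 23 = 29 bits of the double at zero.
  integer, parameter :: pad_bits = 52 - 23
contains
  ! Low 29 mantissa bits of x_dp (zero for an exact promotion).
  pure function stray_bits(x_dp) result(bits)
    real(dp), intent(in) :: x_dp
    integer(int64) :: bits
    bits = ibits(transfer(x_dp, 0_int64), 0, pad_bits)
  end function stray_bits

  ! True when x_dp equals x_sp with only zero bits appended (finite values;
  ! a NaN never compares equal, so it always yields .false.).
  pure function promoted_exactly(x_sp, x_dp) result(ok)
    real(sp), intent(in) :: x_sp
    real(dp), intent(in) :: x_dp
    logical :: ok
    ok = stray_bits(x_dp) == 0_int64 .and. real(x_dp, sp) == x_sp
  end function promoted_exactly
end module promotion_check
```

Calling promoted_exactly(x_sp, x_dp) right after x_dp=real(x_sp,dp) in add() might flag the bad promotions, although adding the call may itself change the code the optimizer generates.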

If the run-time library follows the IEEE standard, then for the single-precision display value it must derive the first 9 digits from the data, after which it may fill with 0 digits, or it may take additional digits from implicitly promoted precision. The first 17 digits of the double-precision displayed value must be derived from the data, not by following the logic of your code and surmising that you intended them to match your single-precision display.
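One way to see that 9 and 17 significant digits are enough to pin down a single- and a double-precision value is to write a value with that many digits and read it back. A minimal sketch, assuming the run-time library converts correctly in both directions; the module and function names are hypothetical.

```fortran
! Sketch: 9 significant digits identify a binary32 value and 17 identify
! a binary64 value, provided the I/O conversions are correctly rounded.
module round_trip_digits
  implicit none
  integer, parameter :: sp = selected_real_kind(6, 37)
  integer, parameter :: dp = selected_real_kind(15, 307)
contains
  ! Write x with 9 significant digits (ES16.8E2: one digit before the
  ! point, eight after) and check that reading it back gives x again.
  function sp_round_trips(x) result(ok)
    real(sp), intent(in) :: x
    logical :: ok
    character(len=32) :: buf
    real(sp) :: y
    write (buf, '(es16.8e2)') x
    read (buf, *) y
    ok = (y == x)
  end function sp_round_trips

  ! Same check with 17 significant digits (ES25.16E3) for double precision.
  function dp_round_trips(x) result(ok)
    real(dp), intent(in) :: x
    logical :: ok
    character(len=32) :: buf
    real(dp) :: y
    write (buf, '(es25.16e3)') x
    read (buf, *) y
    ok = (y == x)
  end function dp_round_trips
end module round_trip_digits
```

Digits printed beyond those counts carry no extra information about which binary value is stored, which is why comparing long decimal strings is a weaker test than comparing bits.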

When you have the compiler in-line your subroutine, it is more likely to replace all single-precision values with double when you have a single scalar value, and more likely still when you don't ask for SSE code. You could suppress in-lining with -fno-inline-functions, or you could set one of the usual SSE options, such as -xW, so that single and double precision use different register formats.

de-wei-yin · 06-12-2008

I was actually referring to right zero-padding in the binary representation when the single-precision value is promoted to double precision. I know that arbitrary decimal numbers cannot be represented exactly as binary numbers. But any binary floating-point number can be printed exactly in decimal form if enough digits are printed, and promoting a binary floating-point number to higher precision will neither change the existing digits of its complete decimal equivalent nor append nonzero digits to that exact decimal equivalent.

My problem with the example program posted is that, with sufficient digits printed, I see that the promotion of a single-precision value to a double-precision value, as coded, results in unexpected changes in the original decimal digits as well as additional nonzero digits being appended to the decimal representation.

Let's take a look at the binary form of the numbers in question.

For example, the third time the subroutine in my original example is called, I expect the actual argument passed to the subroutine to have the decimal value

3.525161445140838623046875e-1

which is the exact representation of the binary pattern

0:01111101:*01101000111110011111111

The colons separate the pattern into the sign, exponent, and mantissa. The asterisk * represents the leading one-bit that is not explicitly stored for a normalized mantissa.
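The notation above can be produced mechanically by reinterpreting the bits with TRANSFER and extracting the fields with IBITS. A sketch assuming the IEEE binary32 layout for real(sp); sp_pattern is a hypothetical name, and the '*' is only meaningful for normalized values (zero and subnormals have no implicit one).

```fortran
! Sketch: format a real(sp) as sign:exponent:*mantissa.
! Assumes IEEE binary32: 1 sign bit, 8 exponent bits, 23 stored mantissa bits.
module bit_pattern_sp
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  integer, parameter :: sp = selected_real_kind(6, 37)
contains
  function sp_pattern(x) result(s)
    real(sp), intent(in) :: x
    character(len=35) :: s          ! 1 + 1 + 8 + 2 + 23 characters
    integer(int32) :: i
    i = transfer(x, 0_int32)        ! reinterpret the 32 bits as an integer
    ! Bw.w prints exactly w binary digits, keeping leading zeroes.
    write (s, '(b1.1, ":", b8.8, ":*", b23.23)') &
      ibits(i, 31, 1), ibits(i, 23, 8), ibits(i, 0, 23)
  end function sp_pattern
end module bit_pattern_sp
```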

After the promotion, the decimal representation should not have changed, but it did.

3.525161445140838623046875e-1 (single-precision, as above)
3.52516147308051586151123046875e-1 (double-precision)

Here are the binary patterns of these numbers:

0:0---1111101:*01101000111110011111111-----------------------------
0:01111111101:*0110100011111001111111100011000000000000000000000000

The dashes - represent the binary digits that are not present in the single-precision value but are added when the promotion is made.

The two ones in the padded region (the 27th and 28th mantissa bits) should be zero. Only zeroes are supposed to be appended to the single-precision mantissa; where do these ones come from?
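Working the third call through by hand shows exactly what the stray bits contribute. The stored single-precision mantissa field 01101000111110011111111 is 3439871 (hexadecimal 347CFF) and the biased exponent is 125, so the two printed values are:

```latex
x_{sp} = \left(1 + \frac{3439871}{2^{23}}\right) 2^{125-127}
       = \frac{11828479}{2^{25}}
       = 0.3525161445140838623046875

x_{dp} = \frac{378511331}{2^{30}}
       = x_{sp} + 2^{-29} + 2^{-30}
       = 0.352516147308051586151123046875
```

The ones at mantissa positions 27 and 28 are worth 2^{-27} and 2^{-28} times the leading 2^{-2}, that is 2^{-29} + 2^{-30} = 3 × 2^{-30} ≈ 2.79e-9, which is exactly the gap between the two decimal expansions.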

Here are the binary patterns for the fourth and fifth calls to the subroutine:

0:0---1111110:*01010101011101011101000-----------------------------
0:01111111110:*0101010101110101110011111011110000000000000000000000

0:0---1111110:*11101101000101011001110-----------------------------
0:01111111110:*1110110100010101100111000001010000000000000000000000
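The double-precision patterns above can be reproduced the same way with a 64-bit integer. A sketch assuming IEEE binary64 for real(dp); dp_pattern is a hypothetical name.

```fortran
! Sketch: format a real(dp) as sign:exponent:*mantissa.
! Assumes IEEE binary64: 1 sign bit, 11 exponent bits, 52 stored mantissa bits.
module bit_pattern_dp
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
contains
  function dp_pattern(x) result(s)
    real(dp), intent(in) :: x
    character(len=67) :: s          ! 1 + 1 + 11 + 2 + 52 characters
    integer(int64) :: i
    i = transfer(x, 0_int64)        ! reinterpret the 64 bits as an integer
    write (s, '(b1.1, ":", b11.11, ":*", b52.52)') &
      ibits(i, 63, 1), ibits(i, 52, 11), ibits(i, 0, 52)
  end function dp_pattern
end module bit_pattern_dp
```

Comparing the first 23 mantissa characters of dp_pattern(x_dp) with the single-precision mantissa shows whether anything other than zeroes was appended during the promotion.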

TimP · 06-13-2008

I can see the strange behavior, only when using the 32-bit compiler with no architecture option, and only on the first conversion of the 4th element. I'll look into it further.
It may be advisable with ifort 10.1 always to use one of the options that generate code for CPUs with at least Pentium III compatibility.

de-wei-yin · 06-13-2008

I can confirm that exact promotion results from any of the following: -march={pentium{3|4}|core2}; or, as I noted previously, -O{0|1}; making "x" a scalar in the code; or splitting the code into separate object files.

With -Darray -O{2|3} -march={pentium|pentium2|notspecified} then I get the spurious one bits appended during the promotion. I have only tested the 32-bit compiler; the manual says that the 64-bit compiler assumes -march=pentium4.

The code I posted was simplified from a program that calculates statistical moments on the fly (instead of in a separate pass at the end), where the number of samples could be as large as 1e8 (Chan et al., Amer. Statistician 37:242, 1983); hence the accumulation of the sum, and the other code that I stripped out, were done in double precision.
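For reference, an on-the-fly moment calculation of this kind is commonly built on a one-sample update. The sketch below uses Welford's update, which is the single-observation case of the pairwise combination formula discussed by Chan et al.; it is not the poster's program, only the mean and second central moment are tracked, and the type and procedure names are hypothetical.

```fortran
! Sketch: running mean and variance with single-precision samples and
! double-precision accumulators (Welford's one-sample update).
module running_moments
  implicit none
  integer, parameter :: sp = selected_real_kind(6, 37)
  integer, parameter :: dp = selected_real_kind(15, 307)
  ! Hypothetical accumulator: count, mean, sum of squared deviations (m2).
  type :: moments_t
    integer :: n = 0
    real(dp) :: mean = 0.0_dp
    real(dp) :: m2 = 0.0_dp
  end type moments_t
contains
  ! Add one sample; it is promoted first, as x_dp=real(x_sp,dp) in add().
  subroutine push(acc, x_sp)
    type(moments_t), intent(inout) :: acc
    real(sp), intent(in) :: x_sp
    real(dp) :: x, delta
    x = real(x_sp, dp)
    acc%n = acc%n + 1
    delta = x - acc%mean
    acc%mean = acc%mean + delta / acc%n
    acc%m2 = acc%m2 + delta * (x - acc%mean)   ! uses the updated mean
  end subroutine push

  ! Unbiased sample variance; meaningful only for n >= 2.
  function variance(acc) result(v)
    type(moments_t), intent(in) :: acc
    real(dp) :: v
    v = acc%m2 / (acc%n - 1)
  end function variance
end module running_moments
```

Updating the mean and m2 incrementally avoids subtracting two large, nearly equal sums at the end, which is the cancellation problem that motivates these algorithms.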

In practice I keep separate program units in separate object files and always compile with -march=pentium4, so it's not really a problem for me. I just thought those extra one bits were strange and wanted to understand what was going on.

Thanks, Tim.

TimP · 06-13-2008

I am looking at the asm code generated by -S (using the 32-bit linux ifort), and it appears there is overlapping usage of the low order bytes of stack storage for the promoted element, which could be a bug.
I did test the 64-bit ifort; as you point out, it has no option to generate x87 code.
After rebooting, I have seen the discrepancies on the 2nd through 5th elements of the array.

de-wei-yin · 06-13-2008

Thanks for the update about the assembly code.

If you want me to submit an issue to Intel Premium Support about this problem, please let me know.

TimP · 06-13-2008

I submitted issue 485854.

de-wei-yin · 06-16-2008

Thanks for submitting the issue. Unfortunately I am not allowed to track the issue through my support account. If there is any update, please let me know. I am tracking this thread.

TimP · 06-19-2008

The issue was closed with the report that the problem will be fixed in the next major release of ifort.

de-wei-yin · 06-19-2008

Thanks for resolving this issue.