NaN boxing

velvia · ‎02-21-2017

Hi,

I am designing a class called Dynamic that can store many different types: a null, a boolean, a 48-bit signed integer, a 64-bit floating-point number, or a pointer to one of a few defined types. This object contains only one data member: a double.

In order to do that, I use the fact that there are many ways to represent a NaN, while only two kinds of NaN need to exist in the standard (this trick is known as NaN boxing and is heavily used in some JavaScript engines). If we look at the bits used to represent a NaN, written from least significant to most significant, we have (on a little-endian machine):

[first 48 bits][4 bits][1111][1111][111|0]

The eleven 1s occupy the exponent field, and the 0 is the sign bit. A double is a NaN when all eleven exponent bits are equal to 1 and at least one of the first 52 bits is not equal to 0.
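As an illustration of the rule above, here is a minimal sketch of the NaN test done on the raw bits. The names to_bits, is_nan_bits, kExponentMask and kFractionMask are made up for this example, and the code assumes an IEEE 754 binary64 double:

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes IEEE 754 binary64 doubles");

// Hypothetical masks for a double viewed as a 64-bit integer.
// Bit 63 is the sign, bits 52..62 the exponent, bits 0..51 the fraction.
constexpr std::uint64_t kExponentMask = 0x7FF0000000000000ULL;
constexpr std::uint64_t kFractionMask = 0x000FFFFFFFFFFFFFULL;

// Copy the object representation of a double into an integer.
inline std::uint64_t to_bits(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// NaN: all eleven exponent bits set and a nonzero fraction.
inline bool is_nan_bits(std::uint64_t u) {
    return (u & kExponentMask) == kExponentMask && (u & kFractionMask) != 0;
}
```

With these helpers, is_nan_bits(0x7FF0000000000000ULL) is false, because an all-ones exponent with a zero fraction is infinity, not a NaN.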

Some experiments show that quiet NaNs are represented by [000...000][0001][1111][1111][1110] and signaling NaNs are represented by [000...000][0010][1111][1111][1110].
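For anyone who wants to repeat the experiment, here is a rough sketch. The two constants are the patterns above rewritten as 64-bit integers (the bracket notation lists the least significant bit first), and generated_nan_bits is a hypothetical helper that asks the hardware for a NaN at run time. The volatile variable is only there to discourage constant folding, and the code assumes IEEE 754 binary64 doubles:

```cpp
#include <cstdint>
#include <cstring>

// The two observed patterns as integers: exponent all ones (bits 52..62),
// sign bit clear, and either bit 51 or bit 50 of the fraction set.
constexpr std::uint64_t kObservedQuiet     = (0x7FFULL << 52) | (1ULL << 51);
constexpr std::uint64_t kObservedSignaling = (0x7FFULL << 52) | (1ULL << 50);
static_assert(kObservedQuiet == 0x7FF8000000000000ULL, "bit 51 set");
static_assert(kObservedSignaling == 0x7FF4000000000000ULL, "bit 50 set");

// Produce a NaN from non-NaN inputs (0.0 / 0.0) and return its bit pattern.
inline std::uint64_t generated_nan_bits() {
    volatile double zero = 0.0;   // read at run time, not folded by the compiler
    double numerator = zero;
    double denominator = zero;
    double result = numerator / denominator;
    std::uint64_t u;
    std::memcpy(&u, &result, sizeof u);
    return u;
}
```

Printing generated_nan_bits() in hexadecimal shows which encoding a given machine and compiler produce. The result may differ from kObservedQuiet in the sign bit, since x86 documents its default NaN (the "QNaN floating-point indefinite") with the sign bit set.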

Is it possible to have other NaNs with Intel compilers and the Intel libraries (MKL, SVML, etc.)?

SergeyKostrov · ‎02-21-2017

>>...Is it possible to have other NaNs with Intel compilers and the Intel libraries (MKL, SVML, etc)?

You need to look at the IEEE 754 standard (and its revised versions), but I'm confident that Intel doesn't introduce anything new related to NaNs (such as the one produced by dividing zero by zero).

velvia · ‎02-21-2017

The IEEE 754-2008 standard only says:

- A number is a NaN iff all the exponent bits are equal to 1, and at least one of the first 52 bits is not equal to 0

- The most significant bit of the 52 bits tells if the NaN is quiet (bit set to 1) or signaling (bit set to 0)

It does not say anything more. The payload is totally unspecified.
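To make the two rules concrete, here is a small sketch that classifies a bit pattern and extracts its payload. It assumes the convention where bit 51 set means quiet and bit 51 clear means signaling, and all names (kQuietBit, is_quiet_nan, is_signaling_nan, nan_payload) are invented for this example:

```cpp
#include <cstdint>

constexpr std::uint64_t kExponentMask = 0x7FF0000000000000ULL; // bits 52..62
constexpr std::uint64_t kFractionMask = 0x000FFFFFFFFFFFFFULL; // bits 0..51
constexpr std::uint64_t kQuietBit     = 1ULL << 51;            // top fraction bit
constexpr std::uint64_t kPayloadMask  = kQuietBit - 1;         // bits 0..50

inline bool is_nan_bits(std::uint64_t u) {
    return (u & kExponentMask) == kExponentMask && (u & kFractionMask) != 0;
}

// Quiet NaN: a NaN whose most significant fraction bit is set.
inline bool is_quiet_nan(std::uint64_t u) {
    return is_nan_bits(u) && (u & kQuietBit) != 0;
}

// Signaling NaN: a NaN whose most significant fraction bit is clear.
inline bool is_signaling_nan(std::uint64_t u) {
    return is_nan_bits(u) && (u & kQuietBit) == 0;
}

// The remaining 51 fraction bits, which the rules above leave unspecified.
inline std::uint64_t nan_payload(std::uint64_t u) {
    return u & kPayloadMask;
}
```

The payload of a quiet NaN can be anything, including zero. A signaling NaN needs a nonzero payload, otherwise the pattern is an infinity.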

McCalpinJohn · ‎02-22-2017

I don't know about Intel processors, but when I worked at AMD the engineers assured me that (at that time) AMD processors only generated one qNaN encoding and one sNaN encoding. The hardware recognizes all of the NaN formats and (when appropriate) passes them through unchanged, but it would only generate one of the two encodings as an output value with non-NaN inputs. I used this for holding metadata (typically "full/empty" or "valid/invalid" bits) in some low-level accelerator communication routines. (I also used the upper 16 bits of pointers to hold metadata, then converted the pointers to canonical form before use.)
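The pointer trick mentioned in the parenthesis can be sketched as follows. This is only an illustration, not the code described above: it works on plain 64-bit address values, assumes an x86-64 style layout where only the low 48 bits are significant and canonical form means bits 48..63 are copies of bit 47, and uses invented names (tag_address, address_tag, canonical_address):

```cpp
#include <cstdint>

constexpr std::uint64_t kAddressMask = 0x0000FFFFFFFFFFFFULL; // low 48 bits
constexpr std::uint64_t kBit47       = 1ULL << 47;

// Store 16 bits of metadata in the upper 16 bits of an address value.
inline std::uint64_t tag_address(std::uint64_t addr, std::uint16_t meta) {
    return (addr & kAddressMask) | (static_cast<std::uint64_t>(meta) << 48);
}

// Read the metadata back.
inline std::uint16_t address_tag(std::uint64_t tagged) {
    return static_cast<std::uint16_t>(tagged >> 48);
}

// Restore canonical form by copying bit 47 into bits 48..63.
inline std::uint64_t canonical_address(std::uint64_t tagged) {
    std::uint64_t low = tagged & kAddressMask;
    return (low & kBit47) ? (low | ~kAddressMask) : low;
}
```

A tagged value must go through canonical_address before it is converted back to a pointer and dereferenced.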

It seems likely that Intel hardware generates very few NaN formats, but it also seems likely that they will not commit to that as a "feature". (Hardware engineers really, really don't like to commit to any specific behavior as a "feature" if it is not required by law or by very large amounts of $$$. It may seem easy at the time, but it does not take long before one gets "bitten" by unnecessary commitments that are no longer easy to implement.)

jimdempseyatthecove · ‎02-23-2017

With John's caveat, you should be safe using your format of NaNs provided that you never write them using FPU/SSE/AVX/AVX2/AVX512 floating point instructions. Note, the compiler may possibly use the aforementioned instructions to COPY arguments passed by value. It would be your responsibility to pass by reference or to ensure that any float or double is appropriately cast so that it is passed (pushed) as int32 or int64.

Note, compiler optimizations may take the liberty to remove the cast when the compiler knows the variable is currently located in a FPU/SSE/AVX/AVX2/AVX512 register. You may be able to cast the address of the variable as volatile; otherwise you have no assurance that (future) compiler optimizations will not do you harm.
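One way to follow this advice is to keep the 64 bits in an integer member instead of a double and to convert only at the edges. The sketch below is an assumption, not the class from the original question: the tag value kIntTag, the member names, and the choice of std::uint64_t storage are all invented here. It boxes a 48-bit signed integer into a NaN pattern and moves doubles in and out with std::memcpy:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: the bits live in an integer member, so copying a
// Dynamic copies an integer rather than a double.
class Dynamic {
public:
    static Dynamic from_double(double d) {
        Dynamic v;
        std::memcpy(&v.bits_, &d, sizeof d);
        return v;
    }

    // Box a signed integer that fits in 48 bits.
    static Dynamic from_int48(std::int64_t i) {
        assert(i >= -(std::int64_t{1} << 47) && i < (std::int64_t{1} << 47));
        Dynamic v;
        v.bits_ = kIntTag | (static_cast<std::uint64_t>(i) & kPayloadMask);
        return v;
    }

    bool is_int48() const { return (bits_ & ~kPayloadMask) == kIntTag; }

    std::int64_t as_int48() const {
        std::int64_t low = static_cast<std::int64_t>(bits_ & kPayloadMask);
        // Sign-extend from bit 47: values with that bit set are negative.
        return (low & (std::int64_t{1} << 47)) ? low - (std::int64_t{1} << 48) : low;
    }

    double as_double() const {
        double d;
        std::memcpy(&d, &bits_, sizeof d);
        return d;
    }

    std::uint64_t raw() const { return bits_; }

private:
    static constexpr std::uint64_t kPayloadMask = 0x0000FFFFFFFFFFFFULL;
    // Assumed tag: exponent all ones, bit 51 set, and bit 48 as a type bit.
    static constexpr std::uint64_t kIntTag = 0x7FF9000000000000ULL;
    std::uint64_t bits_ = 0;
};
```

For example, Dynamic::from_int48(-42) holds the pattern 0x7FF9FFFFFFFFFFD6, which is a NaN when viewed as a double, and as_int48() returns -42 again. Whether the compiler really avoids floating point registers for copies is not guaranteed by the language, so the generated code is still worth inspecting.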

If Intel claims to conform to IEEE 754-2008 standards, then they must at least trap on all SNaN's.

Jim Dempsey