A presentation by M. Baer (DKFZ) and M. Kachelrieß (FAU) mentions a 16-bit floating-point format on Intel Xeon Phi coprocessors. I could not find any reference to it in the System Software Developer's Guide or the Instruction Set Reference Manual. Where can I find information about the 16-bit floating-point instructions?
Interesting presentation, but I don't see anything in it about 16-bit floating-point formats. It concentrates on the 512-bit wide (16 x 32-bit floats) short vector format.
The Ivy Bridge host has load and store instructions that move data between 16-bit storage and the 32-bit register format.
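For example, on the host side the conversions are exposed through the F16C intrinsics that Ivy Bridge introduced. A minimal sketch (compile with -mf16c):

```c
#include <immintrin.h>

/* Sketch: convert 8 halfs in memory to 8 floats and back, using the
   F16C VCVTPH2PS/VCVTPS2PH instructions (Ivy Bridge and later). */
void halves_to_floats8(const unsigned short *src, float *dst)
{
    __m128i h = _mm_loadu_si128((const __m128i *)src); /* 8 x float16 */
    __m256  f = _mm256_cvtph_ps(h);                    /* widen to float32 */
    _mm256_storeu_ps(dst, f);
}

void floats_to_halves8(const float *src, unsigned short *dst)
{
    __m256  f = _mm256_loadu_ps(src);
    __m128i h = _mm256_cvtps_ph(f, _MM_FROUND_TO_NEAREST_INT |
                                   _MM_FROUND_NO_EXC);  /* narrow, round to nearest */
    _mm_storeu_si128((__m128i *)dst, h);
}
```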
On slide 6 they say "Xeon Phi supports the 16 bit floating point format (half)", and then on slides 15 and 16 they report benchmarks on Xeon Phi with "floats" (32-bit, I presume) and with "halfs", of which the latter is faster.
From their diagram on slide 6 and their earlier paper, it seems that they store data as 16-bit floats to reduce the data size, but do the vector arithmetic on Xeon Phi by converting the 16-bit numbers to 32-bit floats. It is understandable if they have their own routine for the type conversion; a sketch of such a routine follows below. However, I wonder if the statement "Xeon Phi supports the 16 bit floating point format" refers to some undocumented feature of the MIC architecture.
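For reference, a scalar conversion routine of the kind they might use is short. Here is a minimal sketch (my own illustration, not code from their paper) that widens an IEEE 754 half to a 32-bit float, covering zeros, subnormals, infinities, and NaNs:

```c
#include <stdint.h>
#include <string.h>

/* Sketch of a scalar half -> float conversion.
   Half layout: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                                /* +-0 */
        } else {
            /* subnormal half: renormalize into a float32 normal */
            int e = -1;
            do { mant <<= 1; e++; } while (!(mant & 0x400));
            mant &= 0x3FF;
            bits = sign | ((uint32_t)(127 - 15 - e) << 23) | (mant << 13);
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);       /* inf / NaN */
    } else {
        bits = sign | ((exp - 15 + 127) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);                        /* safe type pun */
    return f;
}
```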
Actually it IS documented, though I've never tried to use it. You've probably already grabbed a copy of the instruction set reference manual, https://software.intel.com/sites/default/files/forum/278102/327364001en.pdf. If you look at Table 2.4: 32 bit Floating-point Load-op SwizzUpConv, you'll see that one of the modes provided, 011, converts float16 to float32. In fact, that section of the document begins with this:
> Data Conversions: Sources from memory can be converted to either 32 bit signed or unsigned integer or 32 bit floating-point before being used. Supported data types in memory are float16, sint8, uint8, sint16, and uint16 for load-op instructions.
Though I've never tried it, apparently the authors of the paper you cite have had some success, as you say, in reducing memory pressure where data volumes are high but resolution requirements are low.
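If you want to try it from C, the conversion shows up in the Intel compiler's extended-load intrinsics for KNC. A minimal sketch (assuming the KNC intrinsics from immintrin.h, compiled with icc -mmic; I haven't run this myself):

```c
#include <immintrin.h>

/* Sketch: load 16 float16 values from memory, upconverting them to
   16 float32 lanes in a single extended load, then operate on them
   at full 32-bit precision. KNC only. */
__m512 scale_halves(const void *src, float factor)
{
    __m512 v = _mm512_extload_ps(src, _MM_UPCONV_PS_FLOAT16,
                                 _MM_BROADCAST32_NONE, _MM_HINT_NONE);
    return _mm512_mul_ps(v, _mm512_set1_ps(factor));
}
```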
Ah, that's it! Thank you, Robert! After your comment I was able to locate the corresponding intrinsics (gather and scatter). With the parameter _MM_UPCONV_PS_FLOAT16 (or _MM_DOWNCONV_PS_FLOAT16) one can load (or store) float16 data in memory to (or from) vector registers as float32 data. This can be very handy!
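For example, a gather that reads float16 elements and delivers float32 lanes could look like this minimal sketch (KNC intrinsics assumed; the index vector and the 2-byte scale are illustrative):

```c
#include <immintrin.h>

/* Sketch: gather 16 float16 elements addressed by a 32-bit index
   vector, upconverting to float32 on the way into the register,
   then scatter them back down-converted to float16. KNC only. */
void gather_scatter_halves(const void *src, void *dst, __m512i idx)
{
    __m512 v = _mm512_i32extgather_ps(idx, src, _MM_UPCONV_PS_FLOAT16,
                                      2 /* scale: 2-byte elements */,
                                      _MM_HINT_NONE);
    _mm512_i32extscatter_ps(dst, idx, v, _MM_DOWNCONV_PS_FLOAT16,
                            2, _MM_HINT_NONE);
}
```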
It looks like the Intel C/C++ compiler has no 16-bit float type, so I do not expect automatic vectorization to handle 16-bit floats. Intrinsics or inline assembly look like the only way to go.