Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Are integers faster than floats?

Schubert__William

Someone suggested I ask this question in the development forums, and this forum seemed the closest to where the answer can be found.

I see this question on boards from other websites, but nobody seems to want to ask the people who make the actual CPU.  Are integers faster than floats, like they used to be when your company first started creating processors?  Are integers helpful for graphics using OpenGL, Vulkan, or DirectX? If the goal was to scan a human being in three dimensions, and display the scan on a monitor for medical purposes, and the measurements were all in microns, would it be better to store them in integers or floats?

 

McCalpinJohn
Honored Contributor III

This information is easy enough to find, but understanding it can be challenging.  For example, Appendix C of the Intel Optimization Reference Manual (document 248966) contains instruction latency and reciprocal throughput data for many recent Intel processors.  Even more data is available from Agner Fog's comprehensive testing (e.g., http://www.agner.org/optimize/instruction_tables.pdf).

There is a huge amount of data in these resources, but the short answer is that, in most cases, floating-point arithmetic has slightly higher latency than integer arithmetic, but the same, or better, throughput (for operands of the same bit width).
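
To make the latency/throughput distinction concrete, here is a minimal timing sketch (my own illustration, not from any Intel document; the loop counts are arbitrary and the exact ratio depends on the processor). A sum whose additions form one dependent chain can run no faster than the latency of the FP adder, while the same work split across independent accumulators runs at the adder's throughput:

```c
/* latency_vs_throughput.c -- illustrative sketch, not a rigorous benchmark.
 * Compile with, e.g.:  gcc -O2 latency_vs_throughput.c -o demo
 */
#include <stdio.h>
#include <time.h>

#define N 100000000L

static double seconds(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) * 1e-9;
}

int main(void)
{
    struct timespec t0, t1;

    /* Latency-bound: every add depends on the previous result, so the
     * loop can run no faster than one add per FP-add latency. */
    double acc = 0.0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        acc += 1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("dependent chain:    %.3f s (acc=%.0f)\n", seconds(t0, t1), acc);

    /* Throughput-bound: four independent chains can be in flight at
     * once, hiding most of the latency of each individual add. */
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i += 4) {
        a0 += 1.0;
        a1 += 1.0;
        a2 += 1.0;
        a3 += 1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("independent chains: %.3f s (sum=%.0f)\n",
           seconds(t0, t1), a0 + a1 + a2 + a3);
    return 0;
}
```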

There are zillions of caveats required here, among them:

  • Different processors have different instruction latencies and throughputs.
  • For data located anywhere other than the L1 Data Cache, performance may be limited by data transfer rates through the cache hierarchy.  In such cases, only the "width" of the data matters (i.e., the number of data elements per cache line).
  • Floating-point arithmetic is almost always used with input and output widths the same (e.g., double + double => double), while integer multiplication has a result that is twice as wide as the inputs (e.g., 32-bit * 32-bit => 64-bit).   This does not fit naturally into the SIMD architecture of recent processors.  
    • There are several approaches to handling this, but each of them results in lower throughput for integer multiplication compared to floating-point multiplication (see the sketch after this list).
  • Integer arithmetic is also complicated by differences in signed and unsigned computations.
    • In some cases this requires extra instructions to handle correctly, with a corresponding reduction in throughput.
  • Integer arithmetic is also complicated by the need to handle both "saturating" and "wrapping" arithmetic.
    • In some cases this requires extra instructions to handle correctly, with a corresponding reduction in throughput.
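
The widening-multiply and saturating/wrapping caveats can be seen directly at the intrinsics level. The following is an illustrative sketch of my own (assuming an x86 compiler with SSE4.1 support; the input values are arbitrary), not anything from the Intel documentation:

```c
/* caveats.c -- compile with, e.g.:  gcc -O2 -msse4.1 caveats.c */
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m128i a = _mm_set_epi32(4, 3, 2, 100000);
    __m128i b = _mm_set_epi32(4, 3, 2, 100000);

    /* Widening caveat: _mm_mul_epu32 produces full 64-bit products, but
     * only for the two even-indexed lanes, so widening all 4 elements
     * needs extra shuffles and a second multiply.  _mm_mullo_epi32 keeps
     * 4 lanes but discards the high 32 bits of each product. */
    __m128i wide = _mm_mul_epu32(a, b);    /* 2 x 64-bit products     */
    __m128i lo   = _mm_mullo_epi32(a, b);  /* 4 x low-32-bit products */

    /* Saturating vs. wrapping caveat: the same 16-bit add either wraps
     * around or clamps at INT16_MAX, depending on the instruction. */
    __m128i big  = _mm_set1_epi16(30000);
    __m128i wrap = _mm_add_epi16(big, big);   /* wraps                */
    __m128i sat  = _mm_adds_epi16(big, big);  /* clamps to 32767      */

    long long w[2]; int l[4]; short s0[8], s1[8];
    _mm_storeu_si128((__m128i *)w,  wide);
    _mm_storeu_si128((__m128i *)l,  lo);
    _mm_storeu_si128((__m128i *)s0, wrap);
    _mm_storeu_si128((__m128i *)s1, sat);

    printf("widening 100000*100000 = %lld\n", w[0]);   /* 10000000000 */
    printf("low-32   100000*100000 = %d\n",   l[0]);   /* 1410065408  */
    printf("wrapping 30000+30000   = %d\n",   s0[0]);  /* -5536       */
    printf("saturating 30000+30000 = %d\n",   s1[0]);  /* 32767       */
    return 0;
}
```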

In cases where you can use Byte or Word (16-bit) packed integer values, the SIMD instruction set allows operating on twice as many elements per instruction, which can provide increased computational capability.   The use of packed 8-bit or 16-bit values also (typically) reduces data transfer requirements through the memory hierarchy, which can increase throughput.  
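
As a rough sketch of the element-count advantage (assuming SSE2; the array sizes and names are illustrative), the same 128-bit add instruction covers eight 16-bit elements but only four 32-bit elements, and the 16-bit buffers move half as many bytes through the memory hierarchy:

```c
/* packed_width.c -- compile with, e.g.:  gcc -O2 -msse2 packed_width.c */
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

#define N 1024  /* element count; multiple of 8 */

static int16_t a16[N], b16[N], c16[N];
static int32_t a32[N], b32[N], c32[N];

int main(void)
{
    for (int i = 0; i < N; i++) {
        a16[i] = b16[i] = (int16_t)i;
        a32[i] = b32[i] = i;
    }

    /* 8 elements per 128-bit add: N/8 iterations, and each input
     * array is only 2*N bytes of memory traffic. */
    for (int i = 0; i < N; i += 8) {
        __m128i va = _mm_loadu_si128((__m128i *)&a16[i]);
        __m128i vb = _mm_loadu_si128((__m128i *)&b16[i]);
        _mm_storeu_si128((__m128i *)&c16[i], _mm_add_epi16(va, vb));
    }

    /* 4 elements per 128-bit add: N/4 iterations, and each input
     * array is 4*N bytes of memory traffic. */
    for (int i = 0; i < N; i += 4) {
        __m128i va = _mm_loadu_si128((__m128i *)&a32[i]);
        __m128i vb = _mm_loadu_si128((__m128i *)&b32[i]);
        _mm_storeu_si128((__m128i *)&c32[i], _mm_add_epi32(va, vb));
    }

    printf("c16[10]=%d  c32[10]=%d\n", c16[10], c32[10]);
    return 0;
}
```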

  • Float16 format provides a similar halving of storage and therefore bandwidth, but there are currently no native computation instructions on Float16 values.  The overhead of conversion from 16-bit floats to 32-bit floats before computation (and conversion back to 16-bit after computation) will typically be larger than the benefit of the reduced memory transfers, though counter-examples can almost certainly be found.
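
The conversion overhead is visible in the instruction sequence: with the F16C extension, a Float16 "computation" is really an up-convert, a float32 operation, and a down-convert. A minimal sketch, assuming a compiler and CPU with AVX and F16C (the value 1.5 is an arbitrary example):

```c
/* fp16_roundtrip.c -- compile with, e.g.:  gcc -O2 -mavx -mf16c fp16_roundtrip.c */
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t half_in[8], half_out[8];

    /* Produce 8 half-precision inputs by down-converting from float32. */
    __m256 vf  = _mm256_set1_ps(1.5f);
    __m128i vh = _mm256_cvtps_ph(vf, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    _mm_storeu_si128((__m128i *)half_in, vh);

    /* Compute step: up-convert to float32, do the math there, then
     * down-convert the result back to 16-bit storage format. */
    __m256 x = _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)half_in)); /* 16b -> 32b */
    x = _mm256_mul_ps(x, _mm256_set1_ps(2.0f));                      /* FP32 math  */
    vh = _mm256_cvtps_ph(x, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    _mm_storeu_si128((__m128i *)half_out, vh);

    /* Check one lane: 1.5 * 2.0 = 3.0, which is 0x4200 in FP16. */
    printf("half_out[0] = 0x%04x (expect 0x4200)\n", half_out[0]);
    return 0;
}
```
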
Green__Max
Beginner

McCalpin, John, thank you for such a detailed response. Very informative.
