<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic This information is easy in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Are-integers-faster-than-floats/m-p/1151335#M6830</link>
    <description>&lt;P&gt;This information is easy enough to find, but understanding it can be challenging.&amp;nbsp; For example, Appendix C of the Intel Optimization Reference Manual (document 248966) contains instruction latency and reciprocal throughput data for many recent Intel processors.&amp;nbsp; Even more data is available from Agner Fog's comprehensive testing (e.g., &lt;A href="http://www.agner.org/optimize/instruction_tables.pdf" target="_blank"&gt;http://www.agner.org/optimize/instruction_tables.pdf&lt;/A&gt;).&lt;/P&gt;

&lt;P&gt;There is a huge amount of data in these resources, but the short answer is that, &lt;STRONG&gt;in most cases, floating-point arithmetic has slightly higher latency than integer arithmetic, but the same, or better, throughput (for operands of the same bit width).&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;There are zillions of caveats required here, among them:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;Different processors have different instruction latencies and throughputs.&lt;/LI&gt;
	&lt;LI&gt;For data located anywhere other than the L1 Data Cache, performance may be limited by data transfer rates through the cache hierarchy.&amp;nbsp; In such cases, only the "width" of the data matters (i.e., the number of data elements per cache line).&lt;/LI&gt;
	&lt;LI&gt;Floating-point arithmetic is almost always used with input and output widths the same (e.g., double + double =&amp;gt; double), while integer multiplication has a result that is twice as wide as the inputs (e.g., 32-bit * 32-bit =&amp;gt; 64-bit).&amp;nbsp;&amp;nbsp; This does not fit naturally into the SIMD architecture of recent processors.&amp;nbsp;&amp;nbsp;
		&lt;UL&gt;
			&lt;LI&gt;There are several approaches to handling this, but each of them results in lower throughput for integer multiplication compared to floating-point multiplication.&lt;/LI&gt;
		&lt;/UL&gt;
	&lt;/LI&gt;
	&lt;LI&gt;Integer arithmetic is also complicated by differences in signed and unsigned computations.
		&lt;UL&gt;
			&lt;LI&gt;In some cases this requires extra instructions to handle correctly, with a corresponding reduction in throughput.&lt;/LI&gt;
		&lt;/UL&gt;
	&lt;/LI&gt;
	&lt;LI&gt;Integer arithmetic is also complicated by the need to handle both "saturating" and "wrapping" arithmetic.
		&lt;UL&gt;
			&lt;LI&gt;In some cases this requires extra instructions to handle correctly, with a corresponding reduction in throughput.&lt;/LI&gt;
		&lt;/UL&gt;
	&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;In cases where you can use packed Byte (8-bit) or Word (16-bit) integer values, the SIMD instruction set allows operating on four times (Byte) or twice (Word) as many elements per instruction as with 32-bit values, which can provide increased computational capability.&amp;nbsp; The use of packed 8-bit or 16-bit values also (typically) reduces data transfer requirements through the memory hierarchy, which can increase throughput.&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;Float16 format provides a similar halving of storage and therefore bandwidth, but there are currently no native computation instructions on Float16 values.&amp;nbsp; The overhead of conversion from 16-bit floats to 32-bit floats before computation (and conversion back to 16-bit after computation) will typically be larger than the benefit of the reduced memory transfers, though counter-examples can almost certainly be found.&lt;/LI&gt;
&lt;/UL&gt;</description>
    <pubDate>Sat, 10 Mar 2018 17:02:13 GMT</pubDate>
    <dc:creator>McCalpinJohn</dc:creator>
    <dc:date>2018-03-10T17:02:13Z</dc:date>
    <item>
      <title>Are integers faster than floats</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Are-integers-faster-than-floats/m-p/1151334#M6829</link>
      <description>&lt;P&gt;Someone suggested I ask this question in the development forums, and this forum seemed the closest to where I believe the answer can be found.&lt;/P&gt;

&lt;P&gt;I see this question on boards from other websites, but nobody seems to want to ask the people who make the actual CPU.&amp;nbsp; Are integers faster than floats, like they used to be when your company first started creating processors?&amp;nbsp; Are integers helpful for graphics using OpenGL, Vulkan, or DirectX? If the goal was to scan a human being in three dimensions, and display the scan on a monitor for medical purposes, and the measurements were all in microns, would it be better to store them in integers or floats?&lt;/P&gt;

</description>
      <pubDate>Fri, 09 Mar 2018 23:15:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Are-integers-faster-than-floats/m-p/1151334#M6829</guid>
      <dc:creator>Schubert__William</dc:creator>
      <dc:date>2018-03-09T23:15:11Z</dc:date>
    </item>
    <item>
      <title>This information is easy</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Are-integers-faster-than-floats/m-p/1151335#M6830</link>
      <description>&lt;P&gt;This information is easy enough to find, but understanding it can be challenging.&amp;nbsp; For example, Appendix C of the Intel Optimization Reference Manual (document 248966) contains instruction latency and reciprocal throughput data for many recent Intel processors.&amp;nbsp; Even more data is available from Agner Fog's comprehensive testing (e.g., &lt;A href="http://www.agner.org/optimize/instruction_tables.pdf" target="_blank"&gt;http://www.agner.org/optimize/instruction_tables.pdf&lt;/A&gt;).&lt;/P&gt;

&lt;P&gt;There is a huge amount of data in these resources, but the short answer is that, &lt;STRONG&gt;in most cases, floating-point arithmetic has slightly higher latency than integer arithmetic, but the same, or better, throughput (for operands of the same bit width).&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;There are zillions of caveats required here, among them:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;Different processors have different instruction latencies and throughputs.&lt;/LI&gt;
	&lt;LI&gt;For data located anywhere other than the L1 Data Cache, performance may be limited by data transfer rates through the cache hierarchy.&amp;nbsp; In such cases, only the "width" of the data matters (i.e., the number of data elements per cache line).&lt;/LI&gt;
	&lt;LI&gt;Floating-point arithmetic is almost always used with input and output widths the same (e.g., double + double =&amp;gt; double), while integer multiplication has a result that is twice as wide as the inputs (e.g., 32-bit * 32-bit =&amp;gt; 64-bit).&amp;nbsp;&amp;nbsp; This does not fit naturally into the SIMD architecture of recent processors.&amp;nbsp;&amp;nbsp;
		&lt;UL&gt;
			&lt;LI&gt;There are several approaches to handling this, but each of them results in lower throughput for integer multiplication compared to floating-point multiplication.&lt;/LI&gt;
		&lt;/UL&gt;
	&lt;/LI&gt;
	&lt;LI&gt;Integer arithmetic is also complicated by differences in signed and unsigned computations.
		&lt;UL&gt;
			&lt;LI&gt;In some cases this requires extra instructions to handle correctly, with a corresponding reduction in throughput.&lt;/LI&gt;
		&lt;/UL&gt;
	&lt;/LI&gt;
	&lt;LI&gt;Integer arithmetic is also complicated by the need to handle both "saturating" and "wrapping" arithmetic.
		&lt;UL&gt;
			&lt;LI&gt;In some cases this requires extra instructions to handle correctly, with a corresponding reduction in throughput.&lt;/LI&gt;
		&lt;/UL&gt;
	&lt;/LI&gt;
&lt;/UL&gt;
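
&lt;P&gt;The integer caveats above (double-width products, and wrapping versus saturating results) can be sketched with a small scalar model.&amp;nbsp; This is only an illustration of the arithmetic, not a performance test, and the function names are hypothetical.&amp;nbsp; On x86, each variant roughly corresponds to a different SIMD instruction (for example PADDW wraps while PADDUSW and PADDSW saturate), which is part of why integer code sometimes needs extra instructions.&lt;/P&gt;

```python
# Scalar model of the integer caveats; all function names are hypothetical.

def wrap_add_u16(a, b):
    """Unsigned 16-bit add that wraps modulo 2**16."""
    return (a + b) % 2**16

def sat_add_u16(a, b):
    """Unsigned 16-bit add that saturates at 65535."""
    return min(a + b, 2**16 - 1)

def sat_add_s16(a, b):
    """Signed 16-bit add clamped to the range -32768 .. 32767."""
    return max(-2**15, min(a + b, 2**15 - 1))

def mul_u32_full(a, b):
    """Full product of two unsigned 32-bit values (up to 64 bits wide)."""
    return a * b

def mul_u32_low(a, b):
    """Same product truncated to its low 32 bits."""
    return (a * b) % 2**32

if __name__ == "__main__":
    big = 2**32 - 1
    print(wrap_add_u16(65535, 1))             # prints 0
    print(sat_add_u16(65535, 1))              # prints 65535
    print(sat_add_s16(32767, 1))              # prints 32767
    print(mul_u32_full(big, big).bit_length())  # prints 64
    print(mul_u32_low(big, big))              # prints 1
```

&lt;P&gt;The same inputs give different answers depending on the chosen semantics, so the programmer (or compiler) has to pick the matching instruction or emulate the behavior with additional ones.&lt;/P&gt;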

&lt;P&gt;In cases where you can use packed Byte (8-bit) or Word (16-bit) integer values, the SIMD instruction set allows operating on four times (Byte) or twice (Word) as many elements per instruction as with 32-bit values, which can provide increased computational capability.&amp;nbsp; The use of packed 8-bit or 16-bit values also (typically) reduces data transfer requirements through the memory hierarchy, which can increase throughput.&lt;/P&gt;
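
&lt;P&gt;As a rough illustration of the element counts involved, the sketch below tabulates how many values of each type fit in one vector register and in one cache line.&amp;nbsp; The 32-byte (256-bit) register and the 64-byte cache line are assumptions chosen for the example, and the helper name is hypothetical.&lt;/P&gt;

```python
import struct

VECTOR_BYTES = 32      # assumption: one 256-bit SIMD register
CACHE_LINE_BYTES = 64  # assumption: typical cache line size on recent x86

# struct format codes; the "=" prefix selects standard sizes
FORMATS = {
    "int8": "=b",
    "int16": "=h",
    "int32": "=i",
    "float16": "=e",
    "float32": "=f",
    "float64": "=d",
}

def elements_per(block_bytes, fmt):
    """Number of elements with struct format fmt that fit in block_bytes."""
    return block_bytes // struct.calcsize(fmt)

if __name__ == "__main__":
    for name, fmt in FORMATS.items():
        per_vector = elements_per(VECTOR_BYTES, fmt)
        per_line = elements_per(CACHE_LINE_BYTES, fmt)
        print(name, per_vector, per_line)
```

&lt;P&gt;Halving the element size doubles both counts, which is where the potential gains in computation per instruction and in effective bandwidth come from.&lt;/P&gt;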

&lt;UL&gt;
	&lt;LI&gt;Float16 format provides a similar halving of storage and therefore bandwidth, but there are currently no native computation instructions on Float16 values.&amp;nbsp; The overhead of conversion from 16-bit floats to 32-bit floats before computation (and conversion back to 16-bit after computation) will typically be larger than the benefit of the reduced memory transfers, though counter-examples can almost certainly be found.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Sat, 10 Mar 2018 17:02:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Are-integers-faster-than-floats/m-p/1151335#M6830</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2018-03-10T17:02:13Z</dc:date>
    </item>
    <item>
      <title>McCalpin, John, Thank you for</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Are-integers-faster-than-floats/m-p/1151336#M6831</link>
      <description>&lt;P&gt;&lt;A href="https://software.intel.com/en-us/user/545611" style="font-size: 11px; background-color: rgb(238, 238, 238);"&gt;McCalpin, John&lt;/A&gt;, Thank you for such a detailed response. Very informative.&lt;/P&gt;</description>
      <pubDate>Thu, 05 Apr 2018 09:36:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Are-integers-faster-than-floats/m-p/1151336#M6831</guid>
      <dc:creator>Green__Max</dc:creator>
      <dc:date>2018-04-05T09:36:09Z</dc:date>
    </item>
  </channel>
</rss>

