<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Q&amp;A:  Maximum FPS: Three Tips for Faster Code in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Q-A-Maximum-FPS-Three-Tips-for-Faster-Code/m-p/851380#M1887</link>
    <description>&lt;P&gt;&lt;EM&gt;&lt;FONT face="Arial" color="#000080" size="2"&gt;These questions were received by Intel Software Network Support about the original article located at &lt;A href="http://www.intel.com/cd/ids/developer/asmo-na/eng/19123.htm"&gt;http://www.intel.com/cd/ids/developer/asmo-na/eng/19123.htm&lt;/A&gt;, followed by the responses received:&lt;/FONT&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;FONT color="#000000"&gt;&lt;STRONG&gt;Q.&lt;/STRONG&gt; I'm a beginner when it comes to caching. Consider a 32-bit address as discussed in this article, and an 8KB cache that is 4-way associative with 64-byte cache lines. Hence bits 0 - 5 are the offset into the 64-byte cache line, bits 6 - 10 are the index of the set (32 sets in total), and bits 11 - 31 must be the tag used for a match (cache hit). Please give some information on how exactly a cache hit occurs. Thanks.&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;&lt;STRONG&gt;A.&lt;/STRONG&gt; A cache hit is both a simple and a complex operation. It is an automated part of instruction processing by the CPU. Simply put, for memory read operations, if the cacheline containing the memory requested by a machine instruction exists in the cache, then the data is retrieved from the cache. This is a cache hit. Similarly, for write operations, if the cacheline exists in the cache, then the store is done to the cached copy of the memory.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;The actual implementation of a cache is more complex, and varies from processor to processor in its details. For memory read operations, when a machine instruction requires data from memory, it begins a memory fetch, which first checks the address of the needed memory against the tag arrays of the ways. The tag arrays store the upper address bits of the cached line for each line in each way. &lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;Pseudocode for that tag check would look like this:&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;// Cache parameters &lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;#define CACHE_SIZE (8*1024)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;#define NUM_WAYS 4&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;#define CACHELINE_SIZE 64&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;#define NUM_SETS (CACHE_SIZE / CACHELINE_SIZE / NUM_WAYS)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;// Think of a cache as an array structured like this:&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;struct cache {&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt; uint32 tag[NUM_SETS][NUM_WAYS];&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;};&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;bool cache_hit(uint32 address) {&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt; uint32 set = (address / CACHELINE_SIZE) % NUM_SETS;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt; uint32 tag_bits = address / NUM_SETS / CACHELINE_SIZE;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt; // In hardware, all of the ways are looked up in parallel,&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt; // and a hit is determined on a tag-match&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt; for (uint32 way = 0; way &amp;lt; NUM_WAYS; way++) {&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt; if (cache-&amp;gt;tag[set][way] == tag_bits) {&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt; return true; // The data is in the cache&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt; }&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt; }&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt; return false; // The data was not found in the cache&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;}&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;If there is a match, then the data is fetched from the matching way. For write operations, the same tag check is done, and if there is a match, the write is done to the cached line. There are a number of other details here (such as write-through vs. write-back, and cacheline dirty bits) that vary not only from processor to processor, but also with the configuration of the cache and memory subsystem. Additionally, certain pages may be marked non-cacheable if, for example, they belong to memory that is modified by devices outside the processor.&lt;/FONT&gt;&lt;/P&gt;
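&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;As a rough worked example of the set and tag arithmetic in the pseudocode above, the following Python sketch splits one address into its tag, set index, and byte offset for the 8KB, 4-way, 64-byte-line cache from the question. The function name split_address and the sample address 0x12345678 are illustrative assumptions, not part of the original article.&lt;/FONT&gt;&lt;/P&gt;

```python
# Illustrative sketch only; the cache parameters come from the question above.
CACHE_SIZE = 8 * 1024       # bytes
NUM_WAYS = 4
CACHELINE_SIZE = 64         # bytes
NUM_SETS = CACHE_SIZE // CACHELINE_SIZE // NUM_WAYS    # 32 sets


def split_address(address):
    """Split a 32-bit address into (tag, set index, byte offset).

    split_address is a hypothetical helper name used for this example.
    """
    offset = address % CACHELINE_SIZE                    # bits 0 - 5
    set_index = (address // CACHELINE_SIZE) % NUM_SETS   # bits 6 - 10
    tag = address // (CACHELINE_SIZE * NUM_SETS)         # bits 11 - 31
    return tag, set_index, offset


if __name__ == "__main__":
    tag, set_index, offset = split_address(0x12345678)
    print(hex(tag), set_index, offset)
```

&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;For the sample address 0x12345678 this yields tag 0x2468a, set 25, and offset 56, which corresponds to bits 11 - 31, 6 - 10, and 0 - 5 of the address respectively.&lt;/FONT&gt;&lt;/P&gt;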
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;The Wikipedia article on CPU caches gives good general background: &lt;A href="http://en.wikipedia.org/wiki/CPU_cache"&gt;http://en.wikipedia.org/wiki/CPU_cache&lt;/A&gt; (with the caveat that Wikipedia should never be taken as a primary source or assumed to be fully accurate). Other topics that might be of interest include cacheline eviction policies, different types of caches, cache hierarchy, cache coherence on multicore systems, and the use of prefetch instructions for performance.&lt;/FONT&gt;&lt;/P&gt;
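&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;To make the hit and miss behavior described above a little more concrete, here is a toy Python model of the tag check for the same 32-set, 4-way cache. It is a sketch under stated assumptions, not a description of any real processor: the class name ToyCache, the use of None in place of a valid bit, and the round-robin choice of which way to replace on a miss are all assumptions made for illustration.&lt;/FONT&gt;&lt;/P&gt;

```python
# Toy model of the tag check; not how any particular processor is built.
NUM_SETS = 32
NUM_WAYS = 4
CACHELINE_SIZE = 64


class ToyCache:
    """Hypothetical software model that tracks tags only, not data."""

    def __init__(self):
        # None marks an empty way; real hardware keeps a valid bit instead.
        self.tags = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        # Assumption: round-robin replacement within each set.
        self.next_way = [0] * NUM_SETS

    def access(self, address):
        """Return True on a hit; on a miss, install the line and return False."""
        set_index = (address // CACHELINE_SIZE) % NUM_SETS
        tag = address // (CACHELINE_SIZE * NUM_SETS)
        if tag in self.tags[set_index]:
            return True                       # tag match in one of the ways
        way = self.next_way[set_index]        # pick a way to overwrite
        self.tags[set_index][way] = tag
        self.next_way[set_index] = (way + 1) % NUM_WAYS
        return False


if __name__ == "__main__":
    cache = ToyCache()
    print(cache.access(0x1000))   # first touch of the line: miss
    print(cache.access(0x1004))   # same 64-byte line: hit
```

&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;In this model, addresses that are 2048 bytes apart (64 bytes times 32 sets) land in the same set, so touching five such lines in a row overwrites the first one. That is the kind of conflict an eviction policy has to arbitrate.&lt;/FONT&gt;&lt;/P&gt;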
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;--------&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;Gina B.&lt;BR /&gt;Intel Software Network Support&lt;BR /&gt;&lt;/FONT&gt;&lt;A href="http://www.intel.com/software" target="_blank"&gt;http://www.intel.com/software&lt;/A&gt;&lt;BR /&gt;&lt;FONT face="Arial" size="2"&gt;email: &lt;/FONT&gt;&lt;A href="mailto:ISN.support@intel.com"&gt;&lt;FONT face="Arial" size="2"&gt;ISN.support@intel.com&lt;/FONT&gt;&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;Intel is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries. &lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Arial" size="2"&gt;*Other names and brands may be claimed as the property of others.&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 22 Jun 2007 23:01:27 GMT</pubDate>
    <dc:creator>Intel_Software_Netw1</dc:creator>
    <dc:date>2007-06-22T23:01:27Z</dc:date>
    <item>
      <title>Q&amp;A:  Maximum FPS: Three Tips for Faster Code</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Q-A-Maximum-FPS-Three-Tips-for-Faster-Code/m-p/851380#M1887</link>
      <pubDate>Fri, 22 Jun 2007 23:01:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Q-A-Maximum-FPS-Three-Tips-for-Faster-Code/m-p/851380#M1887</guid>
      <dc:creator>Intel_Software_Netw1</dc:creator>
      <dc:date>2007-06-22T23:01:27Z</dc:date>
    </item>
  </channel>
</rss>

