<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Question on Cache-Blocking. in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Question-on-Cache-Blocking/m-p/939806#M5047</link>
    <description>&lt;DIV&gt;Mithun,&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;The purpose of the article was to highlight the shared nature of the cache hierarchy in processors that support Hyper-Threading Technology: with two threads sharing the same cache hierarchy, the effective cache available to each logical processor is reduced. An application that uses cache blocking should detect processors that support Hyper-Threading Technology and reduce the block size appropriately.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;DIV&gt;As a general guideline to start from, I've recommended that cache blocking techniques target ~50% of the cache size for processors without Hyper-Threading Technology enabled. If 50% was a reasonable block size without Hyper-Threading, then running the same application with 2 threads on a Hyper-Threading enabled processor should target ~25-35% of the cache size. The optimal cache blocking is highly application dependent and is significantly influenced by other processes that may be running as well.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;Certainly, the set-associativity plays a part in both the L2 and L3 cache behavior / performance. There are cases where you can effectively increase (or inadvertently decrease) the cache performance by utilizing knowledge of the set-associativity and fine-tuning the application's access behavior. Unfortunately, I don't have any specific data on this.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;By extension, the cache blocking technique can be applied to the L3 instead of the L2 cache, but this is again application dependent. Beware that applying the cache blocking technique to the L3 cache can run into other performance-related bottlenecks. For example, the number of entries in the DTLB may also limit the effective size of the block by causing DTLB misses if the block size is too large. While this isn't as likely with an L2 cache size of 512 KB, it can be an issue with the larger cache sizes found in L3 caches.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;I hope this helps.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Regards,&lt;/DIV&gt;
&lt;DIV&gt;Phil Kerly&lt;/DIV&gt;</description>
    <pubDate>Thu, 16 Sep 2004 04:01:51 GMT</pubDate>
    <dc:creator>Philip_K_Intel</dc:creator>
    <dc:date>2004-09-16T04:01:51Z</dc:date>
    <item>
      <title>Question on Cache-Blocking.</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Question-on-Cache-Blocking/m-p/939805#M5046</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;Hello,&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;I have some queries regarding the Cache-Blocking method for utilizing data locality. I am referring to the following article.&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;A href="http://www.intel.com/cd/ids/developer/asmo-na/eng/technologies/threading/implementation/20461.htm?page=1" target="_blank"&gt;http://www.intel.com/cd/ids/developer/asmo-na/eng/technologies/threading/implementation/20461.htm?page=1&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;I know that the main factors to be taken into consideration are&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;- the processor's L2 cache size.&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;- number of iterations / re-use.&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;- the data block size as compared to the L2 cache size.&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;Question # 1&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;=========&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;However, does anyone know whether the Sector-Mapping features and the Set-Associativity of the L2 cache play any part?&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;Does anyone have any data on this?&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;Question # 2&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;=========&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;If the system has an onboard L3 cache, would it not be advisable to exploit the data locality inside the larger L3 cache rather than inside the L2 cache?&lt;/DIV&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;- Mithun Shanbhag&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;Syracuse University.&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;&lt;/FONT&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;FONT size="2"&gt;(PS: My system has an 8-way sector-mapped, 512 KB (unified) L2 cache with a cache line size of 64 bytes. It does not have any L3 cache.)&lt;/FONT&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 10 Sep 2004 08:10:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Question-on-Cache-Blocking/m-p/939805#M5046</guid>
      <dc:creator>mrshanbh</dc:creator>
      <dc:date>2004-09-10T08:10:11Z</dc:date>
    </item>
    <item>
      <title>Re: Question on Cache-Blocking.</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Question-on-Cache-Blocking/m-p/939806#M5047</link>
      <description>&lt;DIV&gt;Mithun,&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;The purpose of the article was to highlight the shared nature of the cache hierarchy in processors that support Hyper-Threading Technology: with two threads sharing the same cache hierarchy, the effective cache available to each logical processor is reduced. An application that uses cache blocking should detect processors that support Hyper-Threading Technology and reduce the block size appropriately.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;DIV&gt;As a general guideline to start from, I've recommended that cache blocking techniques target ~50% of the cache size for processors without Hyper-Threading Technology enabled. If 50% was a reasonable block size without Hyper-Threading, then running the same application with 2 threads on a Hyper-Threading enabled processor should target ~25-35% of the cache size. The optimal cache blocking is highly application dependent and is significantly influenced by other processes that may be running as well.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;Certainly, the set-associativity plays a part in both the L2 and L3 cache behavior / performance. There are cases where you can effectively increase (or inadvertently decrease) the cache performance by utilizing knowledge of the set-associativity and fine-tuning the application's access behavior. Unfortunately, I don't have any specific data on this.&lt;/DIV&gt;
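&lt;DIV&gt;A minimal sketch of the guideline above, written in Python only for brevity. It picks a starting block size (50% of the cache without Hyper-Threading, and an assumed 30% midpoint of the ~25-35% range with it) and shows the loop order that cache blocking relies on: every pass over a block is finished before moving to the next block. The function names, the 30% figure and the "add one" work per element are hypothetical placeholders, and Python itself will not show the cache effect; a real kernel would be written in C or a similar language.&lt;/DIV&gt;
&lt;PRE&gt;
```python
def starting_block_bytes(cache_bytes, ht_enabled):
    """Return a starting block size in bytes, to be tuned by measurement."""
    # Guideline from the thread: ~50% of the cache without Hyper-Threading,
    # ~25-35% with two threads sharing it. 30 is an assumed midpoint.
    percent = 30 if ht_enabled else 50
    return cache_bytes * percent // 100


def block_ranges(n_elems, elems_per_block):
    """Yield (start, stop) index pairs that tile range(n_elems)."""
    for start in range(0, n_elems, elems_per_block):
        yield start, min(start + elems_per_block, n_elems)


def process_blocked(data, passes, elems_per_block):
    """Run all passes over one block before touching the next block,
    so the block is reused while it is still resident in the cache."""
    for start, stop in block_ranges(len(data), elems_per_block):
        for _ in range(passes):
            for i in range(start, stop):
                data[i] = data[i] + 1  # placeholder for the real work
    return data


if __name__ == "__main__":
    l2_bytes = 512 * 1024  # the 512 KB L2 mentioned in this thread
    elem_bytes = 8         # assumption: 8-byte elements
    for ht in (False, True):
        size = starting_block_bytes(l2_bytes, ht)
        print(ht, size, size // elem_bytes)
```
&lt;/PRE&gt;
&lt;DIV&gt;For the 512 KB L2 mentioned in this thread, this gives a 256 KB starting block without Hyper-Threading and roughly 154 KB with it. Both are starting points to refine by measuring the actual application.&lt;/DIV&gt;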
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;By extension, the cache blocking technique can be applied to the L3 instead of the L2 cache, but this is again application dependent. Beware that applying the cache blocking technique to the L3 cache can run into other performance-related bottlenecks. For example, the number of entries in the DTLB may also limit the effective size of the block by causing DTLB misses if the block size is too large. While this isn't as likely with an L2 cache size of 512 KB, it can be an issue with the larger cache sizes found in L3 caches.&lt;/DIV&gt;
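&lt;DIV&gt;To make the DTLB point concrete, here is a rough Python sketch that estimates the DTLB reach (entries times page size) and caps a candidate block size at that reach. The 64 entries and 4 KB pages are assumptions for illustration and do not come from this thread; check the documentation for the actual processor. The function names are hypothetical, and capping at the reach is only one simple heuristic.&lt;/DIV&gt;
&lt;PRE&gt;
```python
def dtlb_reach_bytes(entries, page_bytes):
    """Memory that can be mapped at once without a DTLB miss."""
    return entries * page_bytes


def dtlb_limited_block(block_bytes, entries, page_bytes):
    """Cap a candidate block size at the DTLB reach (a simple heuristic)."""
    return min(block_bytes, dtlb_reach_bytes(entries, page_bytes))


if __name__ == "__main__":
    entries = 64       # assumption, not taken from the thread
    page_bytes = 4096  # assumption: 4 KB pages
    for candidate in (256 * 1024, 1024 * 1024):
        print(candidate, dtlb_limited_block(candidate, entries, page_bytes))
```
&lt;/PRE&gt;
&lt;DIV&gt;Under these assumed values the reach is 256 KB, so a 256 KB block sized for a 512 KB L2 just fits, while a 1 MB block aimed at an L3 cache would span four times as many pages as the DTLB can map.&lt;/DIV&gt;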
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;I hope this helps.&lt;/DIV&gt;
&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;Regards,&lt;/DIV&gt;
&lt;DIV&gt;Phil Kerly&lt;/DIV&gt;</description>
      <pubDate>Thu, 16 Sep 2004 04:01:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Question-on-Cache-Blocking/m-p/939806#M5047</guid>
      <dc:creator>Philip_K_Intel</dc:creator>
      <dc:date>2004-09-16T04:01:51Z</dc:date>
    </item>
  </channel>
</rss>

