<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Detecting which cores share cache on Quad-core E5440 in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Detecting-which-cores-share-cache-on-Quad-core-E5440/m-p/909554#M4576</link>
    <description>&lt;P&gt;I did some measurements on this based on the following assumption:&lt;/P&gt;
&lt;P&gt;[ (0, 2), (3, 4) ] [ (1, 5), (6, 7) ]&lt;/P&gt;
&lt;P&gt;Where the brackets [] represent physical processors, the parentheses represent the two cache elements within each physical processor, and the numbers represent the logical processors the OS understands, such that 1 &amp;lt;&amp;lt; N is the processor affinity mask for a given logical processor index N.&lt;/P&gt;
&lt;P&gt;I ran a highly cache-dependent operation a few thousand times, concurrently on every possible pair of logical processors, i.e. (0, 2), (0, 3), (0, 4), ..., (1, 2), (1, 3), (1, 4), ..., (2, 3), (2, 4), ..., (6, 7).&lt;/P&gt;
&lt;P&gt;I get a performance measurement of ~2 ms for all combinations residing on separate caches. Being on a different physical processor does not provide any additional benefit. I get a measurement of ~4.5 ms for each of the four pairs sharing a cache element.&lt;/P&gt;
&lt;P&gt;Mysteriously (?), I get ~3.5 ms for each pair which includes LP 0. The pair which shares a cache with core 0 actually returns an elevated figure of ~4.8 ms. I presume this is because some portion of the OS resides permanently on the first core.&lt;/P&gt;
&lt;P&gt;So, on my current processors (E5440) the first two APICs within a physical package use the first cache element in that physical processor. The question, however, still stands as to whether this is behavior I can depend on or not.&lt;/P&gt;
&lt;P&gt;Cheers,&lt;/P&gt;</description>
    <pubDate>Wed, 02 Apr 2008 19:40:51 GMT</pubDate>
    <dc:creator>ravenous_wolves</dc:creator>
    <dc:date>2008-04-02T19:40:51Z</dc:date>
    <item>
      <title>Detecting which cores share cache on Quad-core E5440</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Detecting-which-cores-share-cache-on-Quad-core-E5440/m-p/909551#M4573</link>
      <description>I'm working with a dual-processor system where each processor is (currently) an E5440 quad-core processor. Each processor has 2 x 6 MB = 12 MB of cache. I want to assign some number of threads across the 8 cores so as to minimize cache contention.&lt;BR /&gt;&lt;BR /&gt;I've been referencing the following articles:&lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/articles/detecting-multi-core-processor-topology-in-an-ia-32-platform" target="_blank"&gt;http://software.intel.com/en-us/articles/detecting-multi-core-processor-topology-in-an-ia-32-platform&lt;/A&gt;&lt;BR /&gt;&lt;A href="http://www.devx.com/go-parallel/Article/27398" target="_blank"&gt;http://www.devx.com/go-parallel/Article/27398&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;The first article appears to say the same technique can be used to determine the number of cores per cache. The second article attempts that extension, but its code doesn't compile under Visual Studio 2005.&lt;BR /&gt;&lt;BR /&gt;However, even once I know the number of cores per cache, I still don't know which cores within each processor use which cache. I was very surprised to see that the Windows logical processors do not map onto physical processors in the way I had expected (0-3 = processor 1, 4-7 = processor 2). 
On my system, I receive the following output:&lt;BR /&gt;&lt;BR /&gt;Relationships between OS affinity mask, Initial APIC ID, and 3-level sub-IDs:&lt;BR /&gt;&lt;BR /&gt; AffinityMask = 1; Initial APIC = 0; Physical ID = 0, Core ID = 0, SMT ID = 0&lt;BR /&gt; AffinityMask = 2; Initial APIC = 4; Physical ID = 4, Core ID = 0, SMT ID = 0&lt;BR /&gt; AffinityMask = 4; Initial APIC = 1; Physical ID = 0, Core ID = 1, SMT ID = 0&lt;BR /&gt; AffinityMask = 8; Initial APIC = 2; Physical ID = 0, Core ID = 2, SMT ID = 0&lt;BR /&gt; AffinityMask = 16; Initial APIC = 3; Physical ID = 0, Core ID = 3, SMT ID = 0&lt;BR /&gt; AffinityMask = 32; Initial APIC = 5; Physical ID = 4, Core ID = 1, SMT ID = 0&lt;BR /&gt; AffinityMask = 64; Initial APIC = 6; Physical ID = 4, Core ID = 2, SMT ID = 0&lt;BR /&gt; AffinityMask = 128; Initial APIC = 7; Physical ID = 4, Core ID = 3, SMT ID = 0&lt;BR /&gt;&lt;BR /&gt;If I were running 4 threads, my previous algorithm would have assigned these to logical indices 0, 2, 4, 6. Three of these would be running on physical processor #1, whereas I had expected to have only 2 running on each physical processor.&lt;BR /&gt;&lt;BR /&gt;So, what assumptions can I make about caches? If I'm able to either modify the Intel code or get the DevX code to work, and I know the number of cores per L2 cache, can I assume one cache module works with cores 0,1 and the other with 2,3? Or does one module work with cores 0,2 and the other with 1,3? Or is it possible that one module works with 0,3 and the other with 1,2?&lt;BR /&gt;&lt;BR /&gt;I don't have access to the GetLogicalProcessorInformation function right now, so I'm doing all this through CPUID inline assembler.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;</description>
      <pubDate>Tue, 01 Apr 2008 17:00:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Detecting-which-cores-share-cache-on-Quad-core-E5440/m-p/909551#M4573</guid>
      <dc:creator>ravenous_wolves</dc:creator>
      <dc:date>2008-04-01T17:00:22Z</dc:date>
    </item>
    <item>
      <title>Re: Detecting which cores share cache on Quad-core E5440</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Detecting-which-cores-share-cache-on-Quad-core-E5440/m-p/909552#M4574</link>
      <description>&lt;P&gt;The main problem with this is that you cannot expect the relationship you are seeing to stay the same on another computer because, if I remember correctly, the OS assignment varies depending on the BIOS APIC table and the OS scheduler logic.&lt;/P&gt;
&lt;P&gt;Perhaps the best way would be to use the Initial APIC ID, because successive numbers seem to represent adjacent cores within each physical package. If I understand your numbers correctly, you would want to assign threads to the cores with APIC IDs 0, 2, 4 and 6.&lt;/P&gt;
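&lt;P&gt;A minimal sketch of that selection, not a definitive implementation: it assumes the affinity mask / Initial APIC ID pairs printed in the first post, and it assumes (unconfirmed at this point in the thread) that two adjacent APIC IDs share a cache, so that picking the even APIC IDs lands one thread on each cache. The names lp_entry, lp_table and pick_even_apic_mask are made up for illustration.&lt;/P&gt;
<![CDATA[<PRE>
```c
/* One row of the table from the first post:
   OS affinity mask and Initial APIC ID of a logical processor. */
struct lp_entry {
    unsigned mask;
    unsigned apic_id;
};

static const struct lp_entry lp_table[8] = {
    {1, 0}, {2, 4}, {4, 1}, {8, 2}, {16, 3}, {32, 5}, {64, 6}, {128, 7}
};

/* OR together the affinity masks of every entry whose APIC ID is even.
   Hypothetical rule: assumes APIC IDs 2k and 2k+1 share one cache. */
static unsigned pick_even_apic_mask(const struct lp_entry *table, int count)
{
    unsigned combined = 0;
    int i;
    for (i = 0; i < count; ++i) {
        if ((table[i].apic_id & 1u) == 0) {
            combined |= table[i].mask;
        }
    }
    return combined;
}
```
</PRE>]]>
&lt;P&gt;With the table above this selects the masks 1, 2, 8 and 64, i.e. the logical processors with APIC IDs 0, 4, 2 and 6.&lt;/P&gt;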
&lt;P&gt;The number of cores sharing each cache can be found by enumerating the deterministic cache parameters leaf (the CPUID instruction with EAX=4). You can find more information about cache sharing among cores and threads on page 33, section 3.1.5.1, of &lt;A href="http://developer.intel.com/design/processor/applnots/241618.htm" target="_blank" rel="nofollow"&gt;AP-485 Intel Processor Identification and the CPUID Instruction&lt;/A&gt; (document order #241618).&lt;/P&gt;</description>
      <pubDate>Wed, 02 Apr 2008 02:29:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Detecting-which-cores-share-cache-on-Quad-core-E5440/m-p/909552#M4574</guid>
      <dc:creator>levicki</dc:creator>
      <dc:date>2008-04-02T02:29:43Z</dc:date>
    </item>
    <item>
      <title>Re: Detecting which cores share cache on Quad-core E5440</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Detecting-which-cores-share-cache-on-Quad-core-E5440/m-p/909553#M4575</link>
      <description>I understand that I can't rely on the mapping I quoted above. I'm looking at writing logic to query the processor(s) for this information and then perform thread assignment based on the results. &lt;BR /&gt;&lt;BR /&gt;Given two physical packages, each with four cores and two caches, my goal is to assign four threads such that each of them resides on its own cache. &lt;BR /&gt;&lt;BR /&gt;CPUID.4 tells me that there are 2 cores per cache, but I have not found any documentation which says definitively that the first two cores in a physical package use the first cache in that physical package.&lt;BR /&gt;&lt;BR /&gt;From the data above, which cores use the first cache in the first physical package (ID = 0)?&lt;BR /&gt;- APICs 0,1? In my example data above, this would be LP 0,2.&lt;BR /&gt;- APICs 0,2? In my example data above, this would be LP 0,3.&lt;BR /&gt;- APICs 0,3? In my example data above, this would be LP 0,4.&lt;BR /&gt;- Something else? &lt;BR /&gt;&lt;BR /&gt;Is this known a priori, or is there some method of dynamically determining it?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;</description>
      <pubDate>Wed, 02 Apr 2008 18:00:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Detecting-which-cores-share-cache-on-Quad-core-E5440/m-p/909553#M4575</guid>
      <dc:creator>ravenous_wolves</dc:creator>
      <dc:date>2008-04-02T18:00:52Z</dc:date>
    </item>
    <item>
      <title>Re: Detecting which cores share cache on Quad-core E5440</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Detecting-which-cores-share-cache-on-Quad-core-E5440/m-p/909554#M4576</link>
      <description>&lt;P&gt;I did some measurements on this based on the following assumption:&lt;/P&gt;
&lt;P&gt;[ (0, 2), (3, 4) ] [ (1, 5), (6, 7) ]&lt;/P&gt;
&lt;P&gt;Where the brackets [] represent physical processors, the parentheses represent the two cache elements within each physical processor, and the numbers represent the logical processors the OS understands, such that 1 &amp;lt;&amp;lt; N is the processor affinity mask for a given logical processor index N.&lt;/P&gt;
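&lt;P&gt;To make the mask convention concrete, here is a small sketch of the conversion in both directions. The function names affinity_mask_for and lp_index_for are hypothetical, and the sketch assumes fewer than 64 logical processors; on Windows a mask like this is the kind of value one would hand to SetThreadAffinityMask.&lt;/P&gt;
<![CDATA[<PRE>
```c
/* Affinity mask with only the bit for logical processor lp_index set.
   Assumes lp_index is below 64. */
static unsigned long long affinity_mask_for(unsigned lp_index)
{
    return 1ULL << lp_index;
}

/* Inverse mapping: index of the single set bit in mask,
   or -1 if mask does not have exactly one bit set. */
static int lp_index_for(unsigned long long mask)
{
    int index = 0;
    if (mask == 0 || (mask & (mask - 1)) != 0) {
        return -1;
    }
    while ((mask >>= 1) != 0) {
        ++index;
    }
    return index;
}
```
</PRE>]]>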
&lt;P&gt;I ran a highly cache-dependent operation a few thousand times, concurrently on every possible pair of logical processors, i.e. (0, 2), (0, 3), (0, 4), ..., (1, 2), (1, 3), (1, 4), ..., (2, 3), (2, 4), ..., (6, 7).&lt;/P&gt;
&lt;P&gt;I get a performance measurement of ~2 ms for all combinations residing on separate caches. Being on a different physical processor does not provide any additional benefit. I get a measurement of ~4.5 ms for each of the four pairs sharing a cache element.&lt;/P&gt;
&lt;P&gt;Mysteriously (?), I get ~3.5 ms for each pair which includes LP 0. The pair which shares a cache with core 0 actually returns an elevated figure of ~4.8 ms. I presume this is because some portion of the OS resides permanently on the first core.&lt;/P&gt;
&lt;P&gt;So, on my current processors (E5440) the first two APICs within a physical package use the first cache element in that physical processor. The question, however, still stands as to whether this is behavior I can depend on or not.&lt;/P&gt;
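&lt;P&gt;Expressed as code, the observation looks roughly like the sketch below. It only encodes what was measured on this one system: the Initial APIC IDs are copied from the table in the first post, and the rule that dropping the lowest APIC ID bit gives a cache index is an assumption based on two cores per L2 cache. The names apic_of_lp, cache_index and share_cache are hypothetical.&lt;/P&gt;
<![CDATA[<PRE>
```c
/* Initial APIC ID of each logical processor (LP), copied from the
   table in the first post; the array index is the LP number. */
static const unsigned apic_of_lp[8] = {0, 4, 1, 2, 3, 5, 6, 7};

/* Assumption: two cores per L2 cache, so the lowest APIC ID bit selects
   a core within a cache and the remaining bits select the cache. */
static unsigned cache_index(unsigned apic_id)
{
    return apic_id >> 1;
}

/* Returns 1 if both LPs map to the same cache index, otherwise 0. */
static int share_cache(int lp_a, int lp_b)
{
    return cache_index(apic_of_lp[lp_a]) == cache_index(apic_of_lp[lp_b]);
}
```
</PRE>]]>
&lt;P&gt;Under these assumptions the only pairs reported as sharing a cache are (0, 2), (3, 4), (1, 5) and (6, 7), which matches the layout assumed at the top of this post.&lt;/P&gt;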
&lt;P&gt;Cheers,&lt;/P&gt;</description>
      <pubDate>Wed, 02 Apr 2008 19:40:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Detecting-which-cores-share-cache-on-Quad-core-E5440/m-p/909554#M4576</guid>
      <dc:creator>ravenous_wolves</dc:creator>
      <dc:date>2008-04-02T19:40:51Z</dc:date>
    </item>
    <item>
      <title>Re: Detecting which cores share cache on Quad-core E5440</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Detecting-which-cores-share-cache-on-Quad-core-E5440/m-p/909555#M4577</link>
      <description>&lt;P&gt;For your information, we're working on an update of the white paper on processor topology enumeration and the associated reference code. I expect them to be ready in the June time frame.&lt;/P&gt;
&lt;P&gt;The update is expected to include enhancements in several areas:&lt;/P&gt;
&lt;P&gt;1. System topology enumeration using x2APIC ID where available. Enumeration using initial APIC ID will also be supported when x2APIC ID is not available. &lt;/P&gt;
&lt;P&gt;2. Reference code for cache topology enumeration will also be included along with CPU topology.&lt;/P&gt;
&lt;P&gt;The cache topology enumeration is based on the method published in the Intel 64 Architecture Software Optimization Manual.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Apr 2008 02:03:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Detecting-which-cores-share-cache-on-Quad-core-E5440/m-p/909555#M4577</guid>
      <dc:creator>SHIH_K_Intel</dc:creator>
      <dc:date>2008-04-03T02:03:17Z</dc:date>
    </item>
    <item>
      <title>Re: Detecting which cores share cache on Quad-core E5440</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Detecting-which-cores-share-cache-on-Quad-core-E5440/m-p/909556#M4578</link>
      <description>&lt;P&gt;Apparently there is a CacheIndex encoded into the APIC_ID. There is pseudo-code showing how to extract it in section 7.10.3 of the &lt;I&gt;Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide&lt;/I&gt;.&lt;/P&gt;
&lt;P&gt;The logic I ended up using looks like this:&lt;/P&gt;
&lt;PRE&gt;/* Number of low APIC ID bits that select a logical processor within one L2 cache. */
int nL2CacheIDMaskWidth = find_maskwidth(nLogicalProcessorsPerL2Cache_supported);
/* Mask that clears those low bits; unsigned char avoids an implementation-defined conversion to a signed char. */
unsigned char nL2CacheIDMask = (unsigned char) (0xFF &amp;lt;&amp;lt; nL2CacheIDMaskWidth);
/* The remaining high bits, shifted down, identify the L2 cache. */
int nL2CacheIndex = ((nAPIC_ID &amp;amp; nL2CacheIDMask) &amp;gt;&amp;gt; nL2CacheIDMaskWidth);&lt;/PRE&gt;</description>
      <pubDate>Mon, 07 Apr 2008 19:54:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Detecting-which-cores-share-cache-on-Quad-core-E5440/m-p/909556#M4578</guid>
      <dc:creator>ravenous_wolves</dc:creator>
      <dc:date>2008-04-07T19:54:26Z</dc:date>
    </item>
  </channel>
</rss>

