<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Dr. McCalpin, in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/PCM-reporting-lower-than-expected-memory-read-counts/m-p/1054718#M6841</link>
    <description>&lt;P&gt;Dr. McCalpin,&lt;/P&gt;

&lt;P&gt;Thank you for the guidance. This is very helpful. I will plan to revisit my code snippet with the modifications you've suggested.&lt;/P&gt;

&lt;P&gt;Regards,&lt;BR /&gt;
	Patrick L.&lt;/P&gt;</description>
    <pubDate>Tue, 05 May 2015 21:56:32 GMT</pubDate>
    <dc:creator>Patrick_L_</dc:creator>
    <dc:date>2015-05-05T21:56:32Z</dc:date>
    <item>
      <title>PCM reporting lower than expected memory read counts</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/PCM-reporting-lower-than-expected-memory-read-counts/m-p/1054716#M6839</link>
      <description>&lt;P&gt;I have a piece of code on which I'm running PCM (Performance Counter Monitor). It is essentially the following:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;uint64_t *a,*b;
a = new uint64_t[LEN];
b = new uint64_t[LEN];
for( int i=0;i&amp;lt;LEN;i++ ) a[i] = b[i];&lt;/PRE&gt;

&lt;P&gt;With LEN set to 402,653,184 (384 Mi), PCM is reporting 0.72 GB under READ and 6.30 GB under WRITE. Given that each array is 3 GiB, I would expect both arrays to be read (since the processor uses write-allocate), giving a READ count of about 6 GiB. I would expect array "a" to be written back, giving a WRITE count of about 3 GiB.&lt;/P&gt;

&lt;P&gt;Does anyone know why the read count is so low and the write count is higher than expected?&lt;/P&gt;

&lt;P&gt;The processor is an Intel Core i7 940 (Nehalem).&lt;/P&gt;

&lt;P&gt;Any help is appreciated.&lt;/P&gt;

&lt;P&gt;Patrick&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2015 17:43:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/PCM-reporting-lower-than-expected-memory-read-counts/m-p/1054716#M6839</guid>
      <dc:creator>Patrick_L_</dc:creator>
      <dc:date>2015-05-05T17:43:17Z</dc:date>
    </item>
    <item>
      <title>Make sure that the arrays are</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/PCM-reporting-lower-than-expected-memory-read-counts/m-p/1054717#M6840</link>
      <description>&lt;P&gt;Make sure that the arrays are instantiated before you use them.&amp;nbsp;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Linux handles un-instantiated addresses differently than other Unix systems. If you read an address that has not previously been written to, the OS will map the address to a "zero page" that is filled with zeros. That "zero page" will be cache-contained, so the reads of b[i] will mostly come from the cache.&lt;/P&gt;

&lt;P&gt;The writes to a[i] will force those pages to be instantiated, and eventually written back from the cache. However, the code that instantiates the pages is complex and obscure, and it is very hard to understand the performance counts obtained from that code path. It is certainly possible that the code could write the pages of a[i] to memory (e.g. zeroing the page using streaming stores) before they are read back in by the code; this would account for the doubled write traffic.&lt;/P&gt;

&lt;P&gt;To minimize the confusion due to complex and obscure OS code, I recommend including an "initialization loop" that fills both arrays with something, then a repeated "copy loop" that copies the arrays back and forth. I sometimes set up two versions of the code: one that copies the arrays 20 times and one that copies the arrays 10 times. Taking the difference between the counts should remove most of the confusing overhead and leave you with the memory traffic associated with the 10 extra array copies.&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2015 20:45:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/PCM-reporting-lower-than-expected-memory-read-counts/m-p/1054717#M6840</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2015-05-05T20:45:04Z</dc:date>
    </item>
    <item>
      <title>Dr. McCalpin,</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/PCM-reporting-lower-than-expected-memory-read-counts/m-p/1054718#M6841</link>
      <description>&lt;P&gt;Dr. McCalpin,&lt;/P&gt;

&lt;P&gt;Thank you for the guidance. This is very helpful. I will plan to revisit my code snippet with the modifications you've suggested.&lt;/P&gt;

&lt;P&gt;Regards,&lt;BR /&gt;
	Patrick L.&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2015 21:56:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/PCM-reporting-lower-than-expected-memory-read-counts/m-p/1054718#M6841</guid>
      <dc:creator>Patrick_L_</dc:creator>
      <dc:date>2015-05-05T21:56:32Z</dc:date>
    </item>
  </channel>
</rss>

