<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic GEN instruction explanation? in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/GEN-instruction-explanation/m-p/1072542#M4486</link>
    <description>&lt;P&gt;I'm storing 8x64-bit quad-words (SIMD8) to SLM and am trying to understand some curious GEN sequences.&lt;/P&gt;

&lt;P&gt;The OpenCL line of code in question is a store to a doubly indexed array in SLM:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;shared.m0[2][local_id] = r1;&lt;/PRE&gt;

&lt;P&gt;Why does this indexed store to SLM result in 4-6 "mov" operations and two sends?&lt;/P&gt;

&lt;P&gt;I assume some MOV operations are necessary to prepare a SEND "message"?&lt;/P&gt;

&lt;P&gt;But why are there two SEND ops?&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;send     (8|M0)         null:ud       r27:ud            0xA       0x40F0020 //  hdc.dc0  wr:2h, rd:0, wr.scrdwfc: 0x70020
send     (8|M0)         null:ud       r59:ud            0xC       0x6026CFE //  hdc.dc1  wr:3, rd:0, wr.usurf msc:44, to SLM
&lt;/PRE&gt;

&lt;P&gt;I understand the second SEND, but what is the first one doing, and why is it necessary? Is it a queue barrier of some sort?&lt;/P&gt;

&lt;P&gt;Also, why are there so many MOV operations for this 8x64-bit SIMD8 store?&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="gen_store.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/8991i21FE840A5FEBD047/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="gen_store.png" alt="gen_store.png" /&gt;&lt;/span&gt;&lt;/P&gt;

</description>
    <pubDate>Tue, 26 Jul 2016 21:53:56 GMT</pubDate>
    <dc:creator>allanmac1</dc:creator>
    <dc:date>2016-07-26T21:53:56Z</dc:date>
    <item>
      <title>GEN instruction explanation?</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/GEN-instruction-explanation/m-p/1072542#M4486</link>
      <description>&lt;P&gt;I'm storing 8x64-bit quad-words (SIMD8) to SLM and am trying to understand some curious GEN sequences.&lt;/P&gt;

&lt;P&gt;The OpenCL line of code in question is a store to a doubly indexed array in SLM:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;shared.m0[2][local_id] = r1;&lt;/PRE&gt;

&lt;P&gt;Why does this indexed store to SLM result in 4-6 "mov" operations and two sends?&lt;/P&gt;

&lt;P&gt;I assume some MOV operations are necessary to prepare a SEND "message"?&lt;/P&gt;

&lt;P&gt;But why are there two SEND ops?&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;send     (8|M0)         null:ud       r27:ud            0xA       0x40F0020 //  hdc.dc0  wr:2h, rd:0, wr.scrdwfc: 0x70020
send     (8|M0)         null:ud       r59:ud            0xC       0x6026CFE //  hdc.dc1  wr:3, rd:0, wr.usurf msc:44, to SLM
&lt;/PRE&gt;

&lt;P&gt;I understand the second SEND, but what is the first one doing, and why is it necessary? Is it a queue barrier of some sort?&lt;/P&gt;

&lt;P&gt;Also, why are there so many MOV operations for this 8x64-bit SIMD8 store?&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="gen_store.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/8991i21FE840A5FEBD047/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="gen_store.png" alt="gen_store.png" /&gt;&lt;/span&gt;&lt;/P&gt;

</description>
      <pubDate>Tue, 26 Jul 2016 21:53:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/GEN-instruction-explanation/m-p/1072542#M4486</guid>
      <dc:creator>allanmac1</dc:creator>
      <dc:date>2016-07-26T21:53:56Z</dc:date>
    </item>
    <item>
      <title>I took a look at the code</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/GEN-instruction-explanation/m-p/1072543#M4487</link>
      <description>&lt;P&gt;I took a look at the code generation for SIMD8 x 32-bit global and local loads and stores, and it looks nice and compact: typically an ADD (pointer increment), a MOV and a SEND.&lt;/P&gt;

&lt;P&gt;Is this just code generation that needs to improve, or would it be beneficial to load/store the low and high 32-bit words of each 64-bit word separately?&lt;/P&gt;

&lt;P&gt;I was assuming that only one SEND operation would be generated for a SIMD8 x 64-bit load/store (64 bytes/clock).&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jul 2016 00:49:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/GEN-instruction-explanation/m-p/1072543#M4487</guid>
      <dc:creator>allanmac1</dc:creator>
      <dc:date>2016-07-28T00:49:00Z</dc:date>
    </item>
    <item>
      <title>The first send looks to be a</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/GEN-instruction-explanation/m-p/1072544#M4488</link>
      <description>&lt;P&gt;The first send looks to be a scratch DWORD write. This typically happens on a spill (when the kernel runs out of registers) or when one accesses a private array with a dynamic index:&lt;/P&gt;

&lt;P&gt;&amp;nbsp; &amp;nbsp;ptr[i] = ...; // where i is a variable (not a constant)&lt;/P&gt;
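
&lt;P&gt;As a side note, the "wr:2h, rd:0" and "wr:3, rd:0" annotations in the disassembly can be reproduced from the raw descriptor immediates. The following is only a minimal sketch: it assumes the usual GEN send descriptor layout (message length in bits 28:25, response length in bits 24:20, header-present flag in bit 19, function control in bits 18:0), and that layout has been checked here only against the two descriptors quoted in this thread. The names SendDesc and decode_send_descriptor are hypothetical, made up for illustration.&lt;/P&gt;

```c
/* Hypothetical helper: splits a 32-bit GEN send message descriptor into
 * the fields the disassembler prints in its trailing comment.
 * ASSUMED layout (not taken from this thread, only checked against it):
 *   bits 28:25  message length in registers   ("wr:N")
 *   bits 24:20  response length in registers  ("rd:N")
 *   bit  19     header present                ("h" suffix on wr)
 *   bits 18:0   message-specific function control
 * Division/modulo is used instead of bit masks; the result is the same. */
typedef struct {
    unsigned int mlen;       /* registers written by the message  */
    unsigned int rlen;       /* registers returned to the EU      */
    unsigned int header;     /* 1 if a message header is present  */
    unsigned int func_ctrl;  /* message-specific control bits     */
} SendDesc;

static SendDesc decode_send_descriptor(unsigned int desc)
{
    SendDesc d;
    d.mlen      = (desc >> 25) % 16u;   /* 4-bit field at bit 25  */
    d.rlen      = (desc >> 20) % 32u;   /* 5-bit field at bit 20  */
    d.header    = (desc >> 19) % 2u;    /* single bit 19          */
    d.func_ctrl = desc % 0x80000u;      /* low 19 bits            */
    return d;
}
```

&lt;P&gt;With the two immediates from the question, decode_send_descriptor(0x40F0020) gives mlen 2 with the header flag set and func_ctrl 0x70020, and decode_send_descriptor(0x6026CFE) gives mlen 3 with no header and func_ctrl 0x26CFE, which agrees with the disassembler comments.&lt;/P&gt;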

&lt;P&gt;Can you query the value of CL_KERNEL_PRIVATE_MEM_SIZE with clGetKernelWorkGroupInfo()?&lt;/P&gt;

&lt;P&gt;Can you show us the CL code, or a small reproducer for it?&lt;/P&gt;

&lt;P&gt;Regards,&lt;/P&gt;

&lt;P&gt;- Tim&lt;/P&gt;</description>
      <pubDate>Fri, 05 Aug 2016 00:26:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/GEN-instruction-explanation/m-p/1072544#M4488</guid>
      <dc:creator>Timothy_B_Intel</dc:creator>
      <dc:date>2016-08-05T00:26:16Z</dc:date>
    </item>
    <item>
      <title>Thanks... I took a look and</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/GEN-instruction-explanation/m-p/1072545#M4489</link>
      <description>&lt;P&gt;Thanks, I just took a look and private memory is reported to be 0, whether I build a binary or compile from kernel source at runtime:&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;kernel info:
    Maximum work-group size: 256
    Compiler work-group size: (0, 0, 0)
    Local memory size: 32704
    Preferred multiple of work-group size: 8
    Minimum amount of private memory: 0&lt;/PRE&gt;

&lt;P&gt;I'll keep digging and simplifying to see if I can squash this bug.&lt;/P&gt;

&lt;P&gt;I'll send a reproducer if I don't see an improvement.&lt;/P&gt;

</description>
      <pubDate>Fri, 05 Aug 2016 18:01:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/GEN-instruction-explanation/m-p/1072545#M4489</guid>
      <dc:creator>allanmac1</dc:creator>
      <dc:date>2016-08-05T18:01:00Z</dc:date>
    </item>
    <item>
      <title>Sounds great. Let me know how</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/GEN-instruction-explanation/m-p/1072546#M4490</link>
      <description>&lt;P&gt;Sounds great. Let me know how it works out.&lt;/P&gt;

&lt;P&gt;Can you show us more of the kernel (OpenCL or assembly)? Specifically, I am interested in the structure type for your local memory and how you access it (more than just that line). There are some other cases where we can "spill", but we can almost always tweak the GPU program to fix that.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Aug 2016 16:52:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/GEN-instruction-explanation/m-p/1072546#M4490</guid>
      <dc:creator>Timothy_B_Intel</dc:creator>
      <dc:date>2016-08-10T16:52:53Z</dc:date>
    </item>
  </channel>
</rss>

