<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic a specified computation like in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933207#M1640</link>
    <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;a specified computation like a histogram for example that fits in __private memory of a thread&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Each GPU thread doesn't consume much, but you might have hundreds of threads in flight, so the private arrays might not fit in the register space. In general, performance depends on the amount of private memory the kernel requires.&lt;/P&gt;</description>
    <pubDate>Mon, 23 Sep 2013 14:27:38 GMT</pubDate>
    <dc:creator>Maxim_S_Intel</dc:creator>
    <dc:date>2013-09-23T14:27:38Z</dc:date>
    <item>
      <title>Private memory Intel HD graphics</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933204#M1637</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;
&lt;P&gt;I'm wondering how the __private memory of Intel HD Graphics works. I read this recommendation about private memory in the "Intel® SDK for OpenCL* Applications 2013 - Optimization Guide for Windows* OS":&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Since each work-item has its own&amp;nbsp;__private&amp;nbsp;memory, there is no locality for&amp;nbsp;__private&amp;nbsp;memory accesses, and each work-item frequently accesses a unique cache line for every access to&amp;nbsp;__private&amp;nbsp;memory. For this reason, accesses to&amp;nbsp;__private&amp;nbsp;memory are very slow, and you should avoid indexed private memory if possible.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;But I don't really understand why I have to avoid indexed private memory. Can anyone tell me more about this, or explain the recommendation?&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Mohamed&lt;/P&gt;</description>
      <pubDate>Wed, 18 Sep 2013 15:14:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933204#M1637</guid>
      <dc:creator>Mohamed_Amine_BERGAC</dc:creator>
      <dc:date>2013-09-18T15:14:31Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933205#M1638</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;Since __private memory is indeed "private" (not shared between work-items), each work-item might consume quite a bit of memory. And you have many work-items in flight: consider the number of execution units in the GPU (each unit also hosts multiple threads), the SIMD width within each thread, etc. Almost 9000 work-items can potentially be in flight (refer to the &lt;A href="http://software.intel.com/sites/default/files/Faster-Better-Pixels-on-the-Go-and-in-the-Cloud-with-OpenCL-on-Intel-Architecture.pdf"&gt;preso&lt;/A&gt; for the details). So say you request just 32 floats of private memory in the kernel: this would total up to ~1MB. That wouldn't fit even in L1, let alone the GPU register space. So in the worst case the performance of __private memory will be similar to that of __global memory.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Sep 2013 13:17:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933205#M1638</guid>
      <dc:creator>Maxim_S_Intel</dc:creator>
      <dc:date>2013-09-23T13:17:04Z</dc:date>
    </item>
    <item>
      <title>Hi Maxim,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933206#M1639</link>
      <description>&lt;P&gt;Hi Maxim,&lt;/P&gt;
&lt;P&gt;I believe that using __private memory hurts performance for the reason you explained. What I don't really understand is this: take a specific computation, a histogram for example, that fits in the __private memory of a thread (I assume in the general register file). Why would this computation run much slower than (or only as fast as) the same computation with the data in another memory space? It seems obvious to me that registers are faster than other types of memory. Am I missing something?&lt;/P&gt;
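&lt;P&gt;As a rough check of the estimate in the reply above, here is a minimal sketch of the arithmetic. The 9000 work-items in flight and the 32 floats per work-item are the figures quoted there; 4 bytes is the size of an OpenCL float; the names are only illustrative and not part of any API.&lt;/P&gt;
&lt;PRE&gt;
```python
# Rough private-memory footprint across all work-items in flight.
# 9000 work-items and 32 floats are the figures quoted in the reply above;
# 4 bytes is the size of an OpenCL float. All names are illustrative.
WORK_ITEMS_IN_FLIGHT = 9000
FLOATS_PER_WORK_ITEM = 32
BYTES_PER_FLOAT = 4


def private_footprint_bytes(work_items, floats_per_item, bytes_per_float=BYTES_PER_FLOAT):
    """Total bytes of __private storage if every work-item holds its own copy."""
    return work_items * floats_per_item * bytes_per_float


total = private_footprint_bytes(WORK_ITEMS_IN_FLIGHT, FLOATS_PER_WORK_ITEM)
print(f"{total} bytes, about {total / 2**20:.2f} MiB")
# prints "1152000 bytes, about 1.10 MiB"
```
&lt;/PRE&gt;
&lt;P&gt;That is consistent with the ~1MB figure, and it is a total over all work-items, not the size seen by a single thread.&lt;/P&gt;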
&lt;P&gt;Thank you very much for your answer.&lt;/P&gt;
&lt;P&gt;Mohamed&lt;/P&gt;</description>
      <pubDate>Mon, 23 Sep 2013 14:12:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933206#M1639</guid>
      <dc:creator>Mohamed_Amine_BERGAC</dc:creator>
      <dc:date>2013-09-23T14:12:48Z</dc:date>
    </item>
    <item>
      <title>a specified computation like</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933207#M1640</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;a specified computation like a histogram for example that fits in __private memory of a thread&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Each GPU thread doesn't consume much, but you might have hundreds of threads in flight, so the private arrays might not fit in the register space. In general, performance depends on the amount of private memory the kernel requires.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Sep 2013 14:27:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933207#M1640</guid>
      <dc:creator>Maxim_S_Intel</dc:creator>
      <dc:date>2013-09-23T14:27:38Z</dc:date>
    </item>
    <item>
      <title>if I understood there is no</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933208#M1641</link>
      <description>&lt;P&gt;if I understood there is no limitation about (__private memory) if we took the right amount of data that fits in registers of all the threads on the fly. this makes sens for me.&lt;BR /&gt;if we restrict the number of threads on the fly, we can have more registers, and we can do more complex computations on bigger chunks of data "in registers". I think I have a bug in this conclusion but I don't know where? :)&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Mohamed&lt;/P&gt;</description>
      <pubDate>Mon, 23 Sep 2013 14:51:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933208#M1641</guid>
      <dc:creator>Mohamed_Amine_BERGAC</dc:creator>
      <dc:date>2013-09-23T14:51:28Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933209#M1642</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;BLOCKQUOTE&gt;Maxim Shevtsov (Intel) wrote:&lt;BR /&gt;
&lt;P&gt;(refer to the &lt;A href="http://software.intel.com/sites/default/files/Faster-Better-Pixels-on-the-Go-and-in-the-Cloud-with-OpenCL-on-Intel-Architecture.pdf"&gt;preso&lt;/A&gt; for the details).&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;In this preso (slide 42), __private memory is shown as allocated in global memory. In which cases can this happen?&lt;/P&gt;
&lt;P&gt;Thx&lt;/P&gt;
&lt;P&gt;Mohamed&lt;/P&gt;</description>
      <pubDate>Fri, 11 Oct 2013 08:21:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933209#M1642</guid>
      <dc:creator>Mohamed_Amine_BERGAC</dc:creator>
      <dc:date>2013-10-11T08:21:50Z</dc:date>
    </item>
    <item>
      <title>Quote:Mohamed Amine BERGACH</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933210#M1643</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Mohamed Amine BERGACH wrote:&lt;BR /&gt;&amp;nbsp; in this preso (slide number 42), __private memory are allocated in global Memory, in which case this can happen?&lt;/BLOCKQUOTE&gt;&amp;nbsp;&amp;nbsp;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;As we speculated in this thread, this eventually would happen if the requested private mem (remeber that each work-item requires it's own copy) doesn't fit the register space&lt;/P&gt;</description>
      <pubDate>Mon, 14 Oct 2013 13:02:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933210#M1643</guid>
      <dc:creator>Maxim_S_Intel</dc:creator>
      <dc:date>2013-10-14T13:02:51Z</dc:date>
    </item>
    <item>
      <title>Ok, if the size of data used</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933211#M1644</link>
      <description>&lt;P&gt;Ok, if the size of data used by each work item fits in the register space, is this suffice to give me the garanty that all data will be processed in registers ? if yes, in this case indexed __private memory will be still not recommended ?&lt;/P&gt;
&lt;P&gt;Thx,&lt;/P&gt;
&lt;P&gt;Mohamed&lt;/P&gt;</description>
      <pubDate>Mon, 14 Oct 2013 13:43:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933211#M1644</guid>
      <dc:creator>Mohamed_Amine_BERGAC</dc:creator>
      <dc:date>2013-10-14T13:43:18Z</dc:date>
    </item>
    <item>
      <title>I'm asking this question</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933212#M1645</link>
      <description>&lt;P&gt;I'm asking this question because when I use clGetKernelWorkGroupInfo (.....CL_KERNEL_PRIVATE_MEM_SIZE...) I get all the time 0, but when I use more than 256 Byte I get more understandable values ? is this means that when I allocate more than 256 Byte my kernel will spill?&lt;/P&gt;
&lt;P&gt;Thx,&lt;/P&gt;
&lt;P&gt;Mohamed&lt;/P&gt;</description>
      <pubDate>Mon, 14 Oct 2013 14:23:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Private-memory-Intel-HD-graphics/m-p/933212#M1645</guid>
      <dc:creator>Mohamed_Amine_BERGAC</dc:creator>
      <dc:date>2013-10-14T14:23:37Z</dc:date>
    </item>
  </channel>
</rss>

