<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Compiler always tries to in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940975#M1784</link>
    <description>&lt;P&gt;The compiler always tries to vectorize to the widest SIMD it can, typically 16, so 16 work-items are packed into the lanes of one 16-wide SIMD thread and execute in lock-step.&lt;/P&gt;
&lt;P&gt;Thus, if your kernel operates on float8, the code is executed as 8*SIMD16, that is, eight SIMD16 operations per float8 operation (one per vector component). From the kernel's perspective, the original vector expressions are executed in transposed fashion (by means of an Array-of-Structures to Structure-of-Arrays transformation, which is more SIMD-friendly). Note that a local size of 16 is the bare minimum that gives the compiler room for this trick.&lt;/P&gt;</description>
    <pubDate>Mon, 30 Sep 2013 12:03:00 GMT</pubDate>
    <dc:creator>Maxim_S_Intel</dc:creator>
    <dc:date>2013-09-30T12:03:00Z</dc:date>
    <item>
      <title>Hardware Thread / Work-group / Work-Item relation on HD Graphics</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940970#M1779</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I have a question about how work-items map to hardware threads. My understanding was that each work-item is mapped to one hardware thread, but when I read the Optimization Guide I found this note:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Work-group size of&amp;nbsp;16&amp;nbsp;work-items is enough if you do not ask for SLM. Then each work-group maps to each hardware thread.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;In this case a work-group is mapped to a hardware thread, so I assume that all computations in the kernel are scalar. My question is: if I use vector operations, is this mapping still correct? If yes, how is it done (I guess the compiler scalarizes all vector operations)?&lt;/P&gt;
&lt;P&gt;I'm using OpenCL for Intel HD Graphics, not the CPU.&lt;/P&gt;
&lt;P&gt;Thanks in advance,&lt;/P&gt;
&lt;P&gt;Mohamed&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 27 Sep 2013 13:45:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940970#M1779</guid>
      <dc:creator>Mohamed_Amine_BERGAC</dc:creator>
      <dc:date>2013-09-27T13:45:29Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940971#M1780</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;The computations within a GPU thread are not scalar, but SIMDified (typically to a width of 16).&lt;/P&gt;
&lt;P&gt;You can find some details on threads and SIMD in this recent&amp;nbsp;presentation: &lt;A href="http://software.intel.com/sites/default/files/Faster-Better-Pixels-on-the-Go-and-in-the-Cloud-with-OpenCL-on-Intel-Architecture.pdf"&gt;http://software.intel.com/sites/default/files/Faster-Better-Pixels-on-the-Go-and-in-the-Cloud-with-OpenCL-on-Intel-Architecture.pdf&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 27 Sep 2013 15:52:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940971#M1780</guid>
      <dc:creator>Maxim_S_Intel</dc:creator>
      <dc:date>2013-09-27T15:52:52Z</dc:date>
    </item>
    <item>
      <title>Hi Maxim</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940972#M1781</link>
      <description>&lt;P&gt;Hi Maxim&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Maxim Shevtsov (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;the computations within a GPU thread are not scalar, but SIMDified (typically to the width of 16).&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;What is a GPU thread? Is it a channel or a hardware thread?&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Mohamed&lt;/P&gt;</description>
      <pubDate>Mon, 30 Sep 2013 07:42:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940972#M1781</guid>
      <dc:creator>Mohamed_Amine_BERGAC</dc:creator>
      <dc:date>2013-09-30T07:42:33Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940973#M1782</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;GPU threads are threads that run on the Execution Units (EUs) of Intel HD Graphics. Multiple threads can be scheduled on an EU (for example, up to 8 threads in the previous generation of Intel GPUs) to keep the EU from sitting idle (say, due to the latency of a memory request). GPU threads are lightweight and hardware-scheduled.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Sep 2013 10:27:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940973#M1782</guid>
      <dc:creator>Maxim_S_Intel</dc:creator>
      <dc:date>2013-09-30T10:27:01Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940974#M1783</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;Ah, OK. The remaining question: suppose I define a computation like this in my kernel: float8 a, b, c, res; a = 11; b = 12; c = 2; res = mad(a, b, c);&lt;/P&gt;
&lt;P&gt;and I set Local_size=16 and Global_Size=64. Will this MAD operation be executed as 16 x SIMD8 operations (because it is a hard-coded vector operation)? I don't know whether hard-coding vector operations is well suited to HD Graphics or not.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Mohamed&lt;/P&gt;</description>
      <pubDate>Mon, 30 Sep 2013 10:38:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940974#M1783</guid>
      <dc:creator>Mohamed_Amine_BERGAC</dc:creator>
      <dc:date>2013-09-30T10:38:46Z</dc:date>
    </item>
    <item>
      <title>Compiler always tries to</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940975#M1784</link>
      <description>&lt;P&gt;The compiler always tries to vectorize to the widest SIMD it can, typically 16, so 16 work-items are packed into the lanes of one 16-wide SIMD thread and execute in lock-step.&lt;/P&gt;
&lt;P&gt;Thus, if your kernel operates on float8, the code is executed as 8*SIMD16, that is, eight SIMD16 operations per float8 operation (one per vector component). From the kernel's perspective, the original vector expressions are executed in transposed fashion (by means of an Array-of-Structures to Structure-of-Arrays transformation, which is more SIMD-friendly). Note that a local size of 16 is the bare minimum that gives the compiler room for this trick.&lt;/P&gt;</description>
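&lt;P&gt;As a rough illustration of that packing, here is a minimal sketch in plain Python (a model only, not OpenCL code and not the driver's actual scheduling). It assumes a SIMD width of 16 and the sizes from the question (local size 16, global size 64); the name lane_mapping and the numbering of hardware threads are hypothetical, chosen only for illustration.&lt;/P&gt;

```python
SIMD_WIDTH = 16  # assumed SIMD width; the compiler may pick a different one


def lane_mapping(global_size, simd_width=SIMD_WIDTH):
    """Map each work-item id to a (hardware_thread, lane) pair."""
    return [(gid // simd_width, gid % simd_width) for gid in range(global_size)]


mapping = lane_mapping(64)
# Distinct hardware threads needed for 64 work-items at SIMD16
hardware_threads = sorted({thread for thread, _ in mapping})

print(hardware_threads)  # [0, 1, 2, 3]
print(mapping[17])       # (1, 1)
```

&lt;P&gt;With these assumed sizes, the 64 work-items fill four SIMD16 hardware threads, and work-item 17 lands in lane 1 of the second one.&lt;/P&gt;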
      <pubDate>Mon, 30 Sep 2013 12:03:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940975#M1784</guid>
      <dc:creator>Maxim_S_Intel</dc:creator>
      <dc:date>2013-09-30T12:03:00Z</dc:date>
    </item>
    <item>
      <title>Thank you so much for this</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940976#M1785</link>
      <description>&lt;P&gt;Thank you so much for this explanation. I now have a clearer idea of how my kernel is executed.&lt;/P&gt;
&lt;P&gt;Mohamed&lt;/P&gt;</description>
      <pubDate>Mon, 30 Sep 2013 12:48:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Hardware-Thread-Work-group-Work-Intem-relation-HD-graphics/m-p/940976#M1785</guid>
      <dc:creator>Mohamed_Amine_BERGAC</dc:creator>
      <dc:date>2013-09-30T12:48:55Z</dc:date>
    </item>
  </channel>
</rss>

