<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi, all: in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Huge-difference-in-memory-bandwidth-for-IVB/m-p/948965#M1970</link>
    <description>&lt;P&gt;Hi, all:&lt;/P&gt;

&lt;P&gt;Answers are highly appreciated.&lt;/P&gt;</description>
    <pubDate>Tue, 07 Jan 2014 22:29:22 GMT</pubDate>
    <dc:creator>Biao_W_</dc:creator>
    <dc:date>2014-01-07T22:29:22Z</dc:date>
    <item>
      <title>Huge difference in memory bandwidth for IVB</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Huge-difference-in-memory-bandwidth-for-IVB/m-p/948964#M1969</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;I used the memory bandwidth benchmark from the AMD APP SDK 2.9 to test memory bandwidth on the IVB HD 4000. The original benchmark only tests the "float" data type; I extended it to int, short and unsigned char. All benchmarks use scalar data types; no vector types are used. I observed a huge difference in memory bandwidth between 32-bit and non-32-bit memory accesses. My results follow:&lt;/P&gt;

&lt;P&gt;Global Memory Read: Single (all threads read from a single memory location), in GB/s, on Ivy Bridge HD 4000:&lt;/P&gt;

&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;float&lt;/TH&gt;&lt;TH&gt;int&lt;/TH&gt;&lt;TH&gt;short&lt;/TH&gt;&lt;TH&gt;uchar&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;8.79&lt;/TD&gt;&lt;TD&gt;8.79&lt;/TD&gt;&lt;TD&gt;0.60&lt;/TD&gt;&lt;TD&gt;0.60&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;

&lt;P&gt;For comparison, I ran the same benchmark on Haswell (i7-4770S, HD 4600 graphics), and the bandwidth gap shrinks significantly (GB/s):&lt;/P&gt;

&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;float&lt;/TH&gt;&lt;TH&gt;int&lt;/TH&gt;&lt;TH&gt;short&lt;/TH&gt;&lt;TH&gt;uchar&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;56.05&lt;/TD&gt;&lt;TD&gt;57.60&lt;/TD&gt;&lt;TD&gt;33.90&lt;/TD&gt;&lt;TD&gt;31.91&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
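
&lt;P&gt;To put a number on the gap, here is a minimal Python sketch (not part of the benchmark) that divides the float figure by each of the other figures quoted in this post. The dictionary and function names are only for illustration.&lt;/P&gt;

```python
# Bandwidth figures in GB/s, copied from the results quoted in this post.
ivb = {"float": 8.79, "int": 8.79, "short": 0.60, "uchar": 0.60}
hsw = {"float": 56.05, "int": 57.60, "short": 33.90, "uchar": 31.91}


def slowdown_vs_float(results):
    """Return how many times slower each type is than float."""
    return {name: results["float"] / value for name, value in results.items()}


for label, results in (("IVB HD 4000", ivb), ("HSW HD 4600", hsw)):
    ratios = slowdown_vs_float(results)
    print(label, {name: round(r, 2) for name, r in ratios.items()})
```

&lt;P&gt;With these figures, short and uchar come out roughly 14.65 times slower than float on IVB, versus roughly 1.65 to 1.76 times slower on Haswell.&lt;/P&gt;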

&lt;P&gt;The test kernel is shown below; DATATYPE is defined at online compilation time:&lt;/P&gt;

&lt;PRE&gt;__kernel void read_single(__global DATATYPE *input, __global DATATYPE *output)
{
    DATATYPE val = (DATATYPE)(0.0f);
    uint gid = get_global_id(0);

    /* Every work-item reads the same 32 elements, input[0] .. input[31]. */
    val = val + input[0];
    val = val + input[1];
    val = val + input[2];
    val = val + input[3];
    val = val + input[4];
    val = val + input[5];
    val = val + input[6];
    val = val + input[7];
    val = val + input[8];
    val = val + input[9];
    val = val + input[10];
    val = val + input[11];
    val = val + input[12];
    val = val + input[13];
    val = val + input[14];
    val = val + input[15];
    val = val + input[16];
    val = val + input[17];
    val = val + input[18];
    val = val + input[19];
    val = val + input[20];
    val = val + input[21];
    val = val + input[22];
    val = val + input[23];
    val = val + input[24];
    val = val + input[25];
    val = val + input[26];
    val = val + input[27];
    val = val + input[28];
    val = val + input[29];
    val = val + input[30];
    val = val + input[31];

    /* Write the sum so that every read contributes to the result. */
    output[gid] = val;
}&lt;/PRE&gt;
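
&lt;P&gt;For reference, here is a minimal Python sketch of how DATATYPE can be supplied for each run. It assumes the standard "-D name=definition" preprocessor option accepted by clBuildProgram; the helper name and the size table are illustrative and not taken from the benchmark.&lt;/P&gt;

```python
# Element sizes in bytes for the four OpenCL scalar types tested.
TYPE_SIZES = {"float": 4, "int": 4, "short": 2, "uchar": 1}
READS_PER_WORK_ITEM = 32  # the kernel reads input[0] .. input[31]


def build_options(datatype):
    """Return a build option string for clBuildProgram (hypothetical helper)."""
    if datatype not in TYPE_SIZES:
        raise ValueError(f"unsupported DATATYPE: {datatype}")
    return f"-D DATATYPE={datatype}"


for name, size in TYPE_SIZES.items():
    bytes_read = READS_PER_WORK_ITEM * size
    print(f"{build_options(name)}  reads {bytes_read} bytes per work-item")
```

&lt;P&gt;Each work-item reads 128 bytes for float and int, 64 bytes for short, and 32 bytes for uchar, all with the same number of reads.&lt;/P&gt;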

&lt;P&gt;My questions are:&lt;/P&gt;

&lt;P&gt;1. Why is there such a large memory bandwidth difference between 32-bit and non-32-bit data types on IVB?&lt;/P&gt;

&lt;P&gt;2. Why does Haswell do so much better when accessing non-32-bit data types?&lt;/P&gt;

&lt;P&gt;Best&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2014 23:32:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Huge-difference-in-memory-bandwidth-for-IVB/m-p/948964#M1969</guid>
      <dc:creator>Biao_W_</dc:creator>
      <dc:date>2014-01-03T23:32:29Z</dc:date>
    </item>
    <item>
      <title>Hi, all:</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Huge-difference-in-memory-bandwidth-for-IVB/m-p/948965#M1970</link>
      <description>&lt;P&gt;Hi, all:&lt;/P&gt;

&lt;P&gt;Answers are highly appreciated.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jan 2014 22:29:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Huge-difference-in-memory-bandwidth-for-IVB/m-p/948965#M1970</guid>
      <dc:creator>Biao_W_</dc:creator>
      <dc:date>2014-01-07T22:29:22Z</dc:date>
    </item>
    <item>
      <title>Hi Biao,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Huge-difference-in-memory-bandwidth-for-IVB/m-p/948966#M1971</link>
      <description>&lt;P&gt;Hi Biao,&lt;/P&gt;

&lt;P&gt;Haswell has several memory access performance improvements for these data types. That's why you see the huge BW performance difference going from IVB to HSW. So what you are seeing is expected.&lt;/P&gt;

&lt;P&gt;Raghu&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jan 2014 23:16:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Huge-difference-in-memory-bandwidth-for-IVB/m-p/948966#M1971</guid>
      <dc:creator>Raghupathi_M_Intel</dc:creator>
      <dc:date>2014-01-09T23:16:37Z</dc:date>
    </item>
    <item>
      <title>Quote:Raghu Muthyalampalli</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Huge-difference-in-memory-bandwidth-for-IVB/m-p/948967#M1972</link>
      <description>&lt;P&gt;Hi, Raghu:&lt;/P&gt;

&lt;P&gt;Thanks for your reply.&lt;/P&gt;

&lt;P&gt;Could you provide more detail about the improvements for these data types on HSW, or point me to related documentation?&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jan 2014 23:36:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Huge-difference-in-memory-bandwidth-for-IVB/m-p/948967#M1972</guid>
      <dc:creator>Biao_W_</dc:creator>
      <dc:date>2014-01-09T23:36:36Z</dc:date>
    </item>
  </channel>
</rss>

