<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Big kernel performance difference between the image created from HOST_PTR and the image created from Buffer Object in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090623#M4884</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;We found that the same kernel's performance varies dramatically depending on how the input image is created. With the attached test tool:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;if the input image is created from a host ptr directly, the performance is good, e.g. for 8K x 8K input image:
		&lt;UL&gt;
			&lt;LI&gt;./blockread&lt;/LI&gt;
			&lt;LI&gt;Average kernel 2.033509 ms&lt;/LI&gt;
		&lt;/UL&gt;
	&lt;/LI&gt;
	&lt;LI&gt;if the input image is created from a buffer object (which is created from the same host ptr), performance drops significantly, e.g. for the same 8K x 8K input image:
		&lt;UL&gt;
			&lt;LI&gt;./blockread -b&lt;/LI&gt;
			&lt;LI&gt;Average kernel 3.763424 ms&lt;/LI&gt;
		&lt;/UL&gt;
	&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;The buffer pitch and base address are both aligned to 4K, so we are not sure why the performance difference is so large.&lt;/P&gt;

&lt;P&gt;The code snippet for image creation is listed below:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (create_image_from_buf) {&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buf_from_hostptr = clCreateBuffer(context, CL_MEM_READ_WRITE| CL_MEM_USE_HOST_PTR, src_size, src_ptr, &amp;amp;errNum);&lt;/P&gt;

	&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (buf_from_hostptr == 0) {&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("clCreateBuffer failed \n");&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; exit(1);&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; desc.buffer = buf_from_hostptr;&lt;/P&gt;

	&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // flags inherited from buffer&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; img_from_buf = clCreateImage(context,0, &amp;amp;format, &amp;amp;desc,NULL,&amp;amp;errNum);&lt;/P&gt;

	&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (img_from_buf == 0) {&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("clCreateImage failed \n");&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; exit(1);&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp; } else {&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; img_from_hostptr = clCreateImage(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, &amp;amp;format, &amp;amp;desc, src_ptr, &amp;amp;errNum);&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (img_from_hostptr == NULL)&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; std::cerr &amp;lt;&amp;lt; "Error creating memory objects." &amp;lt;&amp;lt; std::endl;&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return false;&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Thanks&lt;/P&gt;

&lt;P&gt;-Austin&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 18 Nov 2016 07:18:12 GMT</pubDate>
    <dc:creator>Shengquan_Y_Intel</dc:creator>
    <dc:date>2016-11-18T07:18:12Z</dc:date>
    <item>
      <title>Big kernel performance difference between the image created from HOST_PTR and the image created from Buffer Object</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090623#M4884</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;We found that the same kernel's performance varies dramatically depending on how the input image is created. With the attached test tool:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;if the input image is created from a host ptr directly, the performance is good, e.g. for 8K x 8K input image:
		&lt;UL&gt;
			&lt;LI&gt;./blockread&lt;/LI&gt;
			&lt;LI&gt;Average kernel 2.033509 ms&lt;/LI&gt;
		&lt;/UL&gt;
	&lt;/LI&gt;
	&lt;LI&gt;if the input image is created from a buffer object (which is created from the same host ptr), performance drops significantly, e.g. for the same 8K x 8K input image:
		&lt;UL&gt;
			&lt;LI&gt;./blockread -b&lt;/LI&gt;
			&lt;LI&gt;Average kernel 3.763424 ms&lt;/LI&gt;
		&lt;/UL&gt;
	&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;The buffer pitch and base address are both aligned to 4K, so we are not sure why the performance difference is so large.&lt;/P&gt;

&lt;P&gt;The code snippet for image creation is listed below:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (create_image_from_buf) {&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; buf_from_hostptr = clCreateBuffer(context, CL_MEM_READ_WRITE| CL_MEM_USE_HOST_PTR, src_size, src_ptr, &amp;amp;errNum);&lt;/P&gt;

	&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (buf_from_hostptr == 0) {&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("clCreateBuffer failed \n");&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; exit(1);&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; desc.buffer = buf_from_hostptr;&lt;/P&gt;

	&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // flags inherited from buffer&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; img_from_buf = clCreateImage(context,0, &amp;amp;format, &amp;amp;desc,NULL,&amp;amp;errNum);&lt;/P&gt;

	&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (img_from_buf == 0) {&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("clCreateImage failed \n");&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; exit(1);&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp; } else {&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; img_from_hostptr = clCreateImage(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, &amp;amp;format, &amp;amp;desc, src_ptr, &amp;amp;errNum);&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (img_from_hostptr == NULL)&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; std::cerr &amp;lt;&amp;lt; "Error creating memory objects." &amp;lt;&amp;lt; std::endl;&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return false;&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;
		&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Thanks&lt;/P&gt;

&lt;P&gt;-Austin&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 18 Nov 2016 07:18:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090623#M4884</guid>
      <dc:creator>Shengquan_Y_Intel</dc:creator>
      <dc:date>2016-11-18T07:18:12Z</dc:date>
    </item>
    <item>
      <title>If you create an image</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090624#M4885</link>
      <description>&lt;P&gt;If you create an image directly from a host buffer pointer, it isn't zero copy. During initialization the driver copies the data into a tiled format to better match the HW design. There is some overhead, but it is a one-time cost.&lt;/P&gt;

&lt;P&gt;As your test shows, this one-time copy overhead can often be less expensive overall than linear access. Data access remains linear when you skip the data layout update (copy) by calling clCreateBuffer first and then creating an image directly from that buffer. Here the image data is still linear, like the original buffer, which is less efficient at each access.&lt;/P&gt;
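
&lt;P&gt;As a rough illustration of why the layout matters, the sketch below compares byte offsets in a linear image with offsets in a tiled one. The 4x4 tile shape, the 4-byte pixel size, and the function names are assumptions made for this example; the layout the driver really uses is not described in this thread.&lt;/P&gt;

&lt;PRE&gt;
```c
/* Byte offset of pixel (x, y) in a row-major (linear) image. */
unsigned long linear_offset(unsigned long x, unsigned long y,
                            unsigned long row_pitch, unsigned long pixel_size)
{
    return y * row_pitch + x * pixel_size;
}

/* Byte offset of pixel (x, y) when the image is stored as square tiles of
 * tile_dim x tile_dim pixels, tiles laid out row by row.
 * Hypothetical layout: assumes width_px is a multiple of tile_dim. */
unsigned long tiled_offset(unsigned long x, unsigned long y,
                           unsigned long width_px, unsigned long tile_dim,
                           unsigned long pixel_size)
{
    unsigned long tiles_per_row = width_px / tile_dim;
    unsigned long tile_index = (y / tile_dim) * tiles_per_row + (x / tile_dim);
    unsigned long within_tile = (y % tile_dim) * tile_dim + (x % tile_dim);
    return (tile_index * tile_dim * tile_dim + within_tile) * pixel_size;
}
```
&lt;/PRE&gt;

&lt;P&gt;For an 8192-pixel-wide image with 4-byte pixels, vertically adjacent pixels are 32768 bytes apart in the linear layout but only 16 bytes apart inside one of these hypothetical tiles, so a kernel that reads 2D neighborhoods tends to touch fewer cache lines.&lt;/P&gt;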

</description>
      <pubDate>Sat, 19 Nov 2016 00:01:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090624#M4885</guid>
      <dc:creator>Jeffrey_M_Intel1</dc:creator>
      <dc:date>2016-11-19T00:01:37Z</dc:date>
    </item>
    <item>
      <title>Hi, Jeffrey,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090625#M4886</link>
      <description>&lt;P&gt;Hi, Jeffrey,&lt;/P&gt;

&lt;P&gt;One more question, about the pitch alignment requirement for clCreateImage. It looks like creating an image from a buffer object places more restrictions on the pitch alignment. According to clinfo, the pitch alignment is 4 bytes:&lt;/P&gt;

&lt;P&gt;-----------------------------------------------------------------------&lt;/P&gt;

&lt;P&gt;&amp;nbsp; Image support&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Yes&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; Base address alignment for 2D image buffers&amp;nbsp;&amp;nbsp; 4 bytes&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; Pitch alignment for 2D image buffers&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4 bytes&lt;/P&gt;

&lt;P&gt;-----------------------------------------------------------------------&lt;/P&gt;

&lt;P&gt;What actually happens (using a 4x4 image as the example, where the pitch is 4 bytes):&lt;/P&gt;

&lt;OL&gt;
	&lt;LI&gt;if the input image is created from a host ptr directly, clCreateImage succeeds&lt;/LI&gt;
	&lt;LI&gt;if the input image is created from a buffer object (which is created from the same host ptr), clCreateImage fails with error code -39
		&lt;UL&gt;
			&lt;LI&gt;-39 is CL_INVALID_IMAGE_FORMAT_DESCRIPTOR (from the spec: returned if a 2D image is created from a buffer and the row pitch and base address alignment do not follow the rules described for creating a 2D image from a buffer)&lt;/LI&gt;
		&lt;/UL&gt;
	&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;How can this behavior be explained? What is the pitch alignment requirement for case 2?&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Jeffrey M. (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;As your test shows, this one-time copy overhead can often be less expensive overall than linear access. Data access remains linear when you skip the data layout update (copy) by calling clCreateBuffer first and then creating an image directly from that buffer. Here the image data is still linear, like the original buffer, which is less efficient at each access.&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 21 Nov 2016 09:52:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090625#M4886</guid>
      <dc:creator>Shengquan_Y_Intel</dc:creator>
      <dc:date>2016-11-21T09:52:24Z</dc:date>
    </item>
    <item>
      <title>Thanks, very convincing</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090626#M4887</link>
      <description>&lt;P&gt;Thanks, that is a very convincing explanation.&lt;/P&gt;

&lt;P&gt;For "clCreateBuffer-&amp;gt;clCreateImage", you said the difference comes from skipping the data layout update. Do you mean that if I follow "clCreateBuffer-&amp;gt;clEnqueueWrite/ReadBuffer-&amp;gt;clCreateImage2D", it will have the same performance behavior as "clCreateImage from HOST_PTR"?&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Jeffrey M. (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;skip the&amp;nbsp;data layout update (copy)&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;

&lt;P&gt;-Austin&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 21 Nov 2016 10:01:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090626#M4887</guid>
      <dc:creator>Shengquan_Y_Intel</dc:creator>
      <dc:date>2016-11-21T10:01:51Z</dc:date>
    </item>
    <item>
      <title>The behavior you're seeing is</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090627#M4888</link>
      <description>&lt;P&gt;The behavior you're seeing is also related to the driver implementation. When you create an image with a copy, the driver can help with alignment and padding while it is converting your data to a tiled layout. This approach can have better performance and fewer restrictions.&lt;/P&gt;
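
&lt;P&gt;The kind of alignment and padding arithmetic involved can be sketched as follows. This is not the driver's actual code: the helper names are hypothetical, and the alignment values are treated as plain byte counts supplied by the caller, as in the clinfo output quoted earlier in the thread.&lt;/P&gt;

&lt;PRE&gt;
```c
/* Round value up to the next multiple of alignment (alignment must be nonzero). */
unsigned long round_up(unsigned long value, unsigned long alignment)
{
    return ((value + alignment - 1) / alignment) * alignment;
}

/* Row pitch in bytes after padding each row to pitch_align_bytes. */
unsigned long padded_row_pitch(unsigned long width_px, unsigned long pixel_size,
                               unsigned long pitch_align_bytes)
{
    return round_up(width_px * pixel_size, pitch_align_bytes);
}

/* Returns 1 if a caller-supplied layout already meets both alignments,
 * 0 otherwise. With no copy, nothing can fix a layout that fails here. */
int layout_is_aligned(unsigned long base_addr, unsigned long row_pitch,
                      unsigned long base_align_bytes,
                      unsigned long pitch_align_bytes)
{
    if (base_addr % base_align_bytes != 0) return 0;
    if (row_pitch % pitch_align_bytes != 0) return 0;
    return 1;
}
```
&lt;/PRE&gt;

&lt;P&gt;With a copy, an implementation is free to choose something like padded_row_pitch() for its own storage; without one, the layout you pass in has to pass a check like layout_is_aligned() as it stands.&lt;/P&gt;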

&lt;P&gt;However, when you skip the copy, all of the rules in the spec must be enforced. This is why you see an invalid format descriptor error with your "-b" case for the same parameters allowed by the first scenario.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Nov 2016 01:16:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090627#M4888</guid>
      <dc:creator>Jeffrey_M_Intel1</dc:creator>
      <dc:date>2016-11-23T01:16:52Z</dc:date>
    </item>
    <item>
      <title>Jeffrey, is there a way to</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090628#M4889</link>
      <description>&lt;P&gt;Jeffrey, is there a way to enable copy/tile format for "clCreateBuffer-&amp;gt;clCreateImage"?&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Jeffrey M. (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;The behavior you're seeing is also related to the driver implementation. When you create an image with a copy, the driver can help with alignment and padding while it is converting your data to a tiled layout. This approach can have better performance and fewer restrictions.&lt;/P&gt;

&lt;P&gt;However, when you skip the copy, all of the rules in the spec must be enforced. This is why you see an invalid format descriptor error with your "-b" case for the same parameters allowed by the first scenario.&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Nov 2016 03:13:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090628#M4889</guid>
      <dc:creator>Shengquan_Y_Intel</dc:creator>
      <dc:date>2016-11-23T03:13:46Z</dc:date>
    </item>
    <item>
      <title>When you initialize the image</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090629#M4890</link>
      <description>&lt;P&gt;When you initialize the image this way, it forces the image data to remain just as it is in the buffer. You are specifying zero copy. If this is what you want, use the clCreateBuffer-&amp;gt;clCreateImage path. If you want copy/tile, just use clCreateImage.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Nov 2016 06:16:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090629#M4890</guid>
      <dc:creator>Jeffrey_M_Intel1</dc:creator>
      <dc:date>2016-11-23T06:16:31Z</dc:date>
    </item>
    <item>
      <title>Note, you may also find</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090630#M4891</link>
      <description>&lt;P&gt;Note, you may also find clEnqueueCopyBufferToImage() to be useful, if you want to explicitly copy data from a buffer memory object to an already existing image memory object.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clEnqueueCopyBufferToImage.html" target="_blank"&gt;https://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clEnqueueCopyBufferToImage.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Nov 2016 18:43:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Big-kernel-performance-difference-between-the-image-created-from/m-p/1090630#M4891</guid>
      <dc:creator>Ben_A_Intel</dc:creator>
      <dc:date>2016-11-23T18:43:29Z</dc:date>
    </item>
  </channel>
</rss>

