<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Dmitry! thanks!! It works in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968010#M2340</link>
    <description>&lt;P&gt;Dmitry! Thanks!! It works now. I have been digging into the problem, reading the link you sent, and following the Intel sample. The host and accelerator results are okay now! Using clEnqueueMapBuffer and clEnqueueUnmapMemObject correctly fixed the problem. I now understand the point you were trying to make about using those commands for synchronization (that is, for handing ownership of the shared memory back and forth between host and accelerator). Also, the Phi now performs slightly better (by a narrow margin) after running a few more benchmarks with larger problem sizes. I will look into the work-group sizes now (the local work-group size is set to 16, as per the Phi optimization guide) to see whether any more performance can be gained.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Thanks buddy! :)&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 04 Apr 2014 03:33:39 GMT</pubDate>
    <dc:creator>Dave_O_</dc:creator>
    <dc:date>2014-04-04T03:33:39Z</dc:date>
    <item>
      <title>Opencl: incorrect results on phi</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968002#M2332</link>
      <description>&lt;P&gt;This kernel runs correctly on the host CPU but produces wrong output when run on the Phi. What's the issue with the Phi here?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;//KERNEL_SIMPLE&lt;BR /&gt;
	__kernel void Convolve(const __global &amp;nbsp;float * pInput,&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; __constant float * pFilter,&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; __global &amp;nbsp;float * pOutput,&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; const int nInWidth,&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; const int nFilterWidth)&lt;BR /&gt;
	{&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; const int nWidth = get_global_size(0);&lt;/P&gt;

&lt;P&gt;&amp;nbsp; &amp;nbsp; const int xOut = get_global_id(0);&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; const int yOut = get_global_id(1);&lt;/P&gt;

&lt;P&gt;&amp;nbsp; &amp;nbsp; const int xInTopLeft = xOut;&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; const int yInTopLeft = yOut;&lt;/P&gt;

&lt;P&gt;&amp;nbsp; &amp;nbsp; float sum = 0;&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; for (int r = 0; r &amp;lt; nFilterWidth; r++)&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; {&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; const int idxFtmp = r * nFilterWidth;&lt;/P&gt;

&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; const int yIn = yInTopLeft + r;&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; const int idxIntmp = yIn * nInWidth + xInTopLeft;&lt;/P&gt;

&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; for (int c = 0; c &amp;lt; nFilterWidth; c++)&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; const int idxF &amp;nbsp;= idxFtmp &amp;nbsp;+ c;&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; const int idxIn = idxIntmp + c;&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; sum += pFilter[idxF]*pInput[idxIn];&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; }&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; const int idxOut = yOut * nWidth + xOut;&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; pOutput[idxOut] = sum;&lt;BR /&gt;
	}&lt;BR /&gt;
	//KERNEL_SIMPLE&lt;/P&gt;</description>
      <pubDate>Tue, 01 Apr 2014 21:04:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968002#M2332</guid>
      <dc:creator>Dave_O_</dc:creator>
      <dc:date>2014-04-01T21:04:45Z</dc:date>
    </item>
    <item>
      <title>Hi Dave,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968003#M2333</link>
      <description>Hi Dave,

Could you please provide a full reproducer (including the host part) and a short description of the expected results, so that I can check it quickly?

Thanks,
Yuri</description>
      <pubDate>Wed, 02 Apr 2014 06:51:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968003#M2333</guid>
      <dc:creator>Yuri_K_Intel</dc:creator>
      <dc:date>2014-04-02T06:51:06Z</dc:date>
    </item>
    <item>
      <title>Thanks. I sent you a private</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968004#M2334</link>
      <description>&lt;P&gt;Thanks. I sent you a private message with the source, including makefiles. Could anyone from Intel please explain to me why the performance of the Xeon Phi is much lower than that of the phi?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Also, I forgot: when you run the program, use --help to see the available commands.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Apr 2014 01:40:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968004#M2334</guid>
      <dc:creator>David_O_</dc:creator>
      <dc:date>2014-04-03T01:40:46Z</dc:date>
    </item>
    <item>
      <title>*correction: ...than the of</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968005#M2335</link>
      <description>&lt;P&gt;*correction: ...than that of the *host*&lt;/P&gt;</description>
      <pubDate>Thu, 03 Apr 2014 04:14:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968005#M2335</guid>
      <dc:creator>Dave_O_</dc:creator>
      <dc:date>2014-04-03T04:14:00Z</dc:date>
    </item>
    <item>
      <title>Well, the "correctness"</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968006#M2336</link>
      <description>Well, the "correctness" problem is in the host part of the code. The output buffer (oclBuffers.outputCL) is created with the CL_MEM_USE_HOST_PTR flag. On the CPU device, the OpenCL implementation uses exactly the same memory region (specified by hostBuffers.pOutputCL), so the results are in this buffer right away. On Xeon Phi this is not the case: the OpenCL implementation allocates another buffer on the device, and the data has to be transferred back to the host. In this case (when the CL_MEM_USE_HOST_PTR flag is used) it is sufficient to call the clEnqueueMapBuffer and clEnqueueUnmapMemObject functions for the output buffer. With this modification I get "Passed" on Xeon Phi. Please look at the samples for similar code, for example &lt;A href="http://software.intel.com/en-us/vcsource/samples/hdr-tone-mapping" target="_blank"&gt;http://software.intel.com/en-us/vcsource/samples/hdr-tone-mapping&lt;/A&gt;.
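
<![CDATA[One way to picture the "Passed" check is a host-side comparison against a plain C reference. The sketch below mirrors the indexing of the Convolve kernel from the first post. The names convolve_reference, count_mismatches, outWidth, outHeight and tol are illustrative and do not come from the reproducer, and the input is assumed to hold at least (outHeight + nFilterWidth - 1) rows of nInWidth floats, with nInWidth >= outWidth + nFilterWidth - 1.

```c
#include <stddef.h>

/* Host-side reference for the Convolve kernel: same indexing, plain loops.
 * outWidth and outHeight stand in for get_global_size(0) and get_global_size(1). */
void convolve_reference(const float *pInput, const float *pFilter,
                        float *pOutput, int nInWidth, int nFilterWidth,
                        int outWidth, int outHeight)
{
    for (int yOut = 0; yOut < outHeight; yOut++) {
        for (int xOut = 0; xOut < outWidth; xOut++) {
            float sum = 0.0f;
            for (int r = 0; r < nFilterWidth; r++) {
                for (int c = 0; c < nFilterWidth; c++) {
                    sum += pFilter[r * nFilterWidth + c] *
                           pInput[(yOut + r) * nInWidth + xOut + c];
                }
            }
            pOutput[yOut * outWidth + xOut] = sum;
        }
    }
}

/* Counts the elements whose absolute difference exceeds tol. */
size_t count_mismatches(const float *expected, const float *actual,
                        size_t n, float tol)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        float d = expected[i] - actual[i];
        if (d < 0.0f) {
            d = -d;
        }
        if (d > tol) {
            bad++;
        }
    }
    return bad;
}
```

With the output buffer mapped for reading, a count_mismatches result of 0 against the reference would correspond to a pass. The tolerance is a judgment call, since host and device may round floating-point sums differently.
]]>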

As for the performance question: this is quite a broad topic, and each application should be analyzed and tuned separately to achieve maximum performance. Here is just a general comment. Xeon Phi is an accelerator device (like a GPU), and a number of factors (the workload algorithm, the device architecture, the working size, etc.) determine whether acceleration is possible. Please use the optimization guide: &lt;A href="http://software.intel.com/sites/products/documentation/ioclsdk/2013XE/OG/index.htm" target="_blank"&gt;http://software.intel.com/sites/products/documentation/ioclsdk/2013XE/OG/index.htm&lt;/A&gt;.

Thanks,
Yuri</description>
      <pubDate>Thu, 03 Apr 2014 11:52:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968006#M2336</guid>
      <dc:creator>Yuri_K_Intel</dc:creator>
      <dc:date>2014-04-03T11:52:24Z</dc:date>
    </item>
    <item>
      <title>According to OpenCL spec</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968007#M2337</link>
      <description>&lt;P&gt;According to OpenCL spec paragraph 5.4.2:&lt;/P&gt;
&lt;P align="left"&gt;&lt;EM&gt;&lt;B&gt;clEnqueueMapBuffer&lt;/B&gt;, and &lt;B&gt;clEnqueueMapImage&lt;/B&gt; act as synchronization points for a region of the buffer object being mapped.&lt;/EM&gt;&lt;/P&gt;
&lt;P align="left"&gt;This means that the actual data is transferred from the device to the host only during Map operations and transferred back during Unmap. The actual data transfer direction depends on the clEnqueueMapXXXX parameters. While a buffer or image is mapped, it is considered owned by the host, and any access to it on the device produces undefined results. The converse holds for unmapped buffers or images: an unmapped buffer or image is considered owned by the device, and any access to the&amp;nbsp;corresponding memory region from the host results in undefined behavior.&lt;/P&gt;
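<![CDATA[
The ownership hand-off described above might look roughly like this on the host. This is only a sketch: read_back_output is a hypothetical helper, queue is assumed to be an in-order command queue on which the kernel has already been enqueued, and outBuf is assumed to be the output buffer (created with CL_MEM_USE_HOST_PTR, as in this thread) holding at least one float.

```c
#include <CL/cl.h>
#include <stdio.h>

/* Hypothetical helper: hands the output buffer to the host, reads it,
 * then hands it back to the device. Returns 0 on success, -1 on failure. */
int read_back_output(cl_command_queue queue, cl_mem outBuf, size_t nBytes)
{
    cl_int err = CL_SUCCESS;

    /* Blocking map for reading: once this returns, the host owns the region
     * and the device results are visible through `mapped`. */
    float *mapped = (float *)clEnqueueMapBuffer(queue, outBuf, CL_TRUE,
                                                CL_MAP_READ, 0, nBytes,
                                                0, NULL, NULL, &err);
    if (err != CL_SUCCESS || mapped == NULL) {
        fprintf(stderr, "clEnqueueMapBuffer failed: %d\n", (int)err);
        return -1;
    }

    /* The host may read the results here, and only here. */
    printf("first output value: %f\n", mapped[0]);

    /* Unmap before the device touches the buffer again. */
    err = clEnqueueUnmapMemObject(queue, outBuf, mapped, 0, NULL, NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clEnqueueUnmapMemObject failed: %d\n", (int)err);
        return -1;
    }

    /* The unmap is only enqueued; wait until it has actually completed. */
    return clFinish(queue) == CL_SUCCESS ? 0 : -1;
}
```

In a calculation loop, the unmap would come after the host has finished reading the results and before the next clEnqueueNDRangeKernel call that writes to the buffer.
]]>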
&lt;P align="left"&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Apr 2014 12:13:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968007#M2337</guid>
      <dc:creator>Dmitry_K_Intel</dc:creator>
      <dc:date>2014-04-03T12:13:06Z</dc:date>
    </item>
    <item>
      <title>Yuri, thanks. I will re-check</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968008#M2338</link>
      <description>&lt;P&gt;Yuri, thanks. I will re-check the link. In the meantime, how many compute units does OpenCL use on the Xeon Phi: 60 or 240? That is, how does OpenCL deal with the 4-way hyper-threading on the device?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Apr 2014 14:42:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968008#M2338</guid>
      <dc:creator>Dave_O_</dc:creator>
      <dc:date>2014-04-03T14:42:10Z</dc:date>
    </item>
    <item>
      <title>I have read the documentation</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968009#M2339</link>
      <description>&lt;P&gt;I have read the documentation again. Map and unmap have better performance than read and write.&lt;/P&gt;

&lt;P&gt;Do you agree with the following?&lt;/P&gt;

&lt;P&gt;// Initialization:&lt;BR /&gt;
	cldistances = clCreateBuffer(clcontext, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, ...&lt;BR /&gt;
	...&lt;BR /&gt;
	&lt;BR /&gt;
	// Calculation loop:&lt;BR /&gt;
	for (i = 0; i &amp;lt; numsteps;...&lt;BR /&gt;
	{&lt;BR /&gt;
	// Enqueueing kernel for execution:&lt;BR /&gt;
	clerror = clEnqueueNDRangeKernel(clqueue, cldistkernel,...&lt;BR /&gt;
	&lt;BR /&gt;
	// Enqueueing buffer mapping to read result from device&lt;BR /&gt;
	clEnqueueMapBuffer(clqueue, cldistances,...&lt;BR /&gt;
	...&lt;BR /&gt;
	}&lt;BR /&gt;
	&lt;BR /&gt;
	and so at what point do I need the clEnqueueUnmapMemObject call?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Dave&lt;/P&gt;</description>
      <pubDate>Fri, 04 Apr 2014 01:39:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968009#M2339</guid>
      <dc:creator>Dave_O_</dc:creator>
      <dc:date>2014-04-04T01:39:57Z</dc:date>
    </item>
    <item>
      <title>Dmitry! thanks!! It works</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968010#M2340</link>
      <description>&lt;P&gt;Dmitry! Thanks!! It works now. I have been digging into the problem, reading the link you sent, and following the Intel sample. The host and accelerator results are okay now! Using clEnqueueMapBuffer and clEnqueueUnmapMemObject correctly fixed the problem. I now understand the point you were trying to make about using those commands for synchronization (that is, for handing ownership of the shared memory back and forth between host and accelerator). Also, the Phi now performs slightly better (by a narrow margin) after running a few more benchmarks with larger problem sizes. I will look into the work-group sizes now (the local work-group size is set to 16, as per the Phi optimization guide) to see whether any more performance can be gained.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Thanks buddy! :)&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 04 Apr 2014 03:33:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Opencl-incorrect-results-on-phi/m-p/968010#M2340</guid>
      <dc:creator>Dave_O_</dc:creator>
      <dc:date>2014-04-04T03:33:39Z</dc:date>
    </item>
  </channel>
</rss>

