OpenCL* for CPU

Same kernel but huge performance difference under linux and windows

Biao_W_

Hi, 

I have managed to run my kernel on the iGPU under both Linux and Windows.

Linux does not officially support running kernels on the iGPU, but the open-source OpenCL project "Beignet" fills that gap.

Below are the performance results for my kernel (the deblocking filter in HEVC). The times (in seconds) were not obtained by attaching events to the kernel launch in OpenCL, because that also depends on the OpenCL runtime implementation on each OS. Instead, they were measured with host-side CPU profiling utilities.

Time (s)     H2D     Kernel   D2H
Linux        1.95    3.89     1.56
Windows      6.74    0.85     1.44
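
Host-side timing of the kind described above could look roughly like the following C sketch. It is not the poster's actual measurement code: `now_seconds`, `time_stage` and `example_stage` are hypothetical names, and the stage here is a stand-in so the block runs without an OpenCL runtime. For a real OpenCL stage the callback would need to block until the device is done (for example with `clFinish`), because enqueue calls may return before the work has completed.

```c
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds via C11 timespec_get (not guaranteed monotonic). */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: run one stage and return its elapsed host time.
 * For an OpenCL stage the callback should enqueue the command and then
 * block (for example with clFinish on the queue); enqueue calls may
 * return before the device has finished the work. */
double time_stage(void (*stage)(void *), void *arg)
{
    assert(stage != NULL);
    double t0 = now_seconds();
    stage(arg);
    return now_seconds() - t0;
}

/* Stand-in stage so the sketch runs without an OpenCL runtime:
 * it only counts how often it was called. */
void example_stage(void *arg)
{
    int *calls = arg;
    ++*calls;
}
```

Calling `time_stage` once per stage (H2D, kernel, D2H) with the same helper on both operating systems keeps the three columns comparable, since the measurement no longer depends on how each runtime implements event profiling.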

I am not sure whether the Beignet development team uses the same compiler as the Windows OpenCL driver, but kernel performance differs greatly between the two systems (3.89 s on Linux versus 0.85 s on Windows). The host-to-device copy also takes much longer on Windows (6.74 s versus 1.95 s), and I cannot figure out why.

Any hints? 

My configuration:

Hardware:

  • CPU: i5-4570R, iGPU (HD5200)

Windows: Win8.1

  • iGPU driver version 10.18.10.3960, latest INDE, Visual Studio 2013

Linux: 14.04

  • kernel 3.13
  • Beignet Release v1.0
  • gcc 4.8.3

Robert_I_Intel

Hi Biao,

Please submit Beignet bugs here: https://bugs.freedesktop.org/enter_bug.cgi?product=Beignet or direct questions about Beignet performance to the mailing list: http://lists.freedesktop.org/mailman/listinfo/beignet. We do not support Beignet in this forum.

As for the D2H and H2D times, you can avoid these copies by properly aligning the memory of your buffers, using the CL_MEM_USE_HOST_PTR flag when creating the buffers, and making the size of your buffers a multiple of 64 bytes. See this excellent article by Adam Lake on how to do it properly: https://software.intel.com/en-us/articles/getting-the-most-from-opencl-12-how-to-increase-performance-by-minimizing-buffer-copies-on-intel-processor-graphics
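
The host-side part of that advice might be sketched in C as follows. This is an illustration under stated assumptions, not code from the reply or the article: `round_up` and `zero_copy_alloc` are hypothetical helpers, the 4096-byte alignment is the page alignment the linked article recommends (the reply itself gives no number), and `aligned_alloc` is the C11 allocator (MSVC would need `_aligned_malloc` instead). The host-pointer flag is spelled `CL_MEM_USE_HOST_PTR` in the OpenCL headers. The OpenCL call appears only as a comment so the block compiles without an OpenCL SDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ZC_ALIGNMENT     4096u /* assumed page alignment, per the linked article */
#define ZC_SIZE_MULTIPLE 64u   /* size granularity named in the reply */

/* Round n up to the next multiple of m; m must be a power of two. */
size_t round_up(size_t n, size_t m)
{
    assert(m != 0 && (m & (m - 1)) == 0);
    return (n + m - 1) & ~(m - 1);
}

/* Hypothetical helper: allocate page-aligned host memory for a buffer.
 * *padded_size receives the size (a multiple of 64 bytes) to pass to
 * clCreateBuffer. The allocation itself is rounded up to a multiple of
 * the alignment, which C11 aligned_alloc requires. Overflow of very
 * large requests is not handled in this sketch. Release with free(). */
void *zero_copy_alloc(size_t requested, size_t *padded_size)
{
    size_t size = round_up(requested, ZC_SIZE_MULTIPLE);
    void *ptr = aligned_alloc(ZC_ALIGNMENT, round_up(size, ZC_ALIGNMENT));
    if (ptr != NULL && padded_size != NULL)
        *padded_size = size;
    return ptr;
}

/* With an OpenCL context available, the buffer would then be created as:
 *
 *   cl_int err;
 *   cl_mem buf = clCreateBuffer(context,
 *                               CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
 *                               padded_size, ptr, &err);
 *
 * and accessed from the host through clEnqueueMapBuffer and
 * clEnqueueUnmapMemObject rather than explicit read/write copies. */
```

Whether the runtime actually shares the allocation instead of copying it is implementation-dependent, so it is worth re-measuring the H2D and D2H stages after the change.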
