Link to source: http://pastebin.com/FyZkMrvQ
The Intel® software used was the OpenCL CPU driver opencl_runtime_15.1_x64_5.0.0.57 from https://software.intel.com/en-us/articles/opencl-drivers#lin64
Comparison of Beignet (GPU, id 0), the Intel® proprietary driver (CPU, id 1), and pocl (CPU, id 2):
user@host:~/.dev/OpenCL$ gcc perftest.c -std=c11 -O2 -lOpenCL -o perftest
user@host:~/.dev/OpenCL$ for id in 0 1 2; do time ./perftest $id; done
Succeeded to create a device group!
Device: 0
Name: Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
Vendor: Intel
Available: Yes
Compute Units: 20
Clock Frequency: 1000 mHz
Global Memory: 2048 mb
Max Allocateable Memory: 1024 mb
Local Memory: 65536 kb
Succeeded to create a compute context!
Succeeded to create a command commands!
Succeeded to create compute program!
Succeeded to create program executable!
Succeeded to create compute kernel!
real 0m25.741s
user 0m0.604s
sys 0m17.796s
Succeeded to create a device group!
Device: 1
Name: Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
Vendor: Intel(R) Corporation
Available: Yes
Compute Units: 4
Clock Frequency: 1600 mHz
Global Memory: 5664 mb
Max Allocateable Memory: 1416 mb
Local Memory: 32768 kb
Succeeded to create a compute context!
Succeeded to create a command commands!
Succeeded to create compute program!
Succeeded to create program executable!
Succeeded to create compute kernel!
real 0m50.082s
user 1m21.951s
sys 0m40.065s
Succeeded to create a device group!
Device: 2
Name: pthread-Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
Vendor: GenuineIntel
Available: Yes
Compute Units: 4
Clock Frequency: 2600 mHz
Global Memory: 5664 mb
Max Allocateable Memory: 5664 mb
Local Memory: 1643847680 kb
Succeeded to create a compute context!
Succeeded to create a command commands!
Succeeded to create compute program!
Succeeded to create program executable!
Succeeded to create compute kernel!
real 0m28.620s
user 0m49.843s
sys 0m4.252s
My clinfo output: http://pastebin.com/30jkBzzs
This looks strange: the open source library pocl (http://portablecl.org) beats the official Intel® software in such a simple test case (ignore the reported "Clock Frequency"; under load the CPU runs at 2300 MHz in both cases). If this isn't a bug in my system, maybe it would be better for Intel® to support pocl (which still has a lot of problems with standards support and stability) instead of developing its own driver?
Ilia,
You are measuring the total execution time of a program that has a number of issues:
1. You are allocating and deallocating buffers in a loop, which is highly undesirable. The usual recommendation is to allocate buffers once, outside the loop.
2. You are allocating buffers the wrong way for our platforms: you need to use the CL_MEM_USE_HOST_PTR flag, create the arrays with aligned_alloc using 4096-byte alignment, and size your buffers in multiples of 64 bytes.
3. You shouldn't use clEnqueueReadBuffer and clEnqueueWriteBuffer. Use clEnqueueMapBuffer instead, which should result in no copies to or from the device and almost instant execution.
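For illustration, the host-side part of point 2 might look roughly like the sketch below. It is not taken from perftest.c. The names zc_round_size and zc_alloc are hypothetical helpers, and the only values assumed are the 4096-byte alignment and 64-byte size multiple mentioned above. The OpenCL calls themselves are left out so that the sketch compiles without an OpenCL SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Values from the advice above: 4096-byte alignment, sizes in multiples of 64 bytes. */
#define ZC_ALIGNMENT     ((size_t)4096)
#define ZC_SIZE_MULTIPLE ((size_t)64)

/* Round n up to the next multiple of 64 bytes (hypothetical helper). */
static size_t zc_round_size(size_t n)
{
    return (n + ZC_SIZE_MULTIPLE - 1) / ZC_SIZE_MULTIPLE * ZC_SIZE_MULTIPLE;
}

/*
 * Allocate page-aligned host memory for at least n bytes (hypothetical helper).
 * *padded_size receives n rounded up to a multiple of 64; that is the size
 * one would hand to the OpenCL buffer. Returns NULL on failure or if n == 0.
 * Release the memory with free() after the OpenCL buffer has been released.
 */
static void *zc_alloc(size_t n, size_t *padded_size)
{
    assert(padded_size != NULL);
    if (n == 0 || n > SIZE_MAX - ZC_ALIGNMENT) {
        return NULL;
    }
    *padded_size = zc_round_size(n);

    /* C11 aligned_alloc wants the size to be a multiple of the alignment,
       so the allocation itself is padded up to a whole number of pages. */
    size_t alloc_size =
        (*padded_size + ZC_ALIGNMENT - 1) / ZC_ALIGNMENT * ZC_ALIGNMENT;
    return aligned_alloc(ZC_ALIGNMENT, alloc_size);
}
```

The returned pointer and padded size would then be passed to clCreateBuffer together with CL_MEM_USE_HOST_PTR, once and outside the loop, and the host would access the data through clEnqueueMapBuffer rather than through read/write copies.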
Please check this article on how to do performance measurements for OpenCL: https://software.intel.com/en-us/articles/intel-sdk-for-opencl-applications-performance-debugging-intro and this article on how to allocate "zero-copy" buffers: https://software.intel.com/en-us/articles/getting-the-most-from-opencl-12-how-to-increase-performance-by-minimizing-buffer-copies-on-intel-processor-graphics
Bottom line: you should be measuring kernel performance. What you are measuring is program build time, buffer allocation and deallocation, copying data back and forth, and only a little bit of kernel performance.
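As a rough sketch of the difference, a single phase can be timed on the host instead of running the whole binary under `time`. Everything named here (elapsed_seconds, time_phase, struct work, busy_work) is hypothetical. busy_work only stands in for "enqueue the kernel and wait for it to finish", and timespec_get with TIME_UTC is a wall clock, so this is an approximation rather than a precise profiler.

```c
#include <assert.h>
#include <time.h>

/* Seconds elapsed between two C11 timespec samples. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Time exactly one phase of the program, given as a callback (hypothetical helper). */
static double time_phase(void (*phase)(void *), void *arg)
{
    struct timespec start, end;
    int ok = timespec_get(&start, TIME_UTC);
    assert(ok == TIME_UTC);
    phase(arg);
    ok = timespec_get(&end, TIME_UTC);
    assert(ok == TIME_UTC);
    return elapsed_seconds(start, end);
}

/* Hypothetical stand-in for the kernel phase; a real program would
   enqueue the kernel here and block until it has completed. */
struct work {
    unsigned long iterations;
    unsigned long result;
};

static void busy_work(void *arg)
{
    struct work *w = arg;
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < w->iterations; ++i) {
        sum += i;
    }
    w->result = sum;
}
```

Calling time_phase(busy_work, &w) returns the duration of that phase only, so program build time and buffer setup stay out of the number. For device-side timings, OpenCL also offers event profiling through a queue created with CL_QUEUE_PROFILING_ENABLE and clGetEventProfilingInfo.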
