OpenCL* for CPU

Xeon Phi, Opencl and clbenchmark

MSimm2
New Contributor I

http://clbenchmark.com/device-info.jsp?config=15887974

*Cough*... Not so impressive, but I'm guessing it's a beta-version problem (?). Anyway, something looks broken.

If you go to the results page http://clbenchmark.com/result.jsp and untick GPU, you can see that it just beats an i7-3770K.

LLess
Beginner

I hope so too because compared with a GeForce GTX Titan it really hurts...

ARNON_P_Intel
Employee

With respect to these benchmark results: as you know, OpenCL provides a low-level programming environment for writing portable code for a diverse mix of platforms and devices. The standard ensures that this portable code will be functionally correct on different devices. However, performance portability is not guaranteed. Specifically, OpenCL code designed for one target device will not necessarily run well on another type of target device until it has been optimized for the underlying hardware. The performance and efficiency improvements from this kind of optimization effort may be significant for multicore and many-core applications, and that might be the case here.

In their article “Demonstrating Performance Portability of a Custom OpenCL Data Mining Application to the Intel Xeon Phi Coprocessor”, A. Heinecke et al. showcase how a developer can generate optimal code for each target device on the fly with only slight modifications. See: http://iwocl.org/wp-content/uploads/2013/06/Dmitry.pdf. Their results compare OpenCL on Xeon Phi versus other devices, and they refer you to the code itself.

Arnon

MSimm2
New Contributor I

Thanks Arnon, 

If I optimise my code (http://sourceforge.net/projects/openclsolarsyst) for a Haswell CPU (with AVX2) and use a work-group size of 16, will this go most of the way to optimising for the Xeon Phi?

Other than a lookup table keyed on the device name, is there some way of detecting which optimisations suit the Xeon Phi? E.g. CL_DEVICE_LOCAL_MEM_SIZE reports 32768 and CL_DEVICE_MAX_WORK_GROUP_SIZE reports 1024.
The paper you linked suggests not using local memory and making the work-groups small.

Perhaps we need

CL_DEVICE_PREFERRED_WORK_GROUP_SIZE 16

and

CL_DEVICE_PREFERRED_LOCAL_MEM_SIZE 0
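
Neither of those two names exists in OpenCL, but some existing queries may get part of the way there without a device-name table: CL_DEVICE_LOCAL_MEM_TYPE (CL_LOCAL for dedicated on-chip memory, CL_GLOBAL when local memory is just ordinary global memory) and the per-kernel CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE from clGetKernelWorkGroupInfo (OpenCL 1.1 and later). The sketch below is only a guess at a heuristic built on those values. The struct and function names are hypothetical, the host is assumed to have filled the struct from the real queries, and whether the Xeon Phi runtime actually reports values that lead to the settings the paper recommends would need checking on the device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container (not an OpenCL type) for values the host is
 * assumed to have read with clGetDeviceInfo / clGetKernelWorkGroupInfo. */
typedef struct {
    int    local_mem_is_dedicated; /* 1 if CL_DEVICE_LOCAL_MEM_TYPE == CL_LOCAL, 0 if CL_GLOBAL */
    size_t max_work_group_size;    /* CL_DEVICE_MAX_WORK_GROUP_SIZE */
    size_t preferred_multiple;     /* CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE */
} device_hints;

/* Hypothetical result: the work-group size to launch with and whether
 * the kernel variant that stages data in __local memory should be used. */
typedef struct {
    size_t work_group_size;
    int    use_local_memory;
} tuning_choice;

tuning_choice choose_tuning(device_hints h)
{
    tuning_choice c;
    size_t multiple = h.preferred_multiple ? h.preferred_multiple : 1;

    assert(h.max_work_group_size >= 1);

    /* Assumption: staging through __local memory only pays off when that
     * memory is dedicated on-chip storage rather than plain global memory. */
    c.use_local_memory = h.local_mem_is_dedicated;

    if (h.local_mem_is_dedicated && multiple <= h.max_work_group_size) {
        /* GPU-like device: largest multiple of the preferred size that fits. */
        c.work_group_size = (h.max_work_group_size / multiple) * multiple;
    } else if (multiple <= h.max_work_group_size) {
        /* CPU-like device: keep work-groups small, one preferred multiple. */
        c.work_group_size = multiple;
    } else {
        /* Preferred multiple does not fit: fall back to the device maximum. */
        c.work_group_size = h.max_work_group_size;
    }
    return c;
}
```

With the values quoted above (maximum work-group size 1024) and, purely as an assumption, a preferred multiple of 16 plus local memory reported as CL_GLOBAL, choose_tuning would pick a work-group size of 16 and skip local memory, which matches what the paper suggests.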



