Software Archive
Read-only legacy content
17061 Discussions

OpenCL on Xeon Phi

David_B_16
Beginner
792 Views

I am having problems compiling some OpenCL examples on the Xeon Phi. When I use the CCFLAGS option -mmic, I get the following build error:

x86_64-k1om-linux-ld: cannot find -lOpenCL

However, when I remove the -mmic option, the code builds. So do I need to use the -mmic flag to build OpenCL code that runs efficiently on the Xeon Phi?

Also, is there a webpage that describes how to build and run efficient OpenCL code on the Xeon Phi?

Thanks, David

4 Replies
Yuri_K_Intel
Employee
Hi David,

The -mmic option creates an application that runs natively on the Xeon Phi, so you don't need to specify it when building an OpenCL application.

The starting point for OpenCL on Xeon Phi is: http://software.intel.com/en-us/vcsource/tools/opencl-sdk-xe

Specifically:
user guide: http://software.intel.com/sites/products/documentation/ioclsdk/2013XE/UG/index.htm
optimization guide: http://software.intel.com/sites/products/documentation/ioclsdk/2013XE/OG/index.htm
support forum: http://software.intel.com/en-us/forums/intel-opencl-sdk/

Thanks,
Yuri
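For reference, a minimal sketch of the two build lines in question (the source file name and compiler invocation are assumptions, not taken from the thread):

```shell
# Native Xeon Phi cross-compile (what -mmic requests); the k1om
# toolchain ships no libOpenCL to link against, which is why the
# link step reports "cannot find -lOpenCL":
#   icc -mmic host.c -lOpenCL -o host     # fails at link time

# OpenCL host build: compile for the host CPU without -mmic. The
# Intel OpenCL runtime dispatches the kernels to the Xeon Phi
# device selected at run time (CL_DEVICE_TYPE_ACCELERATOR).
icc host.c -lOpenCL -o host
```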
Rishab_G_
Beginner

Hello,

I am having some problems vectorizing (float16) the prefix sum kernel using OpenCL on the Intel Xeon Phi.

I am able to work it out for the float data type, but the profiling numbers seem pretty high.

Could anybody please suggest an example for this?

Regards

Rishab Goel

TimP
Honored Contributor III

KNC doesn't have adequate native support for float16, to my knowledge, so attempting that vectorization seems academic. Jim Dempsey posted suggestions for vectorization with native data types. It seems simpler to me to settle for the roughly 50% speedup over the plain sequential implementation, which can be obtained with a sort of unroll and jam where the recursion penalty is taken only every 4th element. It's certainly not something which shows KNC in a good light.
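As an illustration of that unroll-and-jam idea, here is a minimal plain-C sketch (not TimP's or Jim Dempsey's actual code; function names are made up). The four partial sums within a group depend only on the inputs, so only the carry update remains serial, once per 4 elements:

```c
#include <stddef.h>

/* Plain sequential inclusive prefix sum: the running sum is a
 * loop-carried dependence on every single element. */
void scan_serial(const float *a, float *s, size_t n) {
    float run = 0.0f;
    for (size_t i = 0; i < n; i++) {
        run += a[i];
        s[i] = run;
    }
}

/* Unroll-and-jam variant: partial sums within each group of 4 are
 * independent of the running carry, so the recursion penalty is
 * paid only once per group (every 4th element). */
void scan_unroll4(const float *a, float *s, size_t n) {
    float carry = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float p0 = a[i];
        float p1 = p0 + a[i + 1];
        float p2 = p1 + a[i + 2];
        float p3 = p2 + a[i + 3];
        s[i]     = carry + p0;
        s[i + 1] = carry + p1;
        s[i + 2] = carry + p2;
        s[i + 3] = carry + p3;
        carry += p3;  /* the only serial dependence per group */
    }
    for (; i < n; i++) {  /* remainder elements */
        carry += a[i];
        s[i] = carry;
    }
}
```

Note that regrouping the additions this way changes the floating-point rounding order slightly, which is usually acceptable for a scan but worth keeping in mind when comparing against the serial result.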

Rishab_G_
Beginner

Hello Tim,

In your experience, how much gain could we get on such a parallel prefix sum kernel?

Instead of using float16, could I use plain for loops and rely on the compiler to vectorize?

Regards

Rishab Goel
