Software Archive
Read-only legacy content
17061 Discussions

OpenCL on Xeon Phi

David_B_16
Beginner
792 Views

I am having problems compiling some OpenCL examples on the Xeon Phi. When I use the CCFLAGS option -mmic, I get the following build error:

x86_64-k1om-linux-ld: cannot find -lOpenCL

However, when I remove the -mmic option, the code builds. So do I need to use the -mmic flag to build OpenCL code that runs efficiently on the Xeon Phi?

Also, is there a webpage that describes how to build and run efficient OpenCL code on the Xeon Phi?

Thanks, David

4 Replies
Yuri_K_Intel
Employee
Hi David,

The -mmic option creates an application that runs natively on the Xeon Phi, so you don't need to specify it when building an OpenCL application.

The starting point for OpenCL on Xeon Phi is: http://software.intel.com/en-us/vcsource/tools/opencl-sdk-xe

Specifically:
user guide: http://software.intel.com/sites/products/documentation/ioclsdk/2013XE/UG/index.htm
optimization guide: http://software.intel.com/sites/products/documentation/ioclsdk/2013XE/OG/index.htm
support forum: http://software.intel.com/en-us/forums/intel-opencl-sdk/

Thanks,
Yuri
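For reference, a minimal sketch of the two build lines in question (the source file name and compiler invocation are assumptions, not taken from the thread):

```shell
# Native Xeon Phi cross-compile (what -mmic requests); the k1om
# toolchain ships no libOpenCL to link against, which is why the
# link step reports "cannot find -lOpenCL":
#   icc -mmic host.c -lOpenCL -o host     # fails at link time

# OpenCL host build: compile for the host CPU without -mmic. The
# Intel OpenCL runtime dispatches the kernels to the Xeon Phi
# device selected at run time (CL_DEVICE_TYPE_ACCELERATOR).
icc host.c -lOpenCL -o host
```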
Rishab_G_
Beginner

Hello,

I am having some problems vectorizing (float16) the prefix sum kernel using OpenCL on the Intel Xeon Phi.

I am able to work it out for the float data type, but the profiling numbers seem pretty high.

Could anybody please suggest an example for this?

Regards

Rishab Goel

TimP
Honored Contributor III

KNC doesn't have adequate native support for float16, to my knowledge, so attempting that vectorization seems academic. Jim Dempsey posted suggestions for vectorization with native data types. It seems simpler to me to settle for the roughly 50% speedup over the plain sequential implementation, which can be obtained with a sort of unroll and jam where the recursion penalty is taken only every 4th element. It's certainly not something which shows KNC in a good light.
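As an illustration of that unroll-and-jam idea, here is a minimal plain-C sketch (not TimP's or Jim Dempsey's actual code; function names are made up). The four partial sums within a group depend only on the inputs, so only the carry update remains serial, once per 4 elements:

```c
#include <stddef.h>

/* Plain sequential inclusive prefix sum: the running sum is a
 * loop-carried dependence on every single element. */
void scan_serial(const float *a, float *s, size_t n) {
    float run = 0.0f;
    for (size_t i = 0; i < n; i++) {
        run += a[i];
        s[i] = run;
    }
}

/* Unroll-and-jam variant: partial sums within each group of 4 are
 * independent of the running carry, so the recursion penalty is
 * paid only once per group (every 4th element). */
void scan_unroll4(const float *a, float *s, size_t n) {
    float carry = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float p0 = a[i];
        float p1 = p0 + a[i + 1];
        float p2 = p1 + a[i + 2];
        float p3 = p2 + a[i + 3];
        s[i]     = carry + p0;
        s[i + 1] = carry + p1;
        s[i + 2] = carry + p2;
        s[i + 3] = carry + p3;
        carry += p3;  /* the only serial dependence per group */
    }
    for (; i < n; i++) {  /* remainder elements */
        carry += a[i];
        s[i] = carry;
    }
}
```

Note that regrouping the additions this way changes the floating-point rounding order slightly, which is usually acceptable for a scan but worth keeping in mind when comparing against the serial result.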

Rishab_G_
Beginner

Hello Tim,

In your experience, how much gain could we get on such a parallel prefix sum kernel?

Instead of using float16, could I use plain for loops and rely on the compiler to vectorize?

Regards

Rishab Goel
