OpenCL* for CPU

Kernel optimization with oclopt and ioc64

Simon_S_1
Beginner

Hello,

I work with the CLI tool of the Intel OpenCL SDK 1.2 on Scientific Linux. I'm interested in optimizing my kernels (1) with the oclopt program and (2) with assembly code for CPU or MIC.

Question (1): As I currently understand it, oclopt takes a built SPIR file plus optimization options such as prefetching or loop unrolling and produces an optimized version of it. Example:

    oclopt -O3 -prefetch -loop-unroll kernel_x64.spir > kernel_x64.spir

Can I do more with it, such as giving a hint about the prefetching distance? In my case, these flags do not influence the kernel! Is it possible to access the IMCI instruction set somehow?
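
One thing worth checking in the example command: it redirects the output into the same file it reads from. A POSIX shell truncates the redirect target before the program starts, so the tool would be handed an empty input. Below is a minimal sketch of that effect, using sed purely as a stand-in for oclopt; the file contents and the .opt.spir name are made up for the demo.

```shell
# Demo only: sed stands in for oclopt (reads a file argument, writes to stdout).
workdir=$(mktemp -d)

# Pitfall: "> file" empties the file before sed ever opens it.
printf 'dummy SPIR payload\n' > "$workdir/clobbered.spir"
sed 's/dummy/optimized/' "$workdir/clobbered.spir" > "$workdir/clobbered.spir"
if [ -s "$workdir/clobbered.spir" ]; then clobbered=no; else clobbered=yes; fi
echo "input destroyed: $clobbered"

# Safer: write the result to a different (hypothetical) file name.
printf 'dummy SPIR payload\n' > "$workdir/kernel_x64.spir"
sed 's/dummy/optimized/' "$workdir/kernel_x64.spir" > "$workdir/kernel_x64.opt.spir"
cat "$workdir/kernel_x64.opt.spir"
```

Whether this explains the missing effect of the flags is only a guess, but writing to a separate output file rules it out.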

Question (2): The ioc64 kernel builder creates assembly files. Is there any way to build binary code from that assembly code, or is it only meant for analysing the generated kernel binary?

Thanks a lot!

1 Solution
Yuri_K_Intel
Employee

Hi Simon,

This utility (oclopt) is not intended for direct use, and it may be removed from future releases. As I understand it, to influence optimization you should use the methods described in the OpenCL™ Optimization Guide for HPC Systems (https://software.intel.com/en-us/iocl_tec_2014_opg).

There is also no direct way to use the IMCI instruction set from OpenCL C code: as with any OpenCL implementation, the only allowed kernel language is the one described in the OpenCL C specification.

Thanks,
Yuri

9 Replies
Robert_I_Intel
Employee

Simon,

1. Your understanding is correct. If you first build your SPIR kernel like this:

ioc64 -cmd=build -input=drawbox.cl -device=gpu -spir64=drawbox.bc -bo="-cl-std=CL1.2"

Then, you can apply optimizations to it, e.g.

oclopt -strip drawbox.bc > drawbox_stripped.bc

You can check the available options via oclopt --help. I haven't tried them all, so I'm not sure how to hint at the prefetching distance. The kernels are SPIR kernels. I'm not sure what you mean by the IMCI instruction set.

2. ioc64 can build many things, but if you use the -spir64= flag, it will generate the right SPIR file, which you should be able to load with clCreateProgramWithBinary.

Simon_S_1
Beginner

Thanks for your reply!

(1) Please don't refer to the GPU. I'm interested in optimizing kernels specifically for the MIC (Xeon Phi) with OpenCL. For low-level optimization, it is necessary to influence the optimizer somehow. None of the offered options produces any change in the generated code. Does oclopt really apply to the Xeon Phi?

(2) IMCI is Intel's Initial Many Core Instructions set for the Xeon Phi. Is any assembly-level optimization possible with OpenCL?

ioc64 works well with SPIR and clCreateProgramWithBinary. That's not the problem. I want to know what else is possible!

Thanks

Robert_I_Intel
Employee

Simon,

I'll forward your question to Xeon Phi pros. 

Simon_S_1
Beginner

Hello Yuri,

Thanks for the reply!

Intel's OpenCL driver implicitly vectorizes with a width of 16 on the Xeon Phi. Can I turn this off?

Thanks, Simon

Yuri_K_Intel
Employee

Hi Simon,

Yes, automatic vectorization can be prevented by using CL_CONFIG_USE_VECTORIZER. Please see https://software.intel.com/en-us/node/540483

Thanks,
Yuri
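
A minimal sketch of setting the variable for a single run only, so it does not leak into the rest of the shell session. The value False is an assumption taken from how the linked documentation is usually read, and ./my_ocl_app is a placeholder for your own host program, not a real tool.

```shell
# Run one command with the vectorizer knob set only for that child process.
# Assumption: the runtime accepts the value "False" to disable the vectorizer.
run_without_vectorizer() {
    CL_CONFIG_USE_VECTORIZER=False "$@"
}

# Hypothetical usage (placeholder program name):
# run_without_vectorizer ./my_ocl_app

# Sanity check that the variable actually reaches the child process:
run_without_vectorizer sh -c 'echo "CL_CONFIG_USE_VECTORIZER=$CL_CONFIG_USE_VECTORIZER"'
```

The sanity check only confirms that the environment is passed on; whether a given device honours the variable is a separate question.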

Simon_S_1
Beginner

Hi Yuri,

Can I also influence the vectorization width for double? Setting

export CL_CONFIG_CPU_VECTORIZER_MODE=8

does not work for me. I get either a vectorization width of 16 or no vectorization at all.

Yuri_K_Intel
Employee

Simon,

This variable affects code generation for the CPU OpenCL device only.

Thanks,
Yuri

Simon_S_1
Beginner

I cannot confirm that the CL_CONFIG_USE_VECTORIZER variable disables the implicit vectorizer. The output says that the code is not vectorized, but the assembly code is the same and the kernel's execution shows no difference.
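
To make the "assembly is the same" check reproducible, the two dumps can be compared byte for byte. This is a small sketch, assuming the assembly was already dumped once per setting; the file names are placeholders and the demo content is dummy text, not real compiler output.

```shell
# Report whether two assembly dumps are byte-identical.
compare_asm() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Demo with dummy content; replace these with your real dumps
# (hypothetical names: kernel_vec_on.asm / kernel_vec_off.asm).
asm_dir=$(mktemp -d)
printf 'vaddps\n' > "$asm_dir/kernel_vec_on.asm"
printf 'vaddps\n' > "$asm_dir/kernel_vec_off.asm"
compare_asm "$asm_dir/kernel_vec_on.asm" "$asm_dir/kernel_vec_off.asm"
```

Posting the result together with the exact build commands would let others reproduce the observation.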
