Solved: Hi Yuri,

Simon_S_1 · ‎01-26-2015

Hello,

I work with the CLI tools of the Intel OpenCL SDK 1.2 on Scientific Linux. I'm interested in optimizing my kernels (1) with the oclopt program and (2) with assembly code for CPU or MIC.

Question (1): As I currently understand it, oclopt takes a built SPIR file plus a set of optimization passes, such as prefetching or loop unrolling, and produces an optimized version of that file. Example:

    oclopt -O3 -prefetch -loop-unroll kernel_x64.spir > kernel_x64_opt.spir

Can I do more with it, such as giving a hint about the prefetching distance? In my case, these flags do not influence the kernel! Is it possible to access the IMCI instruction set somehow?

Question (2): The ioc64 kernel builder creates assembly files. Is there any way to build binary code from that assembly code, or is it only meant for analysing the generated binary kernel file?

Thanks a lot!

Robert_I_Intel · ‎01-30-2015

Simon,

1. Your understanding is correct. If you first build your SPIR kernel like this:

ioc64 -cmd=build -input=drawbox.cl -device=gpu -spir64=drawbox.bc -bo="-cl-std=CL1.2"

Then, you can apply optimizations to it, e.g.

oclopt -strip drawbox.bc > drawbox_stripped.bc

You can list the available options with oclopt --help. I haven't tried them all, so I'm not sure how to hint at the prefetching distance. The kernels are SPIR kernels. I'm not sure what you mean by the IMCI instruction set.

2. ioc64 can build many things, but if you use the -spir64= flag, it generates a SPIR file that you should be able to load with clCreateProgramWithBinary.
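
One way to check whether an oclopt pass changed the SPIR file at all (the concern raised in the original question) is a byte-wise comparison of input and output. The following is only a sketch: the helper name report_change is made up for illustration, and the oclopt call reuses the -strip flag and file names from the example above and runs only if oclopt and the input file are actually present.

```shell
#!/bin/sh
# report_change is a hypothetical helper, not part of the Intel SDK.
# Prints "unchanged" if the two files are byte-identical, "changed" otherwise.
report_change() {
    if cmp -s "$1" "$2"; then
        echo "unchanged"
    else
        echo "changed"
    fi
}

# Guarded so the script is harmless on machines without the SDK.
# The flag and file names are the ones quoted in this thread.
if command -v oclopt >/dev/null 2>&1 && [ -f drawbox.bc ]; then
    oclopt -strip drawbox.bc > drawbox_stripped.bc
    report_change drawbox.bc drawbox_stripped.bc
fi
```

If this prints "unchanged" for a given set of flags, those passes had no effect on that particular file. A "changed" result only shows that the bitcode differs, not that the code generated for the device does.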

Simon_S_1 · ‎01-30-2015

Thanks for your reply!

(1) Please don't refer to the GPU. I'm interested in optimizing kernels specifically for the MIC (Xeon Phi) with OpenCL. For low-level optimization, I need some way to influence the optimizer. None of the offered options changes the generated code. Does oclopt really apply to the Xeon Phi?

(2) IMCI is Intel's Initial Many Core Instructions set for the Xeon Phi. Is any assembly-level optimization possible with OpenCL?

ioc64 works well with SPIR and clCreateProgramWithBinary. That's not the problem. I want to know what else is possible!

Thanks

Robert_I_Intel · ‎01-30-2015

Simon,

I'll forward your question to Xeon Phi pros.

Yuri_K_Intel · ‎02-02-2015

Hi Simon,

This utility (oclopt) is not intended for direct use, and it may be removed from future releases. My understanding is that, to influence optimization, one should use the methods described in the OpenCL™ Optimization Guide for HPC Systems (https://software.intel.com/en-us/iocl_tec_2014_opg). There is also no direct way to use the IMCI instruction set from OpenCL C code: as with any OpenCL implementation, the only allowed language is the one described in the OpenCL C specification.

Thanks, Yuri

Simon_S_1 · ‎02-17-2015

Hello Yuri,

thanks for the reply!

Intel's OpenCL driver implicitly vectorizes kernels with a width of 16 on Xeon Phi. Can I turn that off?

Thanks, Simon

Yuri_K_Intel · ‎02-17-2015

Hi Simon, Yes, automatic vectorization can be disabled with the CL_CONFIG_USE_VECTORIZER environment variable. Please see https://software.intel.com/en-us/node/540483 Thanks, Yuri
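
As a rough sketch of how the variable could be set for a single run without changing the rest of the shell session: the wrapper name run_without_vectorizer and the application path are hypothetical, and the value False is an assumption based on Intel's documentation of this variable (valid values True and False), so please check it against the page linked above.

```shell
#!/bin/sh
# Hypothetical wrapper: runs the given command with the vectorizer
# switched off for that process only. The value "False" is assumed
# from Intel's documentation; verify it for your SDK version.
run_without_vectorizer() {
    env CL_CONFIG_USE_VECTORIZER=False "$@"
}

# Example (./my_opencl_app is a placeholder for your own host program):
#   run_without_vectorizer ./my_opencl_app
```

Scoping the variable to one command makes it easy to time or disassemble the same kernel with and without the setting. Note that a later post in this thread reports no observable difference on Xeon Phi.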

Simon_S_1 · ‎02-17-2015

Hi Yuri,

can I also influence the vectorization for double?

export CL_CONFIG_CPU_VECTORIZER_MODE=8

does not work for me. I get either a vectorization width of 16 or no vectorization at all.

Yuri_K_Intel · ‎02-18-2015

Simon, This variable (CL_CONFIG_CPU_VECTORIZER_MODE) affects code generation for the CPU OpenCL device only. Thanks, Yuri

Simon_S_1 · ‎02-21-2015

I cannot confirm that the variable CL_CONFIG_USE_VECTORIZER disables the implicit vectorizer. The build output says that the code is not vectorized, but the assembly code is the same and the kernel's execution shows no difference.

Kernel optimization with oclopt and ioc64