Hello,
I'm working with the command-line tools of the Intel OpenCL SDK 1.2 on Scientific Linux. I'm interested in optimizing my kernels (1) with the oclopt program and (2) with assembly code for the CPU or MIC.
Question (1): As I currently understand it, oclopt takes an already built SPIR file plus a list of optimization passes, such as prefetching or loop unrolling, and produces an optimized version of it. Example:
oclopt -O3 -prefetch -loop-unroll kernel_x64.spir > kernel_x64_opt.spir
Can I do more with it, for example give a hint about the prefetch distance? In my case these flags do not change the kernel at all. Is it possible to access the IMCI instruction set somehow?
Question (2): The ioc64 kernel builder creates assembly files. Is there any way to build binary code from assembly code, or is the assembly output only meant for analyzing the generated binary kernel file?
Thanks a lot!
Simon,
1. Your understanding is correct. If you first build your SPIR kernel like this:
ioc64 -cmd=build -input=drawbox.cl -device=gpu -spir64=drawbox.bc -bo="-cl-std=CL1.2"
then you can apply optimizations to it, e.g.:
oclopt -strip drawbox.bc > drawbox_stripped.bc
You can list the available options with oclopt --help. I haven't tried them all, so I'm not sure how to give a hint about the prefetch distance. The kernels are SPIR kernels. I'm not sure what you mean by the IMCI instruction set.
2. ioc64 can build many things, but if you use the -spir64= flag, it generates a proper SPIR file that you should be able to load with clCreateProgramWithBinary.
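Because clCreateProgramWithBinary takes the SPIR bitcode as an in-memory buffer plus its length, the host program first has to read the .bc file from disk. A minimal sketch of that step in C, assuming a file produced by ioc64 -spir64=...; the helper name read_binary_file is hypothetical, and the OpenCL calls appear only in a comment because they need <CL/cl.h>, a context and a device. The "-x spir -spir-std=1.2" build options come from the cl_khr_spir extension as I understand it; please check them against the SDK documentation.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read a whole file (e.g. drawbox.bc from ioc64) into a
 * malloc'd buffer. On success it returns the buffer and stores the size in
 * *out_len; the caller must free() it. Returns NULL on any error.
 */
unsigned char *read_binary_file(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Determine the file size by seeking to the end and back. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    /* malloc(0) may return NULL, so always allocate at least one byte. */
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) { fclose(f); return NULL; }

    if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *out_len = (size_t)size;
    return buf;
}

/*
 * How the buffer would then be handed to OpenCL (not compiled here):
 *
 *   size_t len;
 *   unsigned char *buf = read_binary_file("drawbox.bc", &len);
 *   const unsigned char *bins[] = { buf };
 *   cl_int status, err;
 *   cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &len,
 *                                               bins, &status, &err);
 *   // Assumed from cl_khr_spir: SPIR input is selected with "-x spir".
 *   err = clBuildProgram(prog, 1, &dev, "-x spir -spir-std=1.2", NULL, NULL);
 *   free(buf);
 */
```

Reading in binary mode ("rb") matters: the bitcode must reach the runtime byte for byte.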
Thanks for your reply!
(1) Please don't refer to the GPU. I'm interested in optimizing kernels with OpenCL, especially for the MIC (Xeon Phi). For low-level optimization, I need some way to influence the optimizer. None of the offered options shows any change in the generated code. Does oclopt really apply to the Xeon Phi?
(2) IMCI is Intel's Initial Many Core Instructions set for the Xeon Phi. Is any assembly-level optimization possible with OpenCL?
ioc64 works well with SPIR and clCreateProgramWithBinary; that's not the problem. I want to know what else is possible!
Thanks
Simon,
I'll forward your question to Xeon Phi pros.
Hello Yuri,
thanks for the reply!
Intel's OpenCL driver implicitly vectorizes with a width of 16 on the Xeon Phi. Can I turn this off?
Thanks, Simon
Hi Yuri,
can I also control the vectorization width for double?
export CL_CONFIG_CPU_VECTORIZER_MODE=8
does not work for me: I get either a vectorization width of 16 or no vectorization at all.
I cannot confirm that the variable CL_CONFIG_USE_VECTORIZER disables the implicit vectorizer. The output says that the code is not vectorized, but the assembly code is identical and the kernel's execution shows no difference.