Hello,
I work with the CLI tools of the Intel OpenCL SDK 1.2 on Scientific Linux. I'm interested in optimizing my kernels (1) with the oclopt program and (2) with assembly code for the CPU or MIC.
Question (1): My current understanding of oclopt is that it takes built SPIR code plus optimization flags such as prefetching or loop unrolling and produces an optimized version of that code. Example:
oclopt -O3 -prefetch -loop-unroll kernel_x64.spir > kernel_x64_opt.spir
Can I do more with it, such as giving a hint about the prefetch distance? In my case, these flags do not influence the kernel! Is it possible to access the IMCI instruction set somehow?
Question (2): The ioc64 kernel builder creates assembly files. Is there any way to build binary code from assembly code, or is the assembly only meant for analysing the generated binary kernel file?
Thanks a lot!
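One way to check whether an oclopt pass changed the module at all is to write the result to a separate file and compare it with the input byte for byte. The following is a minimal sketch, not part of the SDK: `apply_and_compare` is a hypothetical helper, and it assumes the tool takes the input file as its last argument and writes the result to standard output, as in the oclopt command above.

```shell
# Hypothetical helper (not part of the Intel SDK).
# usage: apply_and_compare INPUT OUTPUT CMD [ARGS...]
# Runs "CMD ARGS... INPUT > OUTPUT" and prints "changed" or "unchanged".
apply_and_compare() {
    input=$1
    output=$2
    shift 2
    # The shell truncates a redirect target before the tool starts,
    # so writing back to the input file would destroy the input.
    if [ "$input" = "$output" ]; then
        echo "refusing to overwrite the input file" >&2
        return 2
    fi
    # Assumption: the tool reads the file named by its last argument
    # and writes the transformed module to standard output.
    "$@" "$input" > "$output" || return 2
    # cmp -s is silent and returns 0 only if the files are identical.
    if cmp -s "$input" "$output"; then
        echo "unchanged"
    else
        echo "changed"
    fi
}
```

For example, `apply_and_compare kernel_x64.spir kernel_x64_opt.spir oclopt -O3 -prefetch -loop-unroll` would print `changed` or `unchanged`. A byte-for-byte match only shows that the module on disk is identical; it says nothing about what the device compiler does later.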
- Tags:
- OpenCL*
- Professors
- Students
Simon,
1. Your understanding is correct. First build your SPIR kernel like this:
ioc64 -cmd=build -input=drawbox.cl -device=gpu -spir64=drawbox.bc -bo="-cl-std=CL1.2"
Then, you can apply optimizations to it, e.g.
oclopt -strip drawbox.bc > drawbox_stripped.bc
You can check the available options via oclopt --help. I haven't tried them all, so I'm not sure how to hint at the prefetch distance. The kernels are SPIR kernels. I'm not sure what you mean by the IMCI instruction set.
2. ioc64 can build many things, but if you use the -spir64= flag, it will generate the right SPIR file, which you should be able to load with clCreateProgramWithBinary.
Thanks for your reply!
(1) Please don't refer to the GPU. I'm interested in optimizing kernels specifically for the MIC (Xeon Phi) with OpenCL. To optimize at a low level, it is necessary to influence the optimization somehow. None of the offered options shows any change in the generated code. Does oclopt really apply to the Xeon Phi?
(2) IMCI is Intel's Initial Many Core Instructions set for the Xeon Phi. Is any assembly-level optimization possible with OpenCL?
ioc64 works well with SPIR and clCreateProgramWithBinary. That's not the problem. I want to know what else is possible!
Thanks
Simon,
I'll forward your question to Xeon Phi pros.
Hello Yuri,
thanks for the reply!
Intel's OpenCL driver implicitly vectorizes by a factor of 16 on the Xeon Phi. Can I turn that off?
Thanks, Simon
Hi Yuri,
can I also influence the vectorization for double?
export CL_CONFIG_CPU_VECTORIZER_MODE=8
does not work for me. I get either a vectorization of 16 or no vectorization...
I cannot confirm that the variable CL_CONFIG_USE_VECTORIZER disables the implicit vectorizer. The output says that the code is not vectorized, but the assembly code is the same and the execution of the kernel shows no difference.
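To test an observation like this one, it can help to rebuild the kernel once per setting and compare checksums of the generated assembly. A minimal sketch: `sweep_env` is a hypothetical helper, and whether the build tool honours the variable at all is an assumption, not something this thread confirms.

```shell
# Hypothetical helper (not part of the Intel SDK).
# usage: sweep_env VAR OUTFILE "VALUE1 VALUE2 ..." CMD [ARGS...]
# Runs CMD once per value with VAR set to that value and prints
# "VALUE CHECKSUM" for the file that CMD is expected to write to OUTFILE.
sweep_env() {
    var=$1
    out=$2
    values=$3
    shift 3
    for value in $values; do
        # Remove stale output so an old file is never checksummed by mistake.
        rm -f "$out"
        # env sets the variable for this one invocation only.
        if ! env "$var=$value" "$@" >/dev/null 2>&1; then
            echo "$value build-failed"
            continue
        fi
        # Reading from stdin keeps the file name out of the cksum output.
        echo "$value $(cksum < "$out" | cut -d' ' -f1)"
    done
}
```

A call might look like `sweep_env CL_CONFIG_CPU_VECTORIZER_MODE kernel.s "8 16" <build command that writes kernel.s>`. The build command is left as a placeholder because the exact ioc64 flags for assembly output are not shown in this thread. Identical checksums across values mean the setting had no effect on the generated file.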
