OpenCL* for CPU

Is clBuildProgram needed in conjunction with clCreateProgramWithBinary?

Logan_J_
Beginner

Hi,

While trying to put together a standalone reproducer for a prior question, I noticed that offline compilation seems to behave differently on CPU and GPU. Per the OpenCL spec, my understanding is that I should be able to reuse compiled kernels (built either through ioc32/64 or through clCreateProgramWithSource/clBuildProgram). When using a GPU device, I can load such a precompiled kernel through clCreateProgramWithBinary and it is ready to use. The CPU device, however, requires me to call clBuildProgram again, which from a performance standpoint defeats the purpose of precompiling my kernels.

I've attached an MSVC 2012 project to reproduce what I'm seeing. Under the release directory are some precompiled kernels that I generated using ioc32. The executable prints usage instructions when run with no arguments. The only thing it doesn't mention is that it checks the file extension to determine whether the input is a binary file. If the file name doesn't end in .bin, it assumes it's a text .cl file.
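For readers without the attachment, the extension check described above might look roughly like the following. This is only a sketch: the attached project's actual code is not shown in this thread, and the helper names `has_suffix` and `is_binary_kernel_file` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true when `path` ends with `suffix` (case-sensitive comparison). */
bool has_suffix(const char *path, const char *suffix)
{
    assert(path != NULL && suffix != NULL);
    size_t path_len = strlen(path);
    size_t suffix_len = strlen(suffix);
    if (suffix_len > path_len)
        return false;
    return strcmp(path + (path_len - suffix_len), suffix) == 0;
}

/*
 * Hypothetical helper: a file ending in ".bin" is treated as a precompiled
 * program binary; anything else is assumed to be OpenCL C source text.
 */
bool is_binary_kernel_file(const char *path)
{
    return has_suffix(path, ".bin");
}
```

With this check, Template.cpu.bin and Template.gpu.bin would take the binary path, while Template.cl would be read as source.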

Summary:

offlineCompileBug.exe CPU Template.cl 0 - Fails (Expected)
offlineCompileBug.exe CPU Template.cl 1 - Succeeds (Expected)
offlineCompileBug.exe GPU Template.gpu.bin 0 - Succeeds (Expected)
offlineCompileBug.exe GPU Template.gpu.bin 1 - Succeeds (Not expected, why does compiling twice work?)
offlineCompileBug.exe CPU Template.cpu.bin 0 - Fails (Unexpected, and I believe is a problem)
offlineCompileBug.exe CPU Template.cpu.bin 1 - Succeeds (Not expected, why does compiling twice work?)

Thanks!

5 Replies
Robert_I_Intel
Employee

Logan,

Funny that I just created this post:

https://software.intel.com/en-us/articles/using-spir-for-fun-and-profit-with-intel-opencl-code-builder

In general, clCreateProgramWithBinary should probably be followed by clBuildProgram, since the binary could be SPIR, in which case it is not fully built. In our GPU case, when you fully prebuild the binary (generate .ir), clBuildProgram does not do much: it is basically a no-op, as evidenced by the build log, which will be empty. In the case of the CPU binary, some linking is still involved at the clBuildProgram step, but the compilation step is saved. I will ask the CPU device team whether that is necessary.

 

Logan_J_
Beginner

Robert,

Doubly funny, just read it yesterday and was wishing that I had that article two weeks ago. Nicely written and much needed since good SPIR examples are a bit sparse.

If it helps, the goal is to cache my kernel compilations so I only need to compile the first time my software runs. clBuildProgram is accounting for roughly half of my execution time, so it would be great if the CPU team knows how to avoid it.
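The file-handling half of such a cache could be sketched as below. It is plain C with no OpenCL dependency, and the function names and the cache-miss convention are assumptions for illustration, not anything from the attached project. The idea is that, on the first run, the bytes obtained from clGetProgramInfo with CL_PROGRAM_BINARIES would be written out. On later runs, the loaded buffer and its length would be handed to clCreateProgramWithBinary (the spec's name for the create-from-binary call) before calling clBuildProgram.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Reads a cached program binary into a malloc'd buffer.
 * Returns NULL (and sets *length_out to 0) when the file is missing,
 * empty, or unreadable; the caller treats that as a cache miss and
 * builds from source instead. The caller frees the returned buffer.
 */
unsigned char *load_cached_binary(const char *path, size_t *length_out)
{
    assert(path != NULL && length_out != NULL);
    *length_out = 0;

    FILE *file = fopen(path, "rb");
    if (file == NULL)
        return NULL; /* cache miss: no file yet */

    unsigned char *data = NULL;
    long size = -1;
    if (fseek(file, 0, SEEK_END) == 0 && (size = ftell(file)) > 0 &&
        fseek(file, 0, SEEK_SET) == 0) {
        data = malloc((size_t)size);
        if (data != NULL && fread(data, 1, (size_t)size, file) != (size_t)size) {
            free(data); /* short read: treat as a miss */
            data = NULL;
        }
    }
    fclose(file);

    if (data != NULL)
        *length_out = (size_t)size;
    return data;
}

/* Writes a program binary to the cache file. Returns 0 on success, -1 on failure. */
int store_cached_binary(const char *path, const unsigned char *data, size_t length)
{
    assert(path != NULL && data != NULL);
    FILE *file = fopen(path, "wb");
    if (file == NULL)
        return -1;
    size_t written = fwrite(data, 1, length, file);
    int close_failed = fclose(file);
    return (written == length && close_failed == 0) ? 0 : -1;
}
```

A real cache would probably also need to key the file name on the device and driver version, since a binary built for one device is not guaranteed to load on another.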

Logan_J_
Beginner

Any update?

Robert_I_Intel
Employee

Logan,

I checked with our standards and driver folks: the right thing to do is to always follow clCreateProgramWithBinary with clBuildProgram. They think the current behavior on the GPU is actually a bug.

Logan_J_
Beginner

Thanks, I'll adjust my code accordingly.

Any word though on what to do with the long CPU kernel build times? GPU, even in the presence of said bug, generates the right results quickly. Why can't I do that with CPU, or is there something I'm missing?
