Hello, I'm having a problem building one of my OpenCL kernels. I'm trying to build for Intel HD Graphics 4000 on a 64-bit Windows machine, driver version 10.18.10.5069 (the latest I can find). I'm building the solution with Visual Studio 2019 and am using the cl2.hpp wrapper from the Khronos page. When I call cl::Program::build, the (host) memory usage of the program increases greatly, to over 3 gigabytes. After several seconds of this, the build fails. The build log ends with the following:
fcl build 1 succeeded. fcl build 2 succeeded. Error: internal error.
This code was building just fine on a different Windows machine with a different Intel card, but after moving to this machine it fails as described. The kernel code compiles without any problems; it just doesn't build. After deleting and changing certain aspects of the code, I can get it to build, but it still uses way too much memory. Could you guys tell me what I'm doing wrong, or whether there's some sort of bug?
I will attach code that has around 15 lines repeated about a hundred times, which reproduces the issue (although my original kernel doesn't repeat 15 lines a hundred times), along with the C++ code I'm using to load and build the kernel. Also worth noting: I'm not linking against the Intel SDK files; I downloaded and compiled my own from the Khronos repository.
Hi EthanK,
Thanks for sending the info and a representative reproducer.
Comments:
Speculation:
Took a quick look at the code, and nothing immediately jumps out... I'll try it on a Skylake-based system. I'm not immediately aware of length restrictions on kernels, but FWIW this is longer than most code people ask us to review.
What system did it work fine on? Can you describe that configuration?
-MichaelC
Hi EthanK,
I tried this reproducer on i5-6770HQ graphics. It uses the NEO implementation branch on Windows 10 with MSVS 2017. I didn't observe any compilation issues with your reproducer.
Since I don't have access to the legacy system, I'll pass the kernel on to the development team to see if they have any feedback. Fortunately, the error strings you provided may serve as useful hints. Unfortunately, it may prove difficult to root-cause or triage issues with the legacy configuration. We'll see.
On previous speculation:
It may be useful to pass in build options to ensure an apples-to-apples comparison between the two systems... see cl-std: https://www.khronos.org/registry/OpenCL/sdk/2.1/docs/man/xhtml/clBuildProgram.html
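As a minimal sketch of what pinning the language version could look like on the host side: the helper name `make_build_options` is hypothetical, and `CL1.2` is only an assumed choice of version, not something confirmed in this thread. `-cl-std=` is the documented clBuildProgram option referred to above.

```cpp
#include <string>

// Hypothetical helper (not from the thread): composes a build-options string
// that pins the OpenCL C language version, so that both machines compile the
// kernel against the same language standard.
inline std::string make_build_options(const std::string& cl_std) {
    return "-cl-std=" + cl_std;
}

// With the cl2.hpp wrapper, the string would then be passed to the build
// call, for example:
//
//     const std::string opts = make_build_options("CL1.2");  // assumed version
//     program.build(opts.c_str());
//
// That part is left as a comment because it needs an OpenCL platform to run.
```

Building with the same option string on both the HD Graphics 4000 and HD Graphics 520 machines would rule out a difference in the default language version as the cause.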
-MichaelC
Also... from where did you acquire 10.18.10.5069? What is the system vendor?
-MichaelC
Hi Ethan,
Intel HD graphics 4000 is a bit old and had some limitations, specifically:
I suspect that these two limitations taken together cause your program to grow quite large, and the compiler eventually runs out of memory, generating the "internal error".
Some suggestions to work around these limitations:
Hope this helps!
Hi EthanK,
I hope BenA's comment can get you through the compilation step. If possible, can you share any results, and where you acquired the driver, in the thread?
Thanks,
-MichaelC
Thanks for all the replies!
MichaelC: sorry, before I said that the driver was "the latest I can find" but I now realize it's not, and I'm not sure where the driver is from.
The previous system it worked on was also 64-bit Windows 10, with Intel HD Graphics 520, driver version 23.20.16.4973.
Also, yes, the ICD loader was the most recent one as of a few months ago at most.
I will try to compile using your suggestions, and report what I find.
Yay! After trying BenA's two simple fixes my original kernel and the reproducer seem to build and run just fine, using a reasonable amount of memory.
Changing boolean to bitwise operators seemed to help it the most, but using both suggestions fixed it completely, thanks!
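For readers who land here later: BenA's exact suggestions are not preserved in this thread, so the following is only a sketch of the kind of change described above, swapping short-circuit boolean operators for bitwise ones. The function names and the bounds-check predicate are hypothetical. It is written as plain C++ so it can be checked on the host; in OpenCL C, scalar comparisons likewise yield 0 or 1, so the same rewrite applies to kernel code.

```cpp
// Hypothetical predicate written with short-circuit boolean operators.
// Each && only evaluates its right-hand side if the left-hand side is true,
// which a compiler may express as control flow.
inline bool in_box_logical(int x, int y, int w, int h) {
    return x >= 0 && x < w && y >= 0 && y < h;
}

// The same predicate with bitwise operators. Every comparison is evaluated
// and the 0/1 results are combined with &, so no short-circuiting is needed.
inline bool in_box_bitwise(int x, int y, int w, int h) {
    return ((x >= 0) & (x < w) & (y >= 0) & (y < h)) != 0;
}

// Compares the two versions over a small grid of points and returns the
// number of points where they disagree.
inline int count_mismatches() {
    int mismatches = 0;
    for (int x = -2; x <= 5; ++x) {
        for (int y = -2; y <= 5; ++y) {
            if (in_box_logical(x, y, 4, 4) != in_box_bitwise(x, y, 4, 4)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

The rewrite is only safe when every operand is free of side effects and cheap to evaluate unconditionally. A condition that guards a later one (such as an index check before an array read) still needs the short-circuit form.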
Good deal. Thanks for confirming.