I've confirmed we have developers actively working on this. The issue is isolated in the compiler stack used for Broadwell/5th Generation Core architecture. At this point I don't have a timeline to communicate yet but I will keep you posted as more details come in.
I have some good news and some bad news on this issue. The good news is that compiler robustness improvements are being made. (Thanks for your help on this!) However, the main direction for the proposed changes is for the compiler to exit gracefully with an error for kernels with complex goto-based control flow, which previously caused the hang/memory leak/crash. The intent is not to extend the complexity of goto control flow that can be compiled. Simplifying the control flow by reducing or removing gotos in your kernels will help.
Thanks for letting me know. However, if you take the goto out (using nesting instead), the issue persists, so the goto is not to blame.
I have plenty of kernels with gotos and there is no issue with them.
Also, I would like to know which compiler we are talking about. I tried combinations of compilers (gcc and icc) and OpenCL SDKs (Intel and AMD), but the issue persists.
With a Mali GPU under Linux it works perfectly.
I attached the nested one without gotos; same issue.
Thanks for the update. I can replicate with your new kernel showing that goto is not to blame.
I believe you -- if multiple other GPU OpenCL compilers work and ours is an outlier, it implies there is something to investigate. Actually, the behavior on Broadwell here is an outlier even among Intel Gen compiler implementations. Broadwell/5th Generation Core/Gen8 is the last to use the "old" Gen compiler, and this is where your issue can be replicated. If you compile on Skylake/6th Generation Core/Gen9 or later, the newer implementation completes successfully for both of your kernels in my tests.
The graceful exit for the "old" compiler mentioned above is not the last step. It is a temporary stopgap (for Broadwell, as newer generations already have the fix) which can be completed quickly to avoid crashing for a variety of scenarios where the control flow becomes too complex -- with or without gotos. The longer term and fundamentally better answer is to get Broadwell moved to the new compiler architecture. This is underway but it is too early to promise a timeline. The team understands that the compiler update for Broadwell is needed as soon as possible and I will do my best to continue to push for this too.
Thank you very much indeed. Yes, you can trust me, goto is not to blame, since I have other kernels full of gotos and they work fine.
The issue is more complicated than it seems, because if you just comment out a line in the kernel at random, it compiles.
What I would love to understand is which compiler we are talking about. As I said, I tried combinations of gcc, icc, the AMD SDK and the Intel SDK with no luck. So if it is not the compiler or the SDK, what is it? I am struggling to understand.
What is the Gen compiler? Could you please explain in more detail where the fault is?
I chose the i7-5775C specifically for the GPU.
The kernel works fine with an ARM Mali GPU on Linux.
P.S. I attached a further simplified kernel.
Thanks for the new kernel. I verified that it compiles using the new compiler infrastructure available for Skylake for Linux and Windows.
One place you can experiment more with the compiler is the ioc64 command line OpenCL compiler interface, which is part of the SDK.
This is what I see on Skylake:
$ ioc64 -cmd=build -input=TestKernel1.cl -device=gpu
OpenCL Intel(R) Graphics device was found!
Device name: Intel(R) HD Graphics 520
Device version: OpenCL 2.0
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
fcl build 1 succeeded.
bcl build succeeded.
combine_kernel info:
Maximum work-group size: 256
Compiler work-group size: (0, 0, 0)
Local memory size: 0
Preferred multiple of work-group size: 8
Minimum amount of private memory: 0
Build succeeded!
The fault is in the Broadwell compiler infrastructure. In scenarios like yours -- including the new simplified kernel from your last post -- the control flow for the kernel can parse to a graph with too many nodes. It's a bug, but as far as we know it is isolated to the Broadwell Gen (GPU) compiler. The newer compiler for Skylake and forward does not have this problem so the backport/Broadwell upgrade should fix it. Please watch for more info in the "What's new" blogs and release notes.
Thanks again. What compiler is the Broadwell Gen (GPU) compiler? Is it part of ioc64? Why are gcc and icc, along with the AMD SDK and the Intel SDK, unable to compile the kernel?
Yes, ioc64 is a front end for the compiler. You can use this tool to test assembler, SPIR/SPIR-V, and other output formats from the command line.
This article may be of interest: https://software.intel.com/en-us/articles/introduction-to-gen-assembly
While kernel syntax should be very recognizable to C programmers, it is only a subset (with a few additional functions, types, etc.). While it is often true that kernel code can be pasted into the inner loop of standard C programs, this does not always work. Another thing to consider is that compilers implement a language specification but are not guaranteed to behave the same way for all scenarios if the input does not match the spec. Different fault tolerance approaches can be a reason for different behaviors between compliant compilers too. Continuing to simplify may help with portability and performance. Complex conditionals/multiple code paths are usually not recommended for GPUs, as many of the paths may be computed but masked out of the result.
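To make the last point about masking concrete, here is a rough sketch in plain C (not OpenCL, and not a description of how Gen hardware is actually implemented). It emulates one SIMD step in which every lane evaluates both sides of a branch and a per-lane mask decides which result is kept. The function name masked_select and the lane count of 8 are assumptions made up for this illustration.

```c
#include <stddef.h>

/* Assumed lane count, for illustration only. */
#define LANES 8

/* Emulates a divergent "if (x > 0) y = 2*x; else y = -x;" on a SIMD unit:
 * both paths are computed for every lane, then the mask picks one result
 * and the other is thrown away. */
void masked_select(const int *in, int *out)
{
    for (size_t lane = 0; lane < LANES; ++lane) {
        int mask   = in[lane] > 0;   /* per-lane branch condition      */
        int then_v = in[lane] * 2;   /* "if" path, computed regardless */
        int else_v = -in[lane];      /* "else" path, computed as well  */
        out[lane]  = mask ? then_v : else_v;
    }
}
```

With more nested conditionals the number of paths computed and discarded grows, which is one reason simpler control flow tends to perform better on GPUs.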
Thank you very much. You said ioc64 is a front end for the compiler; which compiler are we talking about?
I'm struggling to understand where the fault is. I tried to compile the kernel without ioc64 and it still does not compile, so are both gcc and icc at fault? How can both be at fault? Surely it must be something else.
The complexity of the kernel is not at fault, since it works fine with a Mali GPU and, as you said previously, on Windows or with other Intel GPUs.
In order to execute code on the GPU, it needs to be compiled to assembly that the GPU understands.
This is what ioc64 does: it compiles your kernel to GPU assembly code.
This compiler is part of the GPU OpenCL driver.
The OpenCL kernel language has some additional constructs on top of traditional C, which is why you can't compile it with a traditional compiler like gcc.
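As a small illustration of those additional constructs, the sketch below is a trivial kernel plus a few stand-in definitions that let a host C compiler accept the same source. The kernel name scale, the helper run_scale_on_host, and the variable host_gid are hypothetical and exist only for this example. An OpenCL compiler predefines __OPENCL_VERSION__, so it should skip the stand-ins and see only the kernel. Without them, gcc has no idea what kernel, global, or get_global_id mean.

```c
#ifndef __OPENCL_VERSION__
/* Host-only stand-ins (illustrative): a real OpenCL compiler provides these. */
#include <stddef.h>
#define kernel                 /* OpenCL function qualifier      */
#define global                 /* OpenCL address-space qualifier */
static size_t host_gid;        /* hypothetical current work-item index */
static size_t get_global_id(unsigned dim) { (void)dim; return host_gid; }
#endif

/* The kernel itself: each work-item scales one element. */
kernel void scale(global const float *in, global float *out, float factor)
{
    size_t i = get_global_id(0);   /* built into OpenCL C, unknown to gcc */
    out[i] = in[i] * factor;
}

#ifndef __OPENCL_VERSION__
/* Host-side emulation: call the kernel body once per work-item. */
void run_scale_on_host(const float *in, float *out, float factor, size_t n)
{
    for (host_gid = 0; host_gid < n; ++host_gid)
        scale(in, out, factor);
}
#endif
```

This only emulates the kernel on the CPU. On a real device the kernel text is handed to the driver's compiler instead (through ioc64 or clBuildProgram), and gcc or icc only ever compile the host program that makes that call.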
Thanks. Are you saying that I cannot compile the kernel without ioc64?
There has been a misunderstanding. ioc64 is for offline compiling, but I also tried online compiling to check whether the problem was only with ioc64.
I use clBuildProgram in a C program with the OpenCL SDK to compile the kernel online, and I build that program with something like this:
gcc loadkerneldebug.c -lOpenCL -oloadKernel (plus link to amd or intel opencl sdk)
icc loadkerneldebug.c -lOpenCL -oloadkernel (plus link to amd or intel opencl sdk)
and it gives an error saying that it cannot compile the kernel. So I can use both offline and online compiling to test the kernel.
I have attached the loadkerneldebug program, which loads the kernel, tries to compile it, and gives an error if it cannot.
So, if the problem is the same with gcc, icc, the AMD OpenCL SDK and the Intel OpenCL SDK, who is at fault?
Most of the same components are used for online and offline compiling, so it makes sense that behavior is the same for both modes.
The compiler implementation is currently different for Gen8 (Broadwell/5th Generation Core) and Gen9 (Skylake/6th Generation Core) and forward. If we look at just these two we know that the Gen8 version as it is today has some limitations we are hoping to upgrade. Between these two Intel GPU compiler implementations (the only ones listed which are within the scope of this forum) the fault is clearly in the Gen8 compiler and is being processed as such. If you can get access to a 6th or 7th Generation Core machine I hope you will also find as I have in my tests that the kernels you've submitted compile without issue.