Is there a watchdog time limit when running an OpenCL program on a non-dedicated Intel GPU, i.e. one that is also driving a monitor?
Specifically, I have a laptop with an Intel i7-5600U processor (HD 5500 graphics) running Ubuntu 14.04. I've installed the latest GPU/CPU drivers as well as the OpenCL SDK. When I run my OpenCL program on the CPU target, everything works well. However, on the HD GPU the code hangs with larger workloads (run times longer than about 5 seconds). I could not find anything obvious in the code that would cause this behavior, so I am wondering whether it is due to the driver.
If such a limit exists, is there a way for me to disable it?
Sorry for the delayed reply. As far as I know there is nothing "built in" to cause a hang like you've described.
Ideally, to proceed with the investigation we will need a small/simple reproducer that you can share without intellectual property concerns.
However, the first thing to check is whether the new Media Server Studio 2017 release helps.
- Announcement here: https://software.intel.com/en-us/forums/intel-media-sdk/topic/681674.
- Info on installing in Ubuntu 14.04 here: https://software.intel.com/en-us/articles/how-to-setup-media-server-studio-on-secondary-os-of-linux
Hi Jeffrey, thanks for your comment. It is good to know there is no time limit when running a kernel on Intel GPUs.
I posted the hanging problem previously in the below thread:
My software is open-source, so you should be able to check out the source code, compile it, and reproduce the issue with the commands provided in the above link.
I tested my code on a range of GPUs and CPUs and noticed that only two devices showed this hanging problem: the Intel HD 5500 GPU (from an i7-5600U CPU) and an AMD Fiji GPU (R9 Nano). I recently fixed the hang on the Fiji by replacing clWaitForEvents() with clFinish(), as in this commit
However, the Intel GPU still hangs even with this change. I tried debugging with gdb, but it could not break inside the kernel, and I could not find out what was holding up the program.
If you can take a quick look and let me know how to debug this problem, that would be greatly appreciated!
In addition, I also noticed a significant speed drop after upgrading to the latest SDK/GPU driver. With the previous GPU driver (a patched 4.1 Linux kernel), I was able to get 1000 photons/ms when running 1e5-1e6 photons ("-n 1e5" on the command line). With the latest driver (a patched 4.4 kernel), the speed dropped to 80 photons/ms for a small load (-n 1e5 or -n 1e6). In both cases, the kernel hangs when running a larger number of photons (-n 5e6 or -n 1e7). The code runs smoothly on all Intel CPUs, giving 150 photons/ms (i7-5600U) to 400 photons/ms (i7-6700K). I would also appreciate any insight you can share on this issue.
I can confirm that your application runs with -n 1e6 but hangs with -n 1e7 on the GPU. I will take a look at the code and get back to you in a day or two.
Usually we recommend a small reproducer in these guidelines but it looks like the OpenCL code is mostly in the relatively short mcx_host file. Is there anywhere else to look?
Hi Jeff, thank you so much for looking into this.
Can you explain a bit what you mean by a "small reproducer"? Were you referring to the OpenCL kernel? If so, the kernel is in a file called mcx_core.cl:
the capsbasic example attached looks like a "deviceQuery" type of code that enumerates the devices. If this is what you are looking for, you can use
to do the same. The actual source code for this feature can be found here
let me know if I understood your question correctly or not. thanks again.
By "small reproducer", I mean a very small standalone application (usually not your full application) which shows the issue. However, since your OpenCL code is localized to just a few lines we should be able to proceed with this application.
I just want to follow up on this 1-year-old thread. After a lot of googling, I found that the Intel GPU Linux driver does indeed have a time limit (~10 seconds), enabled by the hangcheck parameter. To disable this limit so that an OpenCL kernel can run for a longer period of time, you need to run
echo -n 0 > /sys/module/i915/parameters/enable_hangcheck
as root, or run sudo nano /sys/module/i915/parameters/enable_hangcheck and replace Y with 0.
After changing this flag, an OpenCL kernel can run for longer than 10 seconds without being killed.
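For anyone who prefers to script this, here is a minimal sketch of the same steps. The sysfs path is the one given above; the function names and the HANGCHECK_PARAM variable are hypothetical helpers made up for this example, and treating both 0 and N as "disabled" is an assumption about how the driver reports the flag. It uses sudo tee for the write because with a plain "sudo echo 0 > file" the redirection is performed by the unprivileged shell and fails.

```shell
#!/bin/sh
# Sketch only: helper names and HANGCHECK_PARAM are hypothetical.
# The default path is the one quoted in the post above; set HANGCHECK_PARAM
# to an ordinary file if you want to try the functions without touching the driver.
HANGCHECK_PARAM="${HANGCHECK_PARAM:-/sys/module/i915/parameters/enable_hangcheck}"

# Print "enabled", "disabled", or "unavailable" for the current setting.
hangcheck_status() {
    if [ ! -r "$HANGCHECK_PARAM" ]; then
        echo "unavailable"
        return 1
    fi
    # Assumption: the flag reads back as 0 or N when hangcheck is off.
    case "$(cat "$HANGCHECK_PARAM")" in
        0|N|n) echo "disabled" ;;
        *)     echo "enabled" ;;
    esac
}

# Write 0 to the parameter; fall back to sudo when the file is not writable.
hangcheck_disable() {
    if [ -w "$HANGCHECK_PARAM" ]; then
        printf '0' > "$HANGCHECK_PARAM"
    else
        # tee runs with root privileges, so the write itself succeeds.
        printf '0' | sudo tee "$HANGCHECK_PARAM" > /dev/null
    fi
}
```

Typical use would be to source the file, call hangcheck_status, then hangcheck_disable before launching a long-running kernel. Values written under /sys/module do not survive a reboot or a reload of the i915 module, so the flag has to be set again afterwards.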
To see whether your GPU has hung, run dmesg after the kernel hangs. Related links
PS: it looks like the release notes for the latest GPU driver (intel-opencl-r5.0) describe this trick for the 4.7 kernel patch on page 6