Weird DEVICE_LOST on Intel GPUs

Alexander123
Beginner

Hi, I'm a beginner graphics programmer and I've stumbled upon a very weird "device hung/lost" issue in one of my compute shaders. It only occurs on Intel GPUs; NVIDIA GPUs work fine and give me correct visuals/results.

I'm using the latest graphics driver (see SSU.txt in the attached archive).


Which GPUs I've tried:

- Intel(R) HD Graphics 630 - device lost (on both DirectX 12 and Vulkan)
- Intel(R) UHD Graphics 600 - device lost (on both DirectX 12 and Vulkan)
- NVIDIA GeForce GTX 1060 (mobile, 6GB) - correct rendering/results (on both DirectX 12 and Vulkan)
- NVIDIA GeForce RTX 4090 - correct rendering/results (on both DirectX 12 and Vulkan)
Unfortunately, I was unable to test this on an AMD GPU because I don't have one and don't know anyone who owns one.

In the attached archive I've prepared built binaries for you to test and reproduce the issue.
 
At some point during the development of forward tiled rendering, I noticed that my app stopped working on Intel GPUs. Later I found the exact commit that caused the issue (it was a very small change), see: https://github.com/Flone-dnb/nameless-engine/commit/3329a55f4e1fcb909a9d5b8a645ce39026a36c40#diff-2480fbb0e52cde1df67b9ca3f508300fff7fde33f1b824e740cb3efb703255d3
 
That commit actually fixes an issue that caused the light grid to contain false positives in some cases, but now the app does not work on Intel GPUs. I then created a separate branch (see the `intel_device_lost` branch in the same repo) and tried removing everything that might not be relevant to the issue. Eventually I left only the compute shader that causes the issue, and it's still the same thing: it works on NVIDIA GPUs and does not work on Intel GPUs. It's always reproducible on Intel GPUs and, judging by the commit diff, you just need to comment out one line of shader code to avoid the device lost (more on that below).
 
In the attached archive you will find 2 directories, each containing a built binary of the app: one directory stores the build from master (`editor_debug_master` directory) and the other stores the build from the separate `intel_device_lost` branch (`editor_debug_intel_device_lost` directory). I recommend trying `editor_debug_intel_device_lost` first (because it has just 2-3 GPU operations and no visual rendering, just compute shaders) and looking at the GPU operations in PIX/RenderDoc.
 
In the directory with the executable you will find the shaders that were used, see the `res/engine/shaders/` directory. Specifically, the shader at `res/engine/shaders/include/light_culling/LightCulling.glsl` causes the issue. The app has both DirectX 12 and Vulkan renderers, but you shouldn't pay attention to shader file extensions: I usually write one shader file with the .glsl extension for both renderers and use this tool (https://github.com/Flone-dnb/combined-shader-language-parser) to preprocess the code for the appropriate renderer.
If you want to see the full HLSL source code, just pick some NVIDIA GPU and debug the compute shader in PIX. Shaders are compiled on your computer, so the bytecode and PDBs are generated at `%localappdata%/nameless-engine/editor/shader_cache` (you might need to tell PIX about the PDBs located in that directory). The same applies to GLSL and RenderDoc.
 
Run the app once so that the configs are generated. Then make sure an Intel GPU was picked, see the `%localappdata%/nameless-engine/editor/engine/render.toml` config. In that config file `iRendererType` can be 0 (for the DirectX 12 renderer) or 1 (for the Vulkan renderer) and `sGpuToUse` is the name of the GPU to use (or the one that was automatically picked at first start).

If the GPU in use is not an Intel GPU, then (after you have started the app at least once) look at the logs at `%localappdata%/nameless-engine/editor/logs`. Near the beginning of the log file, somewhere around lines 8-9, the available GPUs should be listed. Copy the name of your Intel GPU, paste it into the `sGpuToUse` parameter of the config, then restart the app for the change to take effect.

If something is not right (for example, you picked the DirectX renderer but the Vulkan renderer was used, or you wrote your Intel GPU name but after startup the config shows some other GPU name), see the logs; you might need to adjust some GPU settings on your machine. For example, on my laptop, in order to use the Intel GPU I had to use the NVIDIA Control Panel to change the preferred GPU to my integrated Intel GPU and specify its name in the `render.toml` config. Otherwise the app would fail to use it, silently switch to the other GPU, and there would be something like `no output adapter` in the logs. So make sure your GPU is picked correctly if you have several. If you're not sure, look at the latest log file for a line that says something like `using the following GPU: "..."` and make sure your `render.toml` has the same GPU name in `sGpuToUse`.
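
For illustration, the two relevant settings might look like this in `render.toml`. This is only a sketch: the key names `iRendererType` and `sGpuToUse` and their meaning come from the description above, while the flat layout and the example GPU name are assumptions. The generated file may contain other keys or group these under a table.

```toml
# Hypothetical excerpt of render.toml (layout assumed, not copied from a real file).
iRendererType = 1                        # 0 = DirectX 12 renderer, 1 = Vulkan renderer
sGpuToUse = "Intel(R) HD Graphics 630"   # should match a GPU name listed in the log file
```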
 
About the issue. My guess concerns the function `cullLightsForTile` in the file `res/engine/shaders/include/light_culling/LightCulling.glsl` (this is the first function called in this compute shader, see PIX/RenderDoc; the entry point for the shader is located at `res/engine/shaders/<language>/final/light_culling/LightCulling.comp`). Due to the branch with a `return` statement (see the first `if` in that function) some threads "go idle", but below that branch there is a `GroupMemoryBarrierWithGroupSync` call. I guess that because some threads are "idle" due to the `return` while others reach this group sync call and wait, a device hung occurs. (I know that due to branching some threads in a warp will go idle, and I'm pretty sure it's more complicated than I described. It's just a wild guess, but I still believe there shouldn't be a device hung.)

If you look at the diff of the commit that caused the issue (see the link above), there was no `return` previously, which is probably why it worked. If you now modify that shader and comment out the `return` statement inside the branch, it will start working without any device hung/lost errors. You can modify the shaders in the attached archive; they will be recompiled on the next app start.

I think on the `intel_device_lost` branch/directory removing that branch or commenting out the `return` statement might actually be safe, but on master it's used to avoid reading the depth texture out of bounds. So that branch is actually needed; it's just not used for its purpose on the `intel_device_lost` branch/directory. Keep that in mind.
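
The guess above can be modelled on the CPU. The following is a minimal Python sketch, not the engine's shader code: `threading.Barrier` stands in for `GroupMemoryBarrierWithGroupSync`, and the names `run_group`, `GROUP_SIZE`, and `in_bounds_count` are hypothetical. It only shows why a barrier that some threads never reach can stall the rest. Real GPU behaviour may differ: the GLSL specification requires `barrier()` in compute shaders to be called from uniform control flow, so the result of violating that is driver-dependent rather than a guaranteed hang.

```python
import threading

GROUP_SIZE = 8  # hypothetical thread group size


def run_group(early_return: bool, in_bounds_count: int = 6,
              timeout: float = 0.5) -> bool:
    """Simulate one thread group; return True if nobody stalled at the barrier."""
    barrier = threading.Barrier(GROUP_SIZE)  # expects ALL threads of the group
    stalled = threading.Event()

    def thread_main(thread_id: int) -> None:
        in_bounds = thread_id < in_bounds_count
        if early_return and not in_bounds:
            return  # models the `return` placed before the group sync
        if in_bounds:
            pass  # the guarded work (e.g. reading the depth texture) goes here
        try:
            barrier.wait(timeout=timeout)  # models the group sync call
        except threading.BrokenBarrierError:
            stalled.set()  # a real GPU has no timeout; this is where it would hang

    threads = [threading.Thread(target=thread_main, args=(i,))
               for i in range(GROUP_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return not stalled.is_set()


if __name__ == "__main__":
    print("guarded work, everyone syncs:", run_group(early_return=False))
    print("early return before the sync:", run_group(early_return=True))
```

In this model the variant without the early `return` still skips the out-of-bounds work through the `in_bounds` check, but every thread reaches the barrier. That is one possible way to keep the bounds check needed on master while avoiding a sync that only part of the group reaches.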
 
Try changing `iRendererType` in the `render.toml` config, as the issue is reproducible on both renderers. I can also reproduce it on Linux with the same Intel GPUs (using Vulkan).
 
Although I've tried removing code that doesn't affect the issue, there is still a lot of unused code (including shader code) in the `intel_device_lost` branch/directory, so not all of the shaders/code are used.
 
On my side, I tried creating a simple compute shader to recreate this issue but was unfortunately unable to reproduce it (maybe due to the compiler removing/optimizing some code). Instead of having a branch with a `return` statement, I also tried wrapping most of the code in the `cullLightsForTile` function in a branch, but got the same error (maybe because some threads still idle while others reach the group sync).
 
I really hope that it's not an issue on my side because it works perfectly on NVIDIA GPUs, and I've tried various ways to avoid that error. Anyway, I'm just a beginner so I'm really sorry if it's something I've missed.
 
P.S. If you want to compile the repo on the `intel_device_lost` branch, be aware that you won't be able to do so using the latest MSVC compiler (you might need to revert to a previous version in order to compile the project).
 
Thanks, Alexander.
RamyerM_Intel
Moderator

Hello Alexander123, 


Thank you for posting in the communities. I appreciate the effort you took in detailing the issue you encountered with Intel GPUs. Since you mentioned that you are a programmer, I would like to confirm: were you developing a game when you encountered this issue? This information is essential to make sure you receive the necessary assistance.


I will be waiting for your reply. 


Ramyer M. 

Intel Customer Support Technician


Alexander123
Beginner

No, I'm not developing a game currently, I'm just working on the renderer/engine.

RamyerM_Intel
Moderator

Hello Alexander123,

 

Thank you for patiently waiting. Upon careful investigation of your inquiry, we want to let you know that we have a dedicated forum for this type of issue, where you can get faster answers. I am moving this to the developer software forums for you. Thank you for your patience and understanding.

 

Ramyer M.

Intel Customer Support Technician 

 
