Developing Games on Intel Graphics

Suspected driver bug in D3D12 drivers for Intel(R) HD Graphics 630 leads to DXGI_ERROR_DEVICE_HUNG

BenjiSmith

Hello, I am a game developer writing to report a potential bug in the D3D12 graphics driver for my Intel integrated GPU. Deleting a D3D12 resource after the command lists that referenced it have finished executing sometimes causes failures in later command list submissions.

It sometimes manifests as an allocation error, which Visual Studio can catch. Most often, however, I see a DXGI_ERROR_DEVICE_HUNG error, which then leads to device removal. Querying DRED after the device removal shows what appears to be a GPU VA page fault, with the faulting virtual address matching the deleted resource. This shouldn't happen, because the resource is only used in frame 0 and is only deleted after frame 0 has finished all operations.
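To make "the virtual address matching the deleted resource" concrete, here is a minimal sketch of the range check involved. It is a portable model and not code from the test app: `FreedRange` and `FindFreedOwner` are hypothetical names, and the base addresses are assumed to have been recorded (for example from `ID3D12Resource::GetGPUVirtualAddress()`) before each resource was released.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of a resource that has already been released.
// `base` and `size` are assumed to have been captured before the release.
struct FreedRange {
    std::string name;
    std::uint64_t base;  // GPU virtual address of the first byte
    std::uint64_t size;  // allocation size in bytes
};

// Returns the freed range containing `faultVA`, or nullptr if none does.
// The subtraction form avoids overflow when base + size would wrap.
const FreedRange* FindFreedOwner(const std::vector<FreedRange>& freed,
                                 std::uint64_t faultVA) {
    for (const FreedRange& r : freed) {
        if (faultVA >= r.base && faultVA - r.base < r.size) {
            return &r;
        }
    }
    return nullptr;
}
```

A non-null result for the address reported by DRED means the fault landed inside memory that the application had already freed, which is the situation described above.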

I have created a test app that I believe is a minimal repro for the issue. I can attach crash dumps, but given that this ends in a device removal I don't know how helpful they'd be.

The source code for the test app is attached and is also available here: https://gist.github.com/Benjins/3c266cc2275aab5d063f67b2f4942f5f

The timeline of events for the repro as I understand it:
- Create resources for frame 0
- Draw frame 0
- Wait on the CPU for the GPU to finish frame 0, so that no outstanding command lists refer to the resources created
- Begin creating resources for frame 1
- Delete one of the resources created for frame 0 (all other resources will leak since we don't care about them)
- Draw frame 1
- Wait on the CPU for the GPU to finish frame 1
- Begin creating resources for frame 2
- Observe crash
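The ordering rule this timeline relies on (a resource is released only after the fence for its last use has signaled) can be modeled without any D3D12 calls. The sketch below is not the code from the gist. `DeferredReleaseQueue` and its methods are hypothetical names, and the `completed` value stands in for what `ID3D12Fence::GetCompletedValue()` would report in a real application.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Portable model of fence-gated deletion; no D3D12 calls are made.
class DeferredReleaseQueue {
public:
    // Schedule `release` to run once the GPU has reached `fenceValue`.
    // Entries are assumed to be enqueued in non-decreasing fence order.
    void Enqueue(std::uint64_t fenceValue, std::function<void()> release) {
        pending_.emplace_back(fenceValue, std::move(release));
    }

    // Run every release whose fence value is <= `completed`.
    // Returns the number of releases executed.
    std::size_t Collect(std::uint64_t completed) {
        std::size_t executed = 0;
        while (!pending_.empty() && pending_.front().first <= completed) {
            pending_.front().second();
            pending_.pop_front();
            ++executed;
        }
        return executed;
    }

    std::size_t Pending() const { return pending_.size(); }

private:
    std::deque<std::pair<std::uint64_t, std::function<void()>>> pending_;
};
```

In the repro, the frame 0 resource would be enqueued with frame 0's fence value and collected only after the CPU wait for that fence returns. Under that rule the deletion should be safe from the application's side, which is why the crash looks like a driver issue.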

Moving the resource deletion to before frame 1's resources are allocated avoids the crash. Using a separate command queue for each frame also avoids it. Both observations suggest that the driver keeps some state around that still references the old, now-deleted resource.

I have tried the same code using Nvidia drivers as well as Microsoft's reference CPU-based WARP rasterizer. In both cases, the app works as expected and does not error.

System specs (full info from SSU and DXDiag available in attachments):
- System SKU: LENOVO_MT_80VR_BU_idea_FM_Y720-15IKB
- CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, driver version 10.0.19041.546
- GPU: Intel(R) HD Graphics 630, driver version 26.20.100.6812
- Software: Windows 10, 20H2, OS Build 19042.746

I have tried to determine whether this could be a bug in my application, and while it may still turn out to be one, I'd appreciate any insight into what exactly is going on. Thank you for your time.

2 Replies
Sebastian_M_Intel

Hello BenjiSmith,

Thank you for posting on the Intel® communities.

Please allow us to review your request internally.

Once we have an update for you, we will post it on this thread. Kindly wait for a response.

Regards,

Sebastian M
Intel Customer Support Technician

Sebastian_M_Intel

Hello BenjiSmith,

Thank you for waiting.

After checking internally, we believe that you will get better support from our "Developing Games & Graphics on Intel" forum, so we will move this thread to it: https://community.intel.com/t5/Developing-Games-Graphics-on/bd-p/developing-games-graphics

Regards,

Sebastian M
Intel Customer Support Technician

