Thanks for sharing the issue. MFX_ERR_DEVICE_FAILED generally occurs when something fails in hardware acceleration. Regarding the setup: is there any particular reason you are using simple_3_encode instead of simple_3_encode_vmem (which can use either the D3D9 or the D3D11 implementation), or sample_encode with hardware acceleration?
Do you see this issue when the discrete graphics card is not connected? That might help narrow down the issue.
Thanks for running more experiments and providing a detailed analysis. Let me try to explain the reason for the failure; you can probably match it against your experiments:
1) simple_3_encode runs OK if the monitor is connected to the Intel HD adapter. An NVIDIA GTX 960 is also present on the machine.
2) simple_3_encode fails with MFX_ERR_DEVICE_FAILED if the monitor is connected to the NVIDIA GTX 960.
In simple_3_encode, the default implementation is AUTO_ANY (MFX_IMPL_AUTO_ANY), which picks the default hardware path, i.e. D3D9, unless you specifically choose software. There is a known limitation in the Media SDK: when a discrete graphics card is present and the D3D9 implementation is used, the monitor must be connected to the Intel graphics device. That is why the second case fails.
3) simple_3_encode_vmem (DX11) runs OK if the monitor is connected to the NVIDIA GTX 960.
The above limitation applies to D3D9 only; it does not apply to D3D11.
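Given that the limitation is D3D9-only, one option is to request D3D11 explicitly and keep D3D9 and software as fallbacks. Below is a minimal sketch under stated assumptions: to keep it compilable without the Media SDK installed, the mfxIMPL/mfxStatus constants are local stand-ins carrying the values from the SDK headers (please double-check them against your SDK version), and initWithFallback, tryInit and kNoImpl are hypothetical names, not SDK API. In a real application you would include mfxvideo++.h, delete the stand-ins, and have tryInit create a session and return the status of MFXVideoSession::Init(impl, &ver). I used MFX_IMPL_HARDWARE_ANY so that the software fallback is a separate, explicit step; MFX_IMPL_AUTO_ANY | MFX_IMPL_VIA_D3D11 is another common way to write the first candidate.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Stand-ins so this sketch compiles without the Media SDK. Values as in the
// SDK headers (verify for your version); in a real application include
// <mfxvideo++.h> and delete these lines.
using mfxIMPL = std::int32_t;
using mfxStatus = std::int32_t;
constexpr mfxIMPL MFX_IMPL_SOFTWARE = 0x0001;
constexpr mfxIMPL MFX_IMPL_HARDWARE_ANY = 0x0004;
constexpr mfxIMPL MFX_IMPL_VIA_D3D9 = 0x0200;
constexpr mfxIMPL MFX_IMPL_VIA_D3D11 = 0x0300;
constexpr mfxStatus MFX_ERR_NONE = 0;
constexpr mfxStatus MFX_ERR_DEVICE_FAILED = -17;  // the error seen in case 2

// Hypothetical "nothing could be initialized" marker for this sketch.
constexpr mfxIMPL kNoImpl = -1;

// Hypothetical helper: tries each implementation in order and returns the
// first one whose initialization succeeds; *lastStatus holds the status of
// the last attempt. tryInit stands in for creating a session and calling
// MFXVideoSession::Init(impl, &ver).
mfxIMPL initWithFallback(const std::function<mfxStatus(mfxIMPL)>& tryInit,
                         mfxStatus* lastStatus) {
    const std::vector<mfxIMPL> candidates = {
        // D3D11 first: the "monitor must be on the Intel adapter"
        // limitation applies to D3D9 only.
        MFX_IMPL_HARDWARE_ANY | MFX_IMPL_VIA_D3D11,
        MFX_IMPL_HARDWARE_ANY | MFX_IMPL_VIA_D3D9,
        // Last resort: software, only if your SDK installation provides it.
        MFX_IMPL_SOFTWARE,
    };
    for (mfxIMPL impl : candidates) {
        *lastStatus = tryInit(impl);
        if (*lastStatus == MFX_ERR_NONE) {
            return impl;
        }
    }
    return kNoImpl;
}
```

With system memory (as in simple_3_encode) only the implementation flag changes; with video memory the D3D11 path also needs a D3D11 device and frame allocator, which simple_3_encode_vmem sets up.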
Since your system supports the D3D11 implementation, I would recommend using D3D11, with its performance benefits, rather than D3D9. If you are doing a single-stream encode, I would also recommend looking at the AsyncDepth parameter, which specifies how many asynchronous operations the pipeline runs before the application calls the sync operation. An AsyncDepth of 4 or 5 generally results in better performance; you can find more details in this article https://software.intel.com/en-us/articles/aync-and-join-operation-in-media-sdk-multi-transcoding and code details in the simple_3_encode_vmem_async tutorial.
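To illustrate what AsyncDepth buys you, here is a small standalone model of the submit/sync pattern; it is a sketch, not the tutorial's code, and it does not call the SDK. The names runAsyncPipeline and PipelineStats are hypothetical. In real code each "submit" would be an EncodeFrameAsync call that returns a sync point, each "sync" would be MFXVideoSession::SyncOperation on the oldest sync point, and mfxVideoParam::AsyncDepth would be set before encoder Init. A loop that syncs right after every EncodeFrameAsync call behaves like asyncDepth = 1 here.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Hypothetical counters for this model only.
struct PipelineStats {
    std::size_t submitted = 0;
    std::size_t synced = 0;
    std::size_t maxInFlight = 0;
};

// Models an encode loop that keeps up to asyncDepth tasks in flight and only
// blocks (syncs the oldest task) when the pipeline is full.
PipelineStats runAsyncPipeline(std::size_t frames, std::size_t asyncDepth) {
    // In the SDK, AsyncDepth = 0 leaves the value unspecified; this model
    // simply clamps it to 1 so it stays well-defined.
    if (asyncDepth == 0) {
        asyncDepth = 1;
    }
    PipelineStats stats;
    std::deque<std::size_t> inFlight;  // stand-ins for sync points, oldest first
    for (std::size_t frame = 0; frame < frames; ++frame) {
        if (inFlight.size() == asyncDepth) {
            // Real code: SyncOperation(oldest sync point), then write out
            // that task's bitstream so its buffer can be reused.
            inFlight.pop_front();
            ++stats.synced;
        }
        // Real code: EncodeFrameAsync(...) with a free surface and a free
        // bitstream; it returns a new sync point.
        inFlight.push_back(frame);
        ++stats.submitted;
        stats.maxInFlight = std::max(stats.maxInFlight, inFlight.size());
    }
    // Drain: sync every task still in flight after the last frame.
    while (!inFlight.empty()) {
        inFlight.pop_front();
        ++stats.synced;
    }
    return stats;
}
```

For example, runAsyncPipeline(10, 4) submits 10 frames, syncs all 10, and never has more than 4 in flight, so the GPU always has queued work while the application waits on the oldest result. Each in-flight task needs its own input surface and output bitstream, which is why the real async tutorial keeps a pool of tasks sized by AsyncDepth.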