HW encoding with multiple monitors + nVidia + OpenGL?
Our application captures 1080p@30Hz video, and displays it in a 3D environment in real-time, using OpenGL. We would like to also record the video in h.264 using QuickSync (we used software MPEG-2 before).
I've tried both the 2.0 SDK and the 3.0 beta, with different results.
Windows 7, running on a Dell Latitude E6420 laptop with Sandy Bridge + nVidia NVS4200M.
Intel driver version 22.214.171.1249
Monitor #1: 1366x768, this is the primary display, and is driven by Intel
Monitor #2: 1920x1200, driven by nVidia
Monitor #3: 1920x1080, driven by nVidia
First of all, HW acceleration only seems to work when we also use Intel graphics for OpenGL. This *could* be acceptable; however, our application needs to render at 1080p, and the main monitor cannot display that. So we would render on one of the other monitors, but I cannot assign the Intel graphics to anything but the first display. I can create a larger window and move it to the other monitors, but performance suffers a lot (I'm assuming the 3D surface is rendered by Intel, then copied to nVidia?).
1. Ideally, I'd like to use nVidia to render the 3D scene, and Intel to encode the video. Is this possible?
2. If not, I'd like to make the Intel graphics drive either the 2nd or 3rd monitor. Is this possible?
Also, with this setup (running the app on the first monitor, using Intel for OpenGL), HW encoding works with SDK 2.0, but it does not with SDK 3.0b. I get MFX_ERR_UNSUPPORTED during ::MFXInit(MFX_IMPL_AUTO, 0, &enc);
- Intel Media SDK using HW acceleration can only be achieved on Intel Graphics.
- Intel Media SDK does not natively support OpenGL surfaces (D3D or system memory surfaces are supported). This means that as long as OpenGL is used there will be a performance hit when copying surfaces to the OpenGL context.
Yes, the multi-gfx setup you describe is a valid scenario, and you can certainly use Intel Media SDK to HW-accelerate decode (or encode) via the Intel graphics adapter. However, to perform rendering or any kind of surface processing on nVidia gfx, the decoded surface must be copied into an nVidia surface. As far as I know you are forced to lock both the Intel and nVidia surfaces and perform the copy using the CPU (this will naturally have a performance impact).
Regarding your second question: you can also use Intel Media SDK if Intel gfx is set as the secondary adapter, but it requires you to use, for instance, the MFX_IMPL_HARDWARE2 target when initializing the Media SDK session (this topic is described in further detail in the Media SDK manual appendix).
One more thing. When you initialize the Media SDK session it is recommended that you explicitly set the API version (see the Media SDK decode sample...). If the version is set to 0, the API version of the SDK release will be used; in the case of Media SDK 3.0 (2012) this means API 1.3, which is currently only supported in SW on current-generation platforms. API 1.3 will be available with HW acceleration as part of the new generation of platforms to be released early this year.
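To make the two suggestions above concrete, here is a minimal sketch of session initialization with an explicit API version and the second-adapter target (the helper name and error handling are assumptions, not taken from the SDK samples):

```cpp
#include <mfxvideo.h>  // Intel Media SDK

// Sketch: open a session with an explicit API version instead of
// passing 0/NULL for the version argument.
mfxSession OpenIntelSession()
{
    mfxVersion ver = {};
    ver.Major = 1;
    ver.Minor = 0;              // request API 1.0 explicitly

    mfxSession session = NULL;
    // MFX_IMPL_HARDWARE2 targets the second adapter; use MFX_IMPL_HARDWARE
    // when Intel gfx drives the primary display.
    mfxStatus sts = MFXInit(MFX_IMPL_HARDWARE2, &ver, &session);
    if (sts != MFX_ERR_NONE)
        return NULL;            // e.g. MFX_ERR_UNSUPPORTED: no HW impl found
    return session;
}
```

With the version left at 0, the dispatcher asks for the newest API the release supports (1.3 here), which is why HW initialization can fail on current platforms.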
I understand that this setup requires some extra copying, but that's fine. We capture live video into main system memory, so we have to copy anyway. Actually, once a frame arrives in system memory, it's copied twice. One copy converts to BGRA and goes straight to OpenGL texture memory, the other converts to NV12 and goes to Intel for encoding. At this point, encoding to h264 and rendering via OpenGL should be entirely independent (they are even running in separate threads).
Following your suggestion, I set the version number manually to 1.0, and that made the latest SDK work when using Intel for rendering.
However, if I use nVidia for OpenGL rendering, it still "works", but it reports MFX_WRN_PARTIAL_ACCELERATION, and the performance is unacceptable.
I've tried using MFX_IMPL_HARDWARE2, and the others, but they all gave me MFX_ERR_UNSUPPORTED. MFX_IMPL_HARDWARE worked, but only with partial acceleration.
Again, if I switch to using Intel for rendering, it all works perfectly, and CPU usage is very low, even with all the copying. It's just that the rest of the OpenGL (not related to encoding) is not as powerful as it would be with nVidia. It seems like if I just used two separate processes, I could do both encoding on Intel and display on nVidia just fine. But obviously, that's not an ideal solution.
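For reference, here's roughly how I'm probing the implementation targets (a simplified sketch; the helper name is mine, and real error handling is trimmed):

```cpp
#include <mfxvideo.h>

// Sketch: try each hardware target in order and verify via MFXQueryIMPL
// which implementation the dispatcher actually selected.
mfxSession ProbeHardwareSession()
{
    const mfxIMPL targets[] = {
        MFX_IMPL_HARDWARE, MFX_IMPL_HARDWARE2,
        MFX_IMPL_HARDWARE3, MFX_IMPL_HARDWARE4
    };
    mfxVersion ver = {};
    ver.Major = 1;
    ver.Minor = 0;

    for (int i = 0; i < 4; ++i) {
        mfxSession session = NULL;
        if (MFXInit(targets[i], &ver, &session) != MFX_ERR_NONE)
            continue;                        // e.g. MFX_ERR_UNSUPPORTED
        mfxIMPL actual = MFX_IMPL_SOFTWARE;
        MFXQueryIMPL(session, &actual);      // what did we really get?
        if (MFX_IMPL_BASETYPE(actual) != MFX_IMPL_SOFTWARE)
            return session;                  // a genuine HW session
        MFXClose(session);                   // SW fallback, keep probing
    }
    return NULL;
}
```

In my case every target except MFX_IMPL_HARDWARE returns MFX_ERR_UNSUPPORTED, and MFX_IMPL_HARDWARE only gives partial acceleration.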
Can you explain what you mean by using nVidia for rendering? If the Intel gfx driver is associated with adapter 0 (primary), then you would grab the decoded surface and copy it over to the nVidia domain using the device associated with the nVidia adapter.
MFX_WRN_PARTIAL_ACCELERATION means that Media SDK is falling back on SW decoder.
As a first step you might want to try achieving the result using DirectX and not OpenGL. That way you would only need to modify the sample_decode sample slightly to create a separate D3D device associated with the nVidia adapter, and use that for rendering instead of the Intel device as in the default sample.
By the way, there's no decoding in our application at all. There's the raw, uncompressed video, and it gets simultaneously encoded and written to disk, and uploaded to texture memory and displayed in a 3D scene.
However, rendering should have nothing to do with encoding; they are totally separate. They don't even work on the same data. I could just as well render a single white point (no video involved at all) in one thread, while encoding a raw video stream (which has nothing to do with the point) in a separate thread.
The problem is that on systems with multiple GPUs (in this case there's Intel and nVidia), you can choose which GPU is being used for rendering the point.
When I choose the Intel GPU to render the point, the video gets encoded in hardware. When I choose the nVidia GPU to render the point, the video gets encoded in software.
Here's a question: does the Intel GPU have to be attached to a rendering context in order for the encoder to work? One thing I could try is to create a 1x1 off-screen dummy window and attach the Intel D3D device to it.
Using API 1.1 does not make a difference, so I reverted back to 1.0.
I also noticed that when using multiple monitors, it's not enough to have a display attached to the Intel graphics and choose the Intel GPU for rendering. The primary display also *has* to be the one driven by Intel. If the primary display is driven by nVidia, then everything will report MFX_ERR_NONE, but the encoding will still happen in software.
To answer your question, the Intel GPU does have to be associated with a D3D device, as can be seen in the Media SDK sample_decode and sample_encode samples. However, the device does not have to be used for rendering. Creating a D3D device for use with Media SDK via CreateDevice requires a window handle (make sure that the handle represents a window in the Intel gfx context). Also make sure that the device adapter is the adapter id of the Intel gfx device.
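A sketch of what that could look like with D3D9 (assumptions: a pre-created dummy HWND, Intel identified by its PCI vendor id 0x8086; not verbatim from the samples):

```cpp
#include <d3d9.h>
#include <limits.h>

// Sketch: locate the Intel adapter by PCI vendor id and create a D3D
// device on it, so Media SDK has an Intel device to attach to. The
// device is never used for rendering.
IDirect3DDevice9* CreateIntelD3DDevice(HWND hwnd)
{
    IDirect3D9* d3d = Direct3DCreate9(D3D_SDK_VERSION);
    if (!d3d) return NULL;

    UINT intelAdapter = UINT_MAX;
    for (UINT i = 0; i < d3d->GetAdapterCount(); ++i) {
        D3DADAPTER_IDENTIFIER9 id = {};
        if (SUCCEEDED(d3d->GetAdapterIdentifier(i, 0, &id)) &&
            id.VendorId == 0x8086) {        // Intel's PCI vendor id
            intelAdapter = i;
            break;
        }
    }
    if (intelAdapter == UINT_MAX) { d3d->Release(); return NULL; }

    D3DPRESENT_PARAMETERS pp = {};
    pp.Windowed      = TRUE;
    pp.SwapEffect    = D3DSWAPEFFECT_DISCARD;
    pp.hDeviceWindow = hwnd;                // e.g. a 1x1 off-screen window

    IDirect3DDevice9* dev = NULL;
    d3d->CreateDevice(intelAdapter, D3DDEVTYPE_HAL, hwnd,
                      D3DCREATE_HARDWARE_VERTEXPROCESSING |
                      D3DCREATE_MULTITHREADED, &pp, &dev);
    d3d->Release();
    return dev;
}
```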
Intel gfx does not have to be associated with the primary adapter to use HW acceleration. If Intel gfx is on an adapter other than the primary (0), then just make sure to initialize the Media SDK session with MFX_IMPL_HARDWARE... as noted earlier. What you are encountering when the nVidia card is primary and using MFX_IMPL_AUTO is that Media SDK just selects the SW codec implementation (= MFX_ERR_NONE); this is expected.
Regarding API 1.1 vs. 1.0: I did not mean to say that changing to API 1.1 would resolve the issue, just that 1.1 represents a greater set of features. But if you want maximum backward compatibility and do not need any of the features of API 1.1, please feel free to use 1.0.
Hey Petter, thanks for the help! I think I found the problem, although unfortunately I'm not sure there's a solution.
I've tried to create a D3D device as you suggested.
Now, in the nVidia control panel, you can set the preferred GPU. When it's set to Intel, or Auto, I can enumerate the adapters, and the 1st one will be Intel and the 2nd nVidia.
However, when I set the preferred GPU to be nVidia, then enumerating the adapters will still give me two adapters, but both will be nVidia... I'm afraid this is the root cause of the problem.
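This is the little diagnostic loop I'm using to see which driver owns each adapter (a sketch; the vendor ids are the standard PCI ones, 0x8086 = Intel, 0x10DE = nVidia):

```cpp
#include <d3d9.h>
#include <stdio.h>

// Sketch: print every D3D9 adapter's description and PCI vendor id.
// With "preferred GPU = nVidia" both adapters report vendor 0x10DE.
void DumpAdapters()
{
    IDirect3D9* d3d = Direct3DCreate9(D3D_SDK_VERSION);
    if (!d3d) return;
    for (UINT i = 0; i < d3d->GetAdapterCount(); ++i) {
        D3DADAPTER_IDENTIFIER9 id = {};
        if (SUCCEEDED(d3d->GetAdapterIdentifier(i, 0, &id)))
            printf("adapter %u: %s (vendor 0x%04X)\n",
                   i, id.Description, id.VendorId);
    }
    d3d->Release();
}
```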
Do you have any idea how to fix this? Our software is OpenGL based, and we need to render with the nVidia GPU, because the Intel GPU cannot render efficiently to the external high resolution monitors, which we need to use.
Actually, I just found out about an OpenGL extension that lets me choose a GPU for rendering ( http://www.opengl.org/registry/specs/NV/gpu_affinity.txt). So I'm going to set the preferred GPU in the nVidia control panel to Auto, and then select the GPU manually from software. Hopefully that'll do the trick, and let the Intel GPU do the encoding.
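Something like this is what I have in mind for the affinity approach (a sketch under the assumption that the driver exposes the extension; wglGetProcAddress also needs a current GL context to return valid pointers):

```cpp
#include <windows.h>
#include <GL/gl.h>
#include "wglext.h"  // HGPUNV and the WGL_NV_gpu_affinity typedefs

// Sketch: pick a specific GPU by index and build an affinity DC,
// which can then be used with wglCreateContext for rendering.
HDC CreateAffinityDCForGpu(UINT gpuIndex)
{
    PFNWGLENUMGPUSNVPROC enumGpus =
        (PFNWGLENUMGPUSNVPROC)wglGetProcAddress("wglEnumGpusNV");
    PFNWGLCREATEAFFINITYDCNVPROC createAffinityDC =
        (PFNWGLCREATEAFFINITYDCNVPROC)wglGetProcAddress("wglCreateAffinityDCNV");
    if (!enumGpus || !createAffinityDC)
        return NULL;  // extension not exposed on this GPU/driver

    HGPUNV gpu = NULL;
    if (!enumGpus(gpuIndex, &gpu))
        return NULL;  // no GPU at that index

    HGPUNV gpus[2] = { gpu, NULL };  // NULL-terminated GPU list
    return createAffinityDC(gpus);
}
```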
Ok, I finally figured out a solution, although it looks to me there's a bug in the Intel library, that should be fixed. Here are the details:
When the preferred GPU in the nVidia control panel is set to "Auto", OpenGL will pick the GPU that is assigned to the primary monitor. So, if the primary monitor is driven by nVidia, then the default GPU for OpenGL rendering will be nVidia. At this point, using the 3.0 SDK and MFX_IMPL_HARDWARE_ANY, the library will *sometimes* find the hardware accelerated h264 encoder.
I say sometimes, because it doesn't work if there is a fullscreen OpenGL window active at the time of initialization. If there's no window at all, or if there's a "windowed" OpenGL window, it works like a charm. But if I'm calling MFXVideoENCODE_Init while a fullscreen OpenGL window is active, it will fail with MFX_WRN_PARTIAL_ACCELERATION.
This sounds like a bug to me. If the encoder can initialize in windowed or windowless mode, there's no reason it shouldn't be able to initialize in fullscreen mode.
For now, my workaround is to wait for the encoder to fully initialize, and only create my fullscreen OpenGL window once the encoder is running in hardware.
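The ordering boils down to something like this (a sketch; CreateFullscreenGLWindow stands in for my own window-creation code and is not a real API):

```cpp
#include <mfxvideo.h>

void CreateFullscreenGLWindow();  // assumed helper, defined elsewhere

// Sketch: bring the encoder up first, confirm it is really hardware
// accelerated, and only then go fullscreen.
bool StartPipeline(mfxSession session, mfxVideoParam* encParams)
{
    mfxStatus sts = MFXVideoENCODE_Init(session, encParams);
    if (sts == MFX_WRN_PARTIAL_ACCELERATION)
        return false;              // would silently fall back to SW
    if (sts < MFX_ERR_NONE)
        return false;              // a real error

    mfxIMPL impl = MFX_IMPL_SOFTWARE;
    MFXQueryIMPL(session, &impl);  // double-check: HW, not SW fallback
    if (MFX_IMPL_BASETYPE(impl) == MFX_IMPL_SOFTWARE)
        return false;

    CreateFullscreenGLWindow();    // safe to go fullscreen now
    return true;
}
```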
FYI: Creating a D3D device for the Intel GPU before the MFX initialization had no effect whatsoever. Also, the GPU affinity extension for OpenGL is only supported on nVidia Quadro GPUs, so that wasn't useful either.
Thanks for all the advice! I have to say that being able to encode 1080p video in real-time, while running a fullscreen OpenGL app, with barely any CPU usage is quite incredible!