Media (Intel® Video Processing Library, Intel Media SDK)

D3D11 Surface Usage

Rūdolfs_B_
Beginner
1,473 Views

Hi,

I wanted to clear up some information about the D3D11_USAGE values that should be used with Intel Media SDK. I have a VPP->ENCODE pipeline that currently runs in system memory, and since everyone says that video memory should be faster, I've begun looking into it. I read notes on the forum that it is wise to reduce the amount of data transferred from system to GPU memory, and I looked through the code provided in the samples, but after reading the usage type descriptions on MSDN (http://msdn.microsoft.com/en-us/library/windows/desktop/ff476259(v=vs.85).aspx), I've become confused about which gives the best performance.

1) VPP input - D3D11_USAGE_DEFAULT or D3D11_USAGE_DYNAMIC? The table does not say that D3D11_USAGE_DEFAULT allows a CPU write, yet the documentation still says that ID3D11DeviceContext::UpdateSubresource can be used to update the surface from a system memory pointer, while a D3D11_USAGE_DYNAMIC surface can simply be mapped. Which is better/faster? Since I know the regions that change in the frames I need to encode, it intuitively makes sense not to map the whole frame into CPU memory, but maybe I am wrong here.

2) VPP output/ENCODE input - from the table in MSDN it seems that D3D11_USAGE_DEFAULT is the obvious choice here, since I do not need to read the data myself; it is meant just for the GPU.

3) ENCODE output - again, from the table it seems that D3D11_USAGE_STAGING should be the right choice, while the SDK samples actually use a combination of D3D11_USAGE_DEFAULT and D3D11_USAGE_STAGING and copy the memory on the GPU from the actual output surface to the staging surface. If I have a preallocated pool of these surfaces, is there any use in having the combination of default + staging? From my perspective the memory copy on the GPU is just a waste: if I have a pool, then when the encoder is done I just feed it the next free surface from the pool, map the finished surface, and perform the memcpy myself, so that the encoder only has to care about the encoding process. But maybe there are some penalties if you use D3D11_USAGE_STAGING as an ENCODE output surface?
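To make the "only update the changed regions" idea from question 1 concrete, here is a minimal CPU-side sketch of a pitch-aware copy of a single dirty rectangle. `DirtyRect` and `copyDirtyRect` are hypothetical names, not part of D3D11 or Media SDK. With a mapped surface, the destination pointer and pitch would come from the `pData` and `RowPitch` fields filled in by `ID3D11DeviceContext::Map`; with `UpdateSubresource`, the rectangle would instead be passed as a `D3D11_BOX`. One caveat worth checking before relying on this: mapping with `D3D11_MAP_WRITE_DISCARD` leaves the previous contents of the resource undefined, so a partial update through that path may not preserve the unchanged areas.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical rectangle type for one plane; not a D3D11 or Media SDK type.
struct DirtyRect {
    std::size_t x;       // left edge, in bytes from the start of a row
    std::size_t y;       // top edge, in rows
    std::size_t width;   // width, in bytes
    std::size_t height;  // height, in rows
};

// Copies only the dirty rectangle from src to dst. The two buffers may use
// different row pitches, as a mapped GPU surface usually does.
// Returns the number of bytes copied.
std::size_t copyDirtyRect(const std::uint8_t* src, std::size_t srcPitch,
                          std::uint8_t* dst, std::size_t dstPitch,
                          const DirtyRect& r) {
    // The rectangle must fit inside a row of both buffers.
    assert(r.x + r.width <= srcPitch);
    assert(r.x + r.width <= dstPitch);
    for (std::size_t row = 0; row < r.height; ++row) {
        const std::uint8_t* s = src + (r.y + row) * srcPitch + r.x;
        std::uint8_t* d = dst + (r.y + row) * dstPitch + r.x;
        std::memcpy(d, s, r.width);
    }
    return r.width * r.height;
}
```

For a small changed region this touches far fewer bytes than copying the full frame, which is the intuition behind the question; whether it is faster end to end still depends on how the driver handles the update.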

 

5 Replies
Anthony_P_Intel
Employee
1,469 Views

Hi,

In general, when Media SDK is used with hardware acceleration, the internal implementation of the HW library will perform an optimal copy from system memory to video memory if necessary.  The best D3D11_USAGE may depend on how much CPU access the application desires.

For (1) VPP input, the GPU only needs to 'read' the input surface for the VPP operation. Assuming you need CPU "write" access, I believe a surface with D3D11_USAGE_DYNAMIC or D3D11_USAGE_DEFAULT would give identical performance on our integrated graphics architecture.

For (2) VPP out->Encode in: yes, since both GPU write and GPU read are required, D3D11_USAGE_DEFAULT is the right choice. You may want to look at the Media SDK feature of "opaque" memory. This mode lets the Media SDK pipeline know that there is no intention of any CPU intervention, so the management of the surfaces can be optimized without concern for memory type.

For (3) Encode stream output, the output is an encoded (compressed) bitstream of variable size. The actual output of the hardware is a GPU 'write' operation, and the best handling of the bitstream data depends on what the application plans to do with the encoded bitstream.
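As a rough summary of the usage discussion above, the access table from the D3D11_USAGE documentation can be written down as a small lookup. This is only a sketch: `Usage`, `Access`, `Need`, and `supports` are local, hypothetical names rather than D3D11 types, and treating staging resources as unsuitable for direct GPU processing is an assumption based on the documented restriction that GPU access to them is limited to copy operations. The table covers direct access only; it does not model `UpdateSubresource` as a way to write into a default-usage resource.

```cpp
#include <cassert>

// Local stand-in for the four D3D11_USAGE values; real code would use <d3d11.h>.
enum class Usage { Default, Dynamic, Immutable, Staging };

// What a resource with a given usage allows.
struct Access {
    bool gpuRead;
    bool gpuWrite;
    bool cpuRead;
    bool cpuWrite;
    bool gpuCopyOnly;  // GPU access restricted to copy operations
};

// What a pipeline stage needs from its surface.
struct Need {
    bool gpuRead;
    bool gpuWrite;
    bool cpuRead;
    bool cpuWrite;
};

// Access matrix as given in the D3D11_USAGE documentation.
inline Access accessOf(Usage u) {
    switch (u) {
        case Usage::Default:   return {true, true,  false, false, false};
        case Usage::Dynamic:   return {true, false, false, true,  false};
        case Usage::Immutable: return {true, false, false, false, false};
        case Usage::Staging:   return {true, true,  true,  true,  true};
    }
    return {false, false, false, false, false};
}

// True when a resource with usage u can serve a stage with the given needs.
// Assumption: a copy-only resource cannot be processed by the GPU directly.
inline bool supports(Usage u, const Need& need) {
    const Access have = accessOf(u);
    if ((need.gpuRead || need.gpuWrite) && have.gpuCopyOnly) return false;
    return (!need.gpuRead  || have.gpuRead)  &&
           (!need.gpuWrite || have.gpuWrite) &&
           (!need.cpuRead  || have.cpuRead)  &&
           (!need.cpuWrite || have.cpuWrite);
}
```

For example, `supports(Usage::Default, Need{true, true, false, false})` holds for the VPP output/ENCODE input case, while a staging resource only qualifies when the sole requirement is CPU access after a copy.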

Rūdolfs_B_
Beginner
1,469 Views

Hi Tony,

Thanks a lot for the clarifications, very useful. Actually, what you said is what I am doing right now: I'm using system memory for VPP input and ENCODE output, and opaque memory in between. But after all I've read on this forum and in the examples, I was under the impression that using Direct3D surfaces explicitly and supplying my own frame allocator gives better performance than leaving all that up to the SDK. So is that false?

Sravanthi_K_Intel
1,469 Views

Hello there. As a general rule of thumb, using video memory for processing in MSDK gives the best performance (since you are operating on the hardware and not spending I/O time transferring data from system to video memory). VPP processing and Encode/Decode processing can be done fully in video memory.

So, the short response is - you can build your application pipeline in MSDK that uses video memory and underlying hardware for best performance. ("Opaque surfaces, as the name suggests, are managed by the SDK and are not visible or controllable by the developer. These surfaces are best used when the functionality required is basic and concrete (like decoding or encoding) and will not be expanded upon later. As a thumb rule, we always recommend using hardware implementation with video surfaces.", taken from technical article - https://software.intel.com/en-us/articles/framework-for-developing-applications-using-media-sdk)

Hope my response did not confuse you! If you can give me some more information on what you are looking to develop, we can give more information if needed.

Rūdolfs_B_
Beginner
1,469 Views

Hi,

I'm aware that video memory should be used for best performance. What I really wanted to understand, without running my own tests, is whether the VPP->ENCODE pipeline is equally fast in the two scenarios below:

SYSTEM MEMORY -> VPP -> OPAQUE -> ENCODE -> SYSTEM MEMORY

My code that copies from RAM to GPU surface -> VIDEO MEMORY -> VPP -> VIDEO MEMORY -> ENCODE -> VIDEO MEMORY -> My code that copies from the GPU surface to RAM

If the SDK does basically the same thing, and from the opaque output memory type it can deduce that it should copy the input surface from system memory to video memory itself and then run the operations, then there is no need for me to write any code that handles the RAM/GPU surface transition. If VPP just reads each pixel from system memory and only outputs to video memory, then a manual copy would be an improvement.
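For reference, the two scenarios differ mainly in what would be set in the `IOPattern` field of `mfxVideoParam` for VPP and for the encoder. The sketch below is only an illustration: the constants are local stand-ins named after the Media SDK `MFX_IOPATTERN_*` flags, their numeric values are assumed to match `mfxstructures.h` and should be checked against the SDK header, and `PipelineIo`, `systemInOpaqueBetween`, `videoMemoryThroughout`, and `appMustUpload` are hypothetical helpers. The encoder's bitstream output is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for MFX_IOPATTERN_* (values assumed; use the SDK header in real code).
constexpr std::uint16_t kInVideoMemory   = 0x01;
constexpr std::uint16_t kInSystemMemory  = 0x02;
constexpr std::uint16_t kInOpaqueMemory  = 0x04;
constexpr std::uint16_t kOutVideoMemory  = 0x10;
constexpr std::uint16_t kOutSystemMemory = 0x20;
constexpr std::uint16_t kOutOpaqueMemory = 0x40;

// IOPattern values for the two components of a VPP->ENCODE pipeline.
struct PipelineIo {
    std::uint16_t vpp;     // input and output pattern for VPP
    std::uint16_t encode;  // input pattern for the encoder
};

// Scenario 1: SYSTEM MEMORY -> VPP -> OPAQUE -> ENCODE.
inline PipelineIo systemInOpaqueBetween() {
    return {kInSystemMemory | kOutOpaqueMemory, kInOpaqueMemory};
}

// Scenario 2: the application uploads frames itself; video memory throughout.
inline PipelineIo videoMemoryThroughout() {
    return {kInVideoMemory | kOutVideoMemory, kInVideoMemory};
}

// True when the application has to place input frames in video memory itself.
inline bool appMustUpload(const PipelineIo& p) {
    return (p.vpp & kInVideoMemory) != 0;
}
```

In the first configuration the upload from system memory is left to the SDK; in the second, `appMustUpload` is true and the application owns that copy.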

 

Sravanthi_K_Intel
1,469 Views

"If VPP just reads each pixel from the system memory and only outputs to video memory then manual copy would be an improvement." -> This is not the model. MSDK automatically does a hardware-optimized block transfer from system memory to video memory and then operates on the video memory (using hardware acceleration). In short, you do not have to rely on manual copying for performance; MSDK takes care of that for you.
