Media (Intel® Video Processing Library, Intel Media SDK)

simple_decode_vmem - GPU to CPU memory copy - ways to optimize

Kishor_B_
Beginner

Hello All,

I profiled and found that copying memory from GPU to CPU is very expensive. I am looking for your input on how to reduce this performance loss.

While the decoder using video memory reached 1653 FPS (here the output is not processed after decoding), copying the decoder's output to system memory after decoding gave just 80 FPS (in the simple_decode_vmem application). That is a steep drop, and it leaves us with just 2 decodes per processor. I set MFX_GPUCOPY_ON, but it gave no performance benefit.

Note: I am not convinced by the explanation at https://software.intel.com/en-us/forums/intel-media-sdk/topic/557837, because the same clip decodes at ~1000 FPS when using system memory, which also requires the data to be moved between memories.

Any ideas for dealing with data movement between memories?

My setup is: SDK API v1.16, Core i7-5600U CPU @ 2.60 GHz, 4 cores, Broadwell, Turbo disabled, HD Graphics 5500, CentOS 7.1
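
To put the three figures above on a common scale, here is a minimal sketch that converts each FPS value into an average time per frame. It assumes the FPS numbers describe a single serial decode stream, so that time per frame is simply 1000 / FPS milliseconds; the function names are illustrative and not part of Media SDK.

```cpp
#include <cassert>

// Average time per frame, in milliseconds, for a given throughput.
// Assumption: one serial stream, so time per frame = 1000 / FPS.
double ms_per_frame(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra time per frame implied by a drop from fast_fps to slow_fps.
double added_ms_per_frame(double fast_fps, double slow_fps) {
    return ms_per_frame(slow_fps) - ms_per_frame(fast_fps);
}
```

Under that assumption, `added_ms_per_frame(1653.0, 80.0)` comes to roughly 11.9 ms of extra work per frame for the manual copy, while `ms_per_frame(1000.0)` is 1 ms per frame in total for the system-memory path.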

Best Regards, Kishor

1 Solution
Surbhi_M_Intel
Employee

Hi Kishor,

As explained in the thread you pointed to, the simplest and most efficient approach in Media SDK is to use IOPattern, which specifies where the data will be stored. You can output the data to system memory by setting IOPattern to SYSTEM_MEMORY; Media SDK then performs the copy from video memory to system memory efficiently. More details on IOPattern can be found on page 33 of the developer's guide. I did a quick run using sample_decode, and when I output to system memory using this pattern I see a decrease of around 25-30% in FPS.

Try it and let us know whether it helps. If not, please explain your performance concern in detail (provide the numbers you are getting and the numbers you are targeting).

Thanks,
Surbhi
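
As a rough illustration of the IOPattern suggestion, the sketch below requests system-memory output for a decoder. It is not taken from sample_decode. When the Media SDK header `mfxvideo.h` is not installed, it falls back to small stand-in definitions that mirror the SDK names and constant values, so the fragment compiles on its own; the helper names `make_sysmem_decode_params` and `outputs_to_system_memory` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

#if defined(__has_include)
#  if __has_include(<mfxvideo.h>)
#    include <mfxvideo.h>   // real Media SDK header, when installed
#    define HAVE_MFX 1
#  endif
#endif

#ifndef HAVE_MFX
// Stand-ins (assumption: used only so this sketch builds without the SDK).
// Names and values mirror the SDK's IOPattern constants.
typedef unsigned short mfxU16;
enum {
    MFX_IOPATTERN_OUT_VIDEO_MEMORY  = 0x10,
    MFX_IOPATTERN_OUT_SYSTEM_MEMORY = 0x20
};
struct mfxVideoParam { mfxU16 IOPattern; };
#endif

// Hypothetical helper: zeroed decode parameters that ask for output
// surfaces in system memory. In a real pipeline the remaining fields are
// filled by MFXVideoDECODE_DecodeHeader before MFXVideoDECODE_Init.
mfxVideoParam make_sysmem_decode_params() {
    mfxVideoParam par;
    std::memset(&par, 0, sizeof(par));
    par.IOPattern = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
    return par;
}

// Hypothetical helper: true when the parameters request system-memory output.
bool outputs_to_system_memory(const mfxVideoParam& par) {
    return (par.IOPattern & MFX_IOPATTERN_OUT_SYSTEM_MEMORY) != 0;
}
```

With this pattern the application supplies system-memory surfaces and the SDK handles the transfer from video memory internally, instead of the application locking and copying each video-memory surface itself.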

Kishor_B_
Beginner

Hi Surbhi,

Thank you for the quick reply on this thread. I will try IOPattern and get back to you with my observations.

A 25-30% drop is far better than what we reported earlier and is a good option for us.

Best Regards, Kishor

 

Kishor_B_
Beginner

Surbhi M. (Intel) wrote:

Use IOPattern, i.e. where the data will be stored.

Thanks Surbhi. With system memory, I got similar results. You may want to close this thread.

Best Regards, Kishor

Surbhi_M_Intel
Employee

Great, that sounds good! I am closing this thread. If you have any other questions, please start a new thread.

-Surbhi

Roman_T_
New Contributor I

Hi Surbhi,

I need not only to show decoded frames on the screen but also to perform some video analytics, so I need the decoded frames to be in system memory.

Does your suggestion mean that I have to use the sample_decode data flow instead of simple_decode_vmem?

Best regards,
Roman

Surbhi_M_Intel
Employee

Hi Roman, 

simple_decode and simple_decode_vmem are tutorials that show how to set up a pipeline using system or video memory. The samples (such as sample_decode) are code samples that demonstrate the latest API features and are optimized for better performance on the underlying hardware. Depending on your pipeline you can choose either; if you want to reuse the code, I recommend sample_decode for better performance.

Thanks,
Surbhi
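
For the analytics case raised in this thread, a minimal sketch of reading a decoded frame once it is in system memory might look like the following. It assumes an 8-bit luma (Y) plane, as in NV12, and takes a plain pointer plus a pitch; in Media SDK these would typically come from the `Y` and `Pitch` fields of a surface's `mfxFrameData`. The function name `mean_luma` is illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Average brightness of an 8-bit luma (Y) plane held in system memory.
// Rows are `pitch` bytes apart; the pitch may exceed the width because of
// alignment padding, so padding bytes must be skipped.
double mean_luma(const std::uint8_t* y, int width, int height, int pitch) {
    assert(y != nullptr && width > 0 && height > 0 && pitch >= width);
    std::uint64_t sum = 0;
    for (int row = 0; row < height; ++row) {
        const std::uint8_t* line = y + static_cast<std::size_t>(row) * pitch;
        for (int col = 0; col < width; ++col) {
            sum += line[col];
        }
    }
    return static_cast<double>(sum) / (static_cast<double>(width) * height);
}
```

Stepping by the pitch rather than the width matters here: treating the plane as tightly packed would fold the padding bytes into the result.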
