Media (Intel® Video Processing Library, Intel Media SDK)
Access community support with transcoding, decoding, and encoding in applications using media tools like Intel® oneAPI Video Processing Library and Intel® Media SDK

Intel Media SDK - Optimizing Memory Usage and Scaling

KK
Beginner

Hello,

We are using the latest Intel Media SDK (2013 release) to do the following:

  • Decode an H.264 1080p stream into NV12 (YUV 4:2:0).
  • Then resize the result to 4CIF in RGB format using VPP.

We are decoding 32 independent H.264 streams (1080p) using 32 threads running in the same process. We use a separate pipeline for each stream.

However, after 20-21 streams, the allocator is no longer able to allocate video memory.

We observe that the Alloc() method returns error code -4 (MFX_ERR_MEMORY_ALLOC).

We have set AsyncDepth=1. It is a 64-bit process running on an i7-4770K machine. Still, we are hitting this limit.

We suspect that this limitation is due to a lack of available graphics memory.

Queries:

  1. We are creating a separate memory pool for each pipeline. Can we have a single memory pool that is shared by all pipelines, to make better use of memory?
  2. We are not able to scale the number of streams, i.e., instead of 20 pipelines x 30 FPS we want to run 40 pipelines x 15 FPS. How can we achieve this?

Please suggest.

regards,

KK

 

dr_asik
Beginner

The latest Media SDK is 2014, not 2013. That's a lot of 1080p streams to be decoding in parallel; I'm impressed you can do this at all o.0. Anyhow, you might be able to change the amount of memory available to the iGPU in the BIOS.

Anthony_P_Intel
Employee

Hi,

Yes, I believe you are hitting the limit of video graphics memory. We are continually working to improve memory use. I'll discuss your usage model with some engineers and report back here, but I believe a single memory pool versus multiple pools will not make much difference.

Given that the bottleneck is memory, I do not believe you can trade frame rate for a higher stream count. Frames from each stream need to remain in memory for use as references by other frames, regardless of how quickly they are consumed, so when working with 40 streams you will see enough frames in memory to support 40 streams, whether they run at 15 or 30 fps.
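To make the point above concrete, here is a rough back-of-the-envelope sketch of decode surface memory. The surface count per stream and the 16-pixel alignment are assumptions chosen for illustration, not values reported by the SDK; the real count comes from the decoder's surface query for a given stream, and VPP output surfaces are ignored here.

```python
# Rough estimate of decode surface memory. All surface counts are assumptions.


def nv12_frame_bytes(width: int, height: int, align: int = 16) -> int:
    """Bytes for one NV12 frame: a full-size luma plane plus a half-size
    interleaved chroma plane, with dimensions rounded up to `align`."""
    aligned_w = (width + align - 1) // align * align
    aligned_h = (height + align - 1) // align * align
    return aligned_w * aligned_h * 3 // 2


def pool_bytes(streams: int, surfaces_per_stream: int,
               width: int, height: int) -> int:
    """Total bytes across all per-stream decode pools.

    Frame rate is deliberately not a parameter: the number of surfaces a
    stream needs depends on its reference frames and pipeline depth, not on
    how fast frames are consumed.
    """
    return streams * surfaces_per_stream * nv12_frame_bytes(width, height)


# Hypothetical value for illustration only.
SURFACES_PER_STREAM = 8

for streams in (20, 40):
    total = pool_bytes(streams, SURFACES_PER_STREAM, 1920, 1080)
    print(f"{streams} streams: {total / 2**20:.0f} MiB of NV12 surfaces")
```

Under these assumptions, doubling the stream count doubles the footprint, and halving the frame rate does not change it.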

celli4
New Contributor I

I suspect this won't help you, but I'm going to ask anyway just in case it does.

 

Is your application a low latency one? Do you need to get your frames out of the decoder in 50, 100, 200 ms?

Or could you live with your frames in 1000, 2000, 3000 ms latency?

You may well have thought all this through, and it may not be a possibility, but if your application does not require low latency, you could potentially multiplex different streams through a single decoder. You would need to do some H.264 parsing and pay attention to other details, such as the decoder format. You may also have to make sure you have closed-GOP streams.

Basically, you would parse your bitstreams into chunks, each running from one H.264 SPS/PPS/IDR header up to the next SPS/PPS/IDR header.

 

It could be some work, but if your streams meet some conditions, it is something to think about. (You might also have to insert SPS/PPS, etc.)
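The chunking step described above might look roughly like the sketch below. It assumes an Annex B byte stream (NAL units separated by 00 00 01 or 00 00 00 01 start codes) in which every IDR picture is preceded by in-band SPS/PPS; if that does not hold, the parameter sets would have to be inserted first. The function names are made up for this illustration, and feeding the chunks to a shared decoder is not shown.

```python
from typing import Iterator, List, Tuple

# nal_unit_type values defined by the H.264 specification.
NAL_IDR, NAL_SPS, NAL_PPS = 5, 7, 8

START_CODE = b"\x00\x00\x01"


def iter_nal_units(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (offset, nal_unit_type) for each NAL unit in an Annex B stream.

    The offset points at the first byte of the start code, including the
    extra leading zero byte of a 4-byte start code.
    """
    pos = data.find(START_CODE)
    while pos != -1 and pos + 3 < len(data):
        start = pos - 1 if pos > 0 and data[pos - 1] == 0 else pos
        yield start, data[pos + 3] & 0x1F  # low 5 bits of the NAL header
        pos = data.find(START_CODE, pos + 3)


def split_at_sps(data: bytes) -> List[bytes]:
    """Cut the stream so that every chunk begins at an SPS NAL unit.

    Anything before the first SPS is dropped, because a decoder could not
    start from it on its own.
    """
    cuts = [off for off, nal_type in iter_nal_units(data) if nal_type == NAL_SPS]
    bounds = cuts + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Tiny synthetic stream: SPS, PPS, IDR slice, P slice, then a second GOP.
    gop = (b"\x00\x00\x00\x01\x67\xAA" b"\x00\x00\x00\x01\x68\xBB"
           b"\x00\x00\x01\x65\xCC" b"\x00\x00\x01\x41\xDD")
    chunks = split_at_sps(gop + gop)
    print(len(chunks), "chunks")
```

Each chunk is then a self-contained unit only if the stream uses closed GOPs, which is why that condition matters for this approach.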

 

Cameron

 
