Hello,
We are using the latest Intel Media SDK (2013 release) to do the following:
- Decompress an H.264 1080p stream into NV12 (YUV 4:2:0)
- Resize the output to 4CIF and convert it to RGB using VPP.
We are decoding 32 independent 1080p H.264 streams using 32 threads running in the same process, with a separate pipeline for each stream.
However, after 20-21 streams the allocator is no longer able to allocate video memory: the Alloc() method returns error code -4 (MFX_ERR_MEMORY_ALLOC).
We have set AsyncDepth=1. This is a 64-bit process running on an i7-4770K machine, yet we still hit this limit.
We suspect this limitation is due to a lack of available graphics memory.
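For context, here is a minimal sketch of how one could query the per-pipeline surface requirements and estimate the total video memory needed across all pipelines. The resolutions, IOPattern, and other parameter values below are illustrative assumptions, not necessarily our exact production settings:

```cpp
// Sketch: query how many video-memory surfaces one decode + VPP pipeline
// requests, so the total across N pipelines can be estimated.
// Parameter values here are illustrative assumptions.
#include <mfxvideo.h>
#include <cstdio>

int main() {
    mfxSession session = nullptr;
    mfxVersion ver = {{0, 1}};                 // API 1.0 or later
    if (MFXInit(MFX_IMPL_HARDWARE_ANY, &ver, &session) != MFX_ERR_NONE)
        return 1;

    // Decoder: H.264 1080p, NV12 output in video memory, AsyncDepth = 1
    mfxVideoParam dec = {};
    dec.mfx.CodecId = MFX_CODEC_AVC;
    dec.mfx.FrameInfo.FourCC       = MFX_FOURCC_NV12;
    dec.mfx.FrameInfo.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
    dec.mfx.FrameInfo.PicStruct    = MFX_PICSTRUCT_PROGRESSIVE;
    dec.mfx.FrameInfo.Width  = 1920; dec.mfx.FrameInfo.Height = 1088;
    dec.mfx.FrameInfo.CropW  = 1920; dec.mfx.FrameInfo.CropH  = 1080;
    dec.mfx.FrameInfo.FrameRateExtN = 30; dec.mfx.FrameInfo.FrameRateExtD = 1;
    dec.AsyncDepth = 1;
    dec.IOPattern  = MFX_IOPATTERN_OUT_VIDEO_MEMORY;

    mfxFrameAllocRequest decReq = {};
    MFXVideoDECODE_QueryIOSurf(session, &dec, &decReq);

    // VPP: NV12 1080p in, RGB4 4CIF (704x576) out, both in video memory
    mfxVideoParam vpp = {};
    vpp.AsyncDepth = 1;
    vpp.IOPattern  = MFX_IOPATTERN_IN_VIDEO_MEMORY | MFX_IOPATTERN_OUT_VIDEO_MEMORY;
    vpp.vpp.In  = dec.mfx.FrameInfo;
    vpp.vpp.Out = dec.mfx.FrameInfo;
    vpp.vpp.Out.FourCC = MFX_FOURCC_RGB4;
    vpp.vpp.Out.Width = 704; vpp.vpp.Out.Height = 576;
    vpp.vpp.Out.CropW = 704; vpp.vpp.Out.CropH  = 576;

    mfxFrameAllocRequest vppReq[2] = {};       // [0] = input, [1] = output
    MFXVideoVPP_QueryIOSurf(session, &vpp, vppReq);

    printf("decode surfaces: %d, vpp in: %d, vpp out: %d (per pipeline)\n",
           decReq.NumFrameSuggested,
           vppReq[0].NumFrameSuggested, vppReq[1].NumFrameSuggested);

    MFXClose(session);
    return 0;
}
```

Multiplying the suggested surface counts by the per-surface size and the number of pipelines gives a rough idea of how much video memory 32 pipelines would request.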
Queries:
- We are creating a separate memory pool for each pipeline. Could we instead use a single memory pool shared by all pipelines to make better use of memory?
- We are not able to scale the number of streams; for example, instead of 20 pipelines x 30 FPS we would like 40 pipelines x 15 FPS. How can we achieve this?
Please suggest.
regards,
KK
The latest Media SDK is 2014, not 2013. That's a lot of 1080p streams to be decoding in parallel; I'm impressed you can do this at all o.0. Anyhow, you might be able to increase the amount of memory available to the iGPU in the BIOS.
Hi,
Yes, I believe you are hitting the limit of available video/graphics memory. We are continually working to improve memory use. I'll discuss your usage model with some engineers and report back here, but I believe a single vs. multiple memory pools will not make much difference.
Given that the bottleneck is memory, I do not believe you can trade frame rate for stream count. Frames from any stream must remain in memory as references for other frames, regardless of how quickly they are consumed, so with 40 streams you will have frames in memory to support 40 streams whether they are decoded at 15 or 30 fps.
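As a rough illustration, here is a back-of-the-envelope estimate. The surfaces-per-stream figure is an assumption for the sake of the example; the real number comes from QueryIOSurf and depends on each stream's DPB size and AsyncDepth:

```cpp
// Rough estimate of video memory for N decode pipelines. Note the result
// does not depend on frame rate, only on how many streams are open.
#include <cstdio>

int main() {
    // One NV12 1080p surface: 1920 x 1088 pixels x 1.5 bytes/pixel (~3 MB).
    const double nv12_1080p_mb = 1920.0 * 1088.0 * 1.5 / (1024 * 1024);
    const int surfaces_per_stream = 20;   // assumption: DPB + working surfaces
    for (int streams : {20, 32, 40}) {
        double total_mb = streams * surfaces_per_stream * nv12_1080p_mb;
        printf("%2d streams -> ~%.0f MB of video memory\n", streams, total_mb);
    }
    return 0;
}
```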
I suspect this won't help you, but I'm going to ask anyway just in case it does.
Is your application a low-latency one? Do you need to get your frames out of the decoder within 50, 100, or 200 ms?
Or could you live with 1000, 2000, or 3000 ms of latency?
You may well have thought all this through, and it may not be a possibility, but if your application does not require low latency, you could potentially multiplex different streams through a single decoder. You would need to do some H.264 parsing and pay attention to other details, such as the decoder format. You may also have to make sure your streams use closed GOPs.
Basically, you would parse your bitstreams into chunks, each running from one H.264 SPS/PPS/IDR group to the next.
It could be some work, but if your streams meet these conditions, it is something to think about. [You might have to insert SPS/PPS as well, etc.]
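For illustration only, here is a minimal sketch of the boundary-finding step on an Annex-B elementary stream. The NAL type values (7 = SPS, 8 = PPS, 5 = IDR slice) are from the H.264 spec; everything else is an assumption about how the streams are laid out, and real streams need more care (open GOPs, SEI, in-band parameter-set changes, etc.):

```cpp
// Sketch: locate chunk boundaries (SPS/PPS/IDR groups) in an Annex-B
// H.264 elementary stream, as a starting point for multiplexing several
// streams through a single decoder.
#include <cstdint>
#include <cstddef>
#include <vector>

// Byte offsets (position of the start code) where NALs of the given type begin.
static std::vector<size_t> find_nal_offsets(const std::vector<uint8_t>& es,
                                            int wanted_type) {
    std::vector<size_t> offsets;
    for (size_t i = 0; i + 3 < es.size(); ++i) {
        // Accept both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes.
        size_t sc = 0;
        if (es[i] == 0 && es[i + 1] == 0 && es[i + 2] == 1) sc = 3;
        else if (i + 4 < es.size() && es[i] == 0 && es[i + 1] == 0 &&
                 es[i + 2] == 0 && es[i + 3] == 1) sc = 4;
        if (!sc) continue;
        int nal_type = es[i + sc] & 0x1F;   // low 5 bits of the NAL header byte
        if (nal_type == wanted_type) offsets.push_back(i);
        i += sc;                            // skip past the start code
    }
    return offsets;
}

// A chunk starts at each SPS (type 7) that opens a new SPS/PPS/IDR group,
// assuming the streams repeat SPS/PPS before every IDR.
std::vector<size_t> chunk_boundaries(const std::vector<uint8_t>& es) {
    return find_nal_offsets(es, 7);
}
```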
Cameron
