We are planning to use the latest Intel Media SDK (2013 release) to decode an H.264 1080p stream into NV12 at 960 x 540 resolution.
Please confirm whether VPP is required to downscale the output by half, or whether the decoder alone can produce the reduced-resolution (960 x 540) output.
If the decoder alone can do it, which decoder parameter must be set?
Thanks for the clarification. We have another query...
We are decoding 16 independent H.264 streams (1080p) using 16 threads running in the same process.
However, after 9-10 streams the allocator is unable to allocate video memory: the Alloc() method returns error code -4 (MFX_ERR_MEMORY_ALLOC).
The same workload succeeds when we run each stream in a separate process. Is there a per-process limit? Please suggest.
I suspect the limitation you are encountering is due to a lack of available graphics memory (not your threading approach). If you are using a 32-bit OS, switching to a 64-bit OS will free up more graphics memory resources. You could also try slightly modifying the decode configuration, for example setting AsyncDepth=1, to reduce the number of internally buffered surfaces.
Our current configuration is:
- 64-bit OS with AsyncDepth=1.
- The application runs on an i7-4770K processor with Intel HD Graphics 4600, which reports the following graphics memory:
  - Total available graphics memory: 1760 MB
  - Dedicated video memory: 128 MB
  - Shared video memory: 1632 MB
We have modified the Intel SDK’s ‘sample_decode’ so that each salvo (window) runs in the same process but on its own dedicated thread. In total we want to run 16 salvos (windows).
DecRequest.NumFramesSuggested is 6 when we call m_pMFXAllocator->Alloc(…).
Alloc(…) is therefore called 16 times (once per session), but it starts failing after the 9th or 10th call.
We debugged further and found that the m_decoderService->CreateSurface(…) call is what fails with MFX_ERR_MEMORY_ALLOC.
We have also observed that if m_pMFXAllocator->Alloc(…) is called once for all 96 frames, the allocation succeeds. So we are assuming that the amount of available memory is not the issue. Please suggest.
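As a rough sanity check on the numbers above, the nominal size of the 96-surface pool can be worked out by hand. The sketch below assumes 1920 x 1080 NV12 surfaces with width and height rounded up to a multiple of 16 (an assumption about the alignment; the driver may add further pitch padding, so real allocations can be larger). The helper names are made up for illustration.

```cpp
#include <cstdint>

// Round `value` up to the next multiple of `alignment` (a power of two).
constexpr std::uint64_t align_up(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Nominal NV12 size: a full-resolution Y plane plus a half-height interleaved
// UV plane, i.e. 1.5 bytes per pixel. Assumes 16-pixel alignment on both axes.
constexpr std::uint64_t nv12_surface_bytes(std::uint64_t width, std::uint64_t height) {
    return align_up(width, 16) * align_up(height, 16) * 3 / 2;
}

constexpr std::uint64_t kSessions = 16;         // one per salvo (window)
constexpr std::uint64_t kFramesPerSession = 6;  // DecRequest.NumFramesSuggested
constexpr std::uint64_t kSurfaceBytes = nv12_surface_bytes(1920, 1080);
constexpr std::uint64_t kTotalBytes = kSessions * kFramesPerSession * kSurfaceBytes;

// 1920 x 1088 x 1.5 bytes per surface, times 96 surfaces.
static_assert(kSurfaceBytes == 3133440, "one 1080p NV12 surface");
static_assert(kTotalBytes == 300810240, "96 surfaces in total");
```

Under these assumptions the whole pool is about 287 MiB, well below the 1760 MB reported as total available graphics memory, which is consistent with the single 96-frame allocation succeeding.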
It seems that the problem occurs when we have multiple allocators, one for each window, whereas it should not occur if a single allocator allocates the surfaces for all windows. Please confirm. If so, please explain how to use the same allocator for all sessions and windows.
Based on your description it's clear that the DirectX framework, at the point of calling CreateSurface, determines that there is not enough available memory. This is outside the control of Intel Media SDK.
One thing you could try is to encapsulate the DirectX and allocation part of the initialization stage and treat it as a shared resource locked by a critical section. It may help DirectX handle underlying resources in a more efficient way.
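A minimal sketch of that idea, using std::mutex in place of a Windows critical section: every session thread takes the same lock before it touches the allocation path, so the calls reach the underlying framework one at a time. SurfacePool is a hypothetical stand-in for the DirectX-backed allocator, not a Media SDK type, and it only counts surfaces instead of creating them.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the DirectX-backed frame allocator.
class SurfacePool {
public:
    // Pretends to allocate `count` surfaces. Not thread-safe on its own.
    bool Alloc(std::size_t count) {
        allocated_ += count;
        return true;
    }
    std::size_t allocated() const { return allocated_; }

private:
    std::size_t allocated_ = 0;
};

// Starts one thread per session; each thread performs its allocation while
// holding a single shared mutex, so allocations never overlap.
std::size_t allocate_for_sessions(std::size_t sessions, std::size_t frames_per_session) {
    SurfacePool pool;
    std::mutex init_mutex;  // plays the role of the critical section
    std::vector<std::thread> workers;
    workers.reserve(sessions);
    for (std::size_t i = 0; i < sessions; ++i) {
        workers.emplace_back([&pool, &init_mutex, frames_per_session] {
            std::lock_guard<std::mutex> lock(init_mutex);
            pool.Alloc(frames_per_session);
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return pool.allocated();
}
```

Calling allocate_for_sessions(16, 6) mirrors the 16-session, 6-frame setup described earlier in the thread. Only the initialization stage is serialized here; decoding itself can still run concurrently on each thread.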
Also, if you are not already doing so, try reusing the same DirectX device for all the concurrent workloads/windows (if that is feasible for your application). Doing this will save you some memory.
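One way to structure the device reuse is sketched below: a small registry creates the device on first request and hands the same instance to every session afterwards. RenderDevice, DeviceRegistry and DecodeSession are hypothetical names used only for illustration. In a real application the shared object would be the Direct3D device and its device manager, and the same manager handle would typically be given to each Media SDK session through MFXVideoCORE_SetHandle with MFX_HANDLE_D3D9_DEVICE_MANAGER. A device shared across threads would presumably also need to be created with multithreading protection enabled.

```cpp
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a Direct3D device plus its device manager.
struct RenderDevice {
    int id = 0;
};

// Creates the device lazily and returns the same instance to every caller.
class DeviceRegistry {
public:
    std::shared_ptr<RenderDevice> Acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!device_) {
            device_ = std::make_shared<RenderDevice>();
            device_->id = ++created_;  // counts how many devices were ever created
        }
        return device_;
    }
    int created() const { return created_; }

private:
    std::mutex mutex_;
    std::shared_ptr<RenderDevice> device_;
    int created_ = 0;
};

// Hypothetical per-window decode session that only borrows the shared device.
struct DecodeSession {
    std::shared_ptr<RenderDevice> device;
};

// Opens `count` sessions that all reference the one device in `registry`.
std::vector<DecodeSession> open_sessions(DeviceRegistry& registry, int count) {
    std::vector<DecodeSession> sessions;
    for (int i = 0; i < count; ++i) {
        sessions.push_back(DecodeSession{registry.Acquire()});
    }
    return sessions;
}
```

With this layout, opening 16 sessions creates a single device, and the device is released only when the registry and the last session that references it are gone.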