I'm working on a multithreaded H.264 decoder/renderer. It is based on the sample_decode example, adapted for multithreading. I'm restricted to Direct3D9 and 32 bits. Moreover, I need the decoded frames in system memory. My configuration is Core i7-4790 (HD 4600, latest drivers), Windows 8.1 x64, INDE 2015 Pro, MediaSDK 220.127.116.118
I've tried several approaches:
1. IOPattern = MFX_IOPATTERN_OUT_VIDEO_MEMORY. This works well: I am able to decode and show 20+ streams of 1080p 25Hz H.264. The problems start when I need the decoded images in system memory. Locking surfaces via LockRect is extremely slow; I can't sustain even 1 FullHD stream. I've redesigned it to use an OffscreenPlainSurface combined with GetRenderTargetData, as Microsoft suggests, and was able to achieve about 85-90 fps in total (3 streams work fine; with 4, decoding slows down). Is it possible to speed this up? One more thing: while decoding 22-23 FullHD streams my decoder consumes about 600M of system memory. Adding any more streams leads to a crash at a random location in the code, which I think is caused by running out of some resource, but I have no clue which one. Any ideas about this?
2. IOPattern = MFX_IOPATTERN_OUT_SYSTEM_MEMORY. This works well while the stream count is relatively small (~10-12). With more streams, my machine can hang at a random moment without any warnings or errors. Rendering is difficult too. The main question is: how can I create a Direct3D surface from mfxFrameData? Right now I'm using an OffscreenPlainSurface, whose contents I update through a D3DLOCKED_RECT. Then I call UpdateSurface to copy it to GPU memory, and so on. But this approach involves a lot of memory copy operations, and I see very high CPU load with more than 5 streams (compared to the case without rendering). Is there a more efficient way to create a Direct3D surface from mfxFrameData?
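To make the copy step in approach 2 concrete, here is a minimal sketch of the pitch-aware NV12 copy that sits between a decoded frame and a locked surface. It is not taken from sample_decode or from the Media SDK. Nv12View, copyNv12 and nv12FrameBytes are hypothetical names introduced only for illustration, and the sketch uses plain pointers so it compiles without the D3D9 or Media SDK headers. It assumes the source view would be filled from the Y and UV pointers and the pitch of mfxFrameData, and the destination from D3DLOCKED_RECT (pBits and Pitch), with the chroma plane of the locked surface starting at pBits + Pitch * surface height. That layout is an assumption that should be checked against the actual surface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical view of one NV12 image: a luma plane followed by an
// interleaved U/V plane, both sharing the same row pitch.
struct Nv12View {
    std::uint8_t* y;      // first byte of the luma (Y) plane
    std::uint8_t* uv;     // first byte of the interleaved chroma (UV) plane
    std::size_t   pitch;  // bytes from one row to the next, >= width
};

// Payload size of one NV12 frame without row padding: 12 bits per pixel.
inline std::size_t nv12FrameBytes(std::size_t width, std::size_t height) {
    return width * height * 3 / 2;
}

// Copy a width x height NV12 image between buffers whose pitches may differ.
// Only `width` bytes per row are touched, so destination padding is left alone.
inline void copyNv12(const Nv12View& src, const Nv12View& dst,
                     std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);          // NV12 needs even sizes
    assert(src.pitch >= width && dst.pitch >= width);

    // Luma: `height` rows of `width` bytes.
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(dst.y + row * dst.pitch, src.y + row * src.pitch, width);
    }
    // Chroma: `height / 2` rows; each row holds width/2 U/V pairs = `width` bytes.
    for (std::size_t row = 0; row < height / 2; ++row) {
        std::memcpy(dst.uv + row * dst.pitch, src.uv + row * src.pitch, width);
    }
}
```

For scale: nv12FrameBytes(1920, 1080) is 3,110,400 bytes, so every full copy of a 1080p 25Hz stream moves about 77.8 MB per second. Each additional copy in the path (frame to locked rect, then UpdateSurface) adds that much again per stream.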
Hi Mihail, Sorry for the delayed response on this one. Unfortunately, DirectX-based questions are better suited to a Microsoft forum than to this one. The MSDK provides you with APIs to copy data between system and video memory, callbacks into the decoder logic, among other things, but we do not provide support or expertise on how to use these surfaces to render frames on screen. That is where DirectX comes in, and we leave the player implementation to the application developer.
Having said that, if you are looking at pure decode performance (without rendering or players), our AVC decoders can comfortably sustain more than 20 1080p streams in real time on the system you describe. But once you add the render/player logic, the number of concurrent streams that can be decoded will depend mostly on the player implementation. Our implementation of a simple player is available in sample_decode, and you are already familiar with it. I'm not sure I can offer more help here.