Problem to decode H264 video in real-time

Benjamin_Blanc · ‎10-06-2011

Hi all,

I plan to use Intel H264 decoder to decode real-time video feeds.
My program is working, but the decoder needs several frames at the beginning of the live encoded stream before returning a decoded frame. This creates a delay between the real-time feed and the decoded feed. I checked, and the first frame in my stream is a key frame. Moreover, I tried another H264 decoder and it is able to decode frames without buffering.

Are there options or parameters to configure the decoder in order to force it to start decoding without buffering?

Thanks in advance,

Benjamin

Anthony_P_Intel · ‎10-06-2011

Hi Benjamin,

The default behavior of the Media SDK is optimized for usage models like trans-coding video from one format to another, and the buffering you observe is beneficial to the overall transcode time. The behavior you desire seems to be similar to the low latency desired for the video conferencing usage model.

From the "current" Media SDK FAQ:
Q12: Is H.264 optimized for transmission latency?
A12: Intel Media SDK is optimized for offline video editing, trans-coding or video playback usages, not for streaming or video conferencing usage models where latency would be a focus.

But...The new "Media SDK 3.0" (currently beta) has some additions that you may find useful. Please take a look at the video conferencing additions.

-Tony

Benjamin_Blanc · ‎10-07-2011

Hi Tony,

Thank you for your quick answer! You're right, what I am trying to achieve is low latency video.

I followed the video conference encoding sample to set my values for low latency, such as mfxExtCodingOption.MaxDecFrameBuffering = 1.

It seems to work very well on the encoding side and I am able to decode the video as soon as I receive the first frame with another decoder. Unfortunately, I am not able to do that with the Intel decoder.

I understand that the SDK is not optimized for real-time video, it has other goals, so I really need to know if it is possible or not to remove this buffering on the decoding side?

Thanks again,

Benjamin

Anthony_P_Intel · ‎10-07-2011

Hi Benjamin,

The good news is that the Media SDK 3.0 release does have a low latency goal. While it is still "beta", you can feel confident that we are (and have been) working on this feature.

Here are some useful tips (from my colleague currently working on the documentation) for configuring Media SDK 3.0 for this usage model.

For Encoding, try setting the following:

mfxVideoParam::AsyncDepth = 1
mfxInfoMFX::GopRefDist = 1
mfxInfoMFX::NumRefFrame = 1
and, if you're displaying immediately after decoding,
attach an mfxExtCodingOption buffer (MFX_EXTBUFF_CODING_OPTION) with mfxExtCodingOption::MaxDecFrameBuffering = 1

For the Decoder, set
mfxVideoParam::AsyncDepth = 1 and
mfxBitstream::DataFlag = MFX_BITSTREAM_COMPLETE_FRAME
as well as mfxInfoMFX::NumThread = 1 (this may change in final 3.0 implementation)
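
As a rough sketch of the settings listed above, the following uses small stand-in structs that only mirror the field names from this post. The `sketch` namespace, the struct names, the helper functions and `kCompleteFrameFlag` are all assumptions made for illustration; in a real application you would fill in mfxVideoParam, mfxExtCodingOption and mfxBitstream from the SDK headers instead.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in types: they only mirror the Media SDK field names quoted above.
// In a real build, include the SDK headers and use the real structures.
namespace sketch {

struct InfoMFX {
    std::uint16_t GopRefDist = 0;
    std::uint16_t NumRefFrame = 0;
    std::uint16_t NumThread = 0;
};

struct VideoParam {
    std::uint16_t AsyncDepth = 0;
    InfoMFX mfx;
};

struct ExtCodingOption {
    std::uint16_t MaxDecFrameBuffering = 0;
};

struct Bitstream {
    std::uint16_t DataFlag = 0;
};

// Placeholder for MFX_BITSTREAM_COMPLETE_FRAME; use the SDK's own constant.
constexpr std::uint16_t kCompleteFrameFlag = 0x0001;

// Encoder side: no pipelining, no B-frames, a single reference frame.
inline void ConfigureLowLatencyEncoder(VideoParam& par, ExtCodingOption& opt) {
    par.AsyncDepth = 1;
    par.mfx.GopRefDist = 1;        // a distance of 1 means no B-frames
    par.mfx.NumRefFrame = 1;
    opt.MaxDecFrameBuffering = 1;  // only if frames are displayed right after decoding
}

// Decoder side: no pipelining, one complete frame per bitstream.
inline void ConfigureLowLatencyDecoder(VideoParam& par, Bitstream& bs) {
    par.AsyncDepth = 1;
    par.mfx.NumThread = 1;         // noted above as possibly changing in final 3.0
    bs.DataFlag = kCompleteFrameFlag;
}

}  // namespace sketch
```

With the real SDK types, the extended buffer would also need to be attached to the encoder's parameter structure before initialization, which this stand-in version leaves out.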

There are a few other valuable video conferencing features coming, so please watch for updates to the Media SDK 3.0 Beta releases.

Currently, we expect to have higher latency on the first frame, but all the following frames should work very well with these settings.

-Tony

Benjamin_Blanc · ‎10-11-2011

Hi Tony,

Thank you very much. I tried the parameters you sent me and they seem to work well: the latency is reduced to less than 1 second. I will continue my tests.

Thanks again!

Benjamin

Mike_S_8 · ‎03-25-2013

I am having a problem similar to the one Benjamin described above. I need to decode H.264 frames in real time, and I am having trouble with the decoder. The decoder has been properly initialized, and I am able to decode frames and display them at a reasonable rate. However, the first several frames passed into the DecodeFrameAsync function return MFX_ERR_MORE_DATA. I am able to get these frames after processing about 10 frames, but the problem is that I now have a visible lag of the real time image data. Also, the first frame that I am processing is always a key frame.

I have set the AsyncDepth = 1, the NumThread = 1, and I am using the MFX_BITSTREAM_COMPLETE_FRAME flag. In addition, I have another client that is decoding the same H.264 video using the publicly available ffmpeg library, and there is no visible lag of the real-time video.

I really need to decode every frame that I pass to the decoder immediately -- and cannot wait until new frames arrive before processing that frame. Is there a way to accomplish this? I'd really like to use the Intel Media SDK since I have been very happy with several other Intel libraries in terms of performance, and I would rather not have to integrate the ffmpeg library into my application instead.

Thanks for any help that you can provide

Petter_L_Intel · ‎03-25-2013

Hi Mike,

What is the format of your input stream to the decoder? To ensure low latency, the stream must not have any B-frames.

Please refer to the sample_decode sample (using its low latency command line option) for reference on how to decode with low latency.

Additional note: NumThread = 1 is not required.

Regards,
Petter

Mike_S_8 · ‎03-25-2013

Hi Petter,

The format of the input stream is NAL unit stream format, and the stream does not have any B-frames.

I have looked at the sample_decode example and I was not able to determine how the low latency feature worked. I was hoping to build the sample application and step through the code, but unfortunately my system does not have "dxva2api.h" or "dwmapi.h" so I am unable to compile the example. Could you explain exactly how to make the low latency feature work? The example seems to be copying/moving the data in the bitstream for the H264 case, but I didn't understand why this was being done. I really need to avoid unnecessary copying as I need my application to run as fast as possible. It would be great if there was an alternate function to DecodeFrameAsync which always decoded a frame without asking for more data.

Thanks

Mike

Petter_L_Intel · ‎03-25-2013

Hi Mike,

I encourage you to install Microsoft DirectX SDK. It should enable you to build the sample.

If the workload is executed using D3D surfaces there are no surface copy operations involved. There are copy operations related to bit stream management but due to the size of the bitstream data chunks the latency overhead is negligible.

There is also a white paper on this topic on Intel VCSource: http://software.intel.com/en-us/articles/video-conferencing-features-of-intel-media-software-development-kit

Regards,
Petter

Mike_S_8 · ‎03-25-2013

Hi Petter,

Unfortunately I still cannot build the sample. I installed the latest Microsoft DirectX SDK and it does not contain either of the header files that I mentioned above. Also, I read over the white paper you suggested, and it does mention that the encoder has to be setup with a few settings in order to get low latency working. Unfortunately, the encoder is not using the Intel Media SDK, so I cannot change the encoder settings to be optimal.

Is it possible to call DecodeFrameAsync without giving it multiple frames? Perhaps what I am trying to do is not possible.

Thank you very much for your help so far -- I'm still hoping to get the Intel Media SDK to work optimally.

Mike

Petter_L_Intel · ‎03-26-2013

Hi Mike,

Please also make sure you have installed the Microsoft Windows SDK. It should have the include files you need.

Most encoders can be configured in a manner similar to the Media SDK encoder. Besides encoding the stream without B-frames, you must also ensure that the encoder sets the Decoded Picture Buffer (DPB) size to 1 while encoding the stream. This ensures that the decoder does not buffer frames. The Media SDK encoder does this by setting the parameter MaxDecFrameBuffering to 1, as described in the white paper. Other encoders should have the same capability.

If the encoded stream is encoded as specified and the decoder is configured as suggested, then DecodeFrameAsync will not require multiple input frames.
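
To make the two stream conditions concrete, here is a minimal sketch of a check an application could run on whatever it knows about the incoming stream. `StreamProfile` and `MeetsLowLatencyConditions` are hypothetical names, not SDK types, and how you obtain these two values from a third-party encoder is left open.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical summary of how a stream was encoded; not an SDK type.
struct StreamProfile {
    bool uses_b_frames;
    std::uint16_t max_dec_frame_buffering;  // the DPB size the encoder signalled
};

// True when the stream meets both conditions described above:
// no B-frames and a decoded picture buffer size of 1.
inline bool MeetsLowLatencyConditions(const StreamProfile& s) {
    return !s.uses_b_frames && s.max_dec_frame_buffering == 1;
}

}  // namespace sketch
```

A stream that fails this check can still be decoded, but the decoder may hold frames back before returning the first one.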

Regards,
Petter

tonny · ‎05-20-2013

Hi

I have the same question as Benjamin. I put one whole I frame into the decoder.

And I set the m_mfxVideoParams.AsyncDepth = 1

pBS->DataFlag = MFX_BITSTREAM_COMPLETE_FRAME;

When I execute the following steps:

sts = m_pmfxDEC->DecodeFrameAsync(&m_mfxBS, &(m_pmfxSurfaces[nIndex]), &pmfxOutSurface, &syncp);

// ignore warnings if output is available,
// if no output and no action required just repeat the same call
if (MFX_ERR_NONE < sts && syncp)
{
    sts = MFX_ERR_NONE;
}

if (MFX_ERR_NONE == sts)
{
    sts = m_mfxSession.SyncOperation(syncp, MSDK_DEC_WAIT_INTERVAL);
}
if (pParams->bForceOutput)
    sts = m_mfxSession.SyncOperation(syncp, MSDK_DEC_WAIT_INTERVAL); // This means put all I frame into decoder/H264

The return value for sts is MFX_ERR_NULL_PTR (-2).

So could you tell me what the problem is?

b.r

tonny

Petter_L_Intel · ‎05-20-2013

Hi Tonny,

Judging from the SyncOperation return code (-2), either the syncpoint is NULL, the bitstream data pointer is NULL, or one of the surfaces that was fed to the decoder was not initialized correctly.

The issue could potentially also be related to your memory allocation implementation (via mfxFrameAllocator) if you are using that approach.

Can you please compare your code with the Media SDK sample code (or Media SDK tutorial code) to make sure the pipeline has been set up properly?
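
One defensive pattern for the first possibility, sketched here with stand-in types, is to decide what to do from both the status and the sync point, and to wait on the sync point only when it is non-NULL. `Status`, `SyncPoint`, `Action` and `NextAction` are illustrative names, and the numeric status values are placeholders rather than values taken from the SDK headers.

```cpp
#include <cassert>

namespace sketch {

// Stand-in status codes: errors negative, success zero, warnings positive.
// The exact numbers are placeholders for this sketch.
enum Status { kErrMoreData = -10, kErrNullPtr = -2, kErrNone = 0, kWrnBusy = 2 };

using SyncPoint = const void*;  // stand-in for mfxSyncPoint

enum class Action { WaitForOutput, FeedMoreData, RetrySameCall, Fail };

// Decide the next step after a DecodeFrameAsync-style call.
inline Action NextAction(Status sts, SyncPoint syncp) {
    // A warning with a valid sync point still means output is available.
    if (sts > kErrNone && syncp != nullptr) {
        sts = kErrNone;
    }
    if (sts == kErrNone) {
        // Only wait when there is really something to wait on.
        return syncp != nullptr ? Action::WaitForOutput : Action::Fail;
    }
    if (sts == kErrMoreData) {
        return Action::FeedMoreData;   // no frame was produced, so do not sync
    }
    if (sts > kErrNone) {
        return Action::RetrySameCall;  // warning without output: repeat the call
    }
    return Action::Fail;
}

}  // namespace sketch
```

The caller would invoke SyncOperation only for `Action::WaitForOutput`, so an unconditional forced sync after a call that produced no frame never happens.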

Regards,
Petter

tonny · ‎05-21-2013

Hi Petter

Thanks for your help.

If I feed further P frames into the decoder, the project works. If I do not feed any further P frames and force it to output the YUV frame, the error appears.

Anyway, I will check the syncpoint and the bitstream data pointer.

b.r

tonny

tonny · ‎05-21-2013

Hi petter

I checked, and the syncpoint is NULL.

What I need from the Media SDK is this: when I feed in one key frame (I frame), I want it to output the YUV frame for that I frame immediately.

So could you tell me how to achieve this? My Media SDK version: Version 4.0.0000554.52230.

b.r.

tonny

Petter_L_Intel · ‎05-21-2013

Hi Tonny,

Based on your descriptions I do not fully understand the issue you're facing, but,

If you configure your decode pipeline for low latency, as described in the Media SDK samples and tutorial
- AsyncDepth=1
- bitstream DataFlag = MFX_BITSTREAM_COMPLETE_FRAME

then the decoder will output frames as fast as possible. It is also assumed that the input stream has no B frames and has its Decoded Picture Buffer (DPB) size set to 1.

This usage requires that you feed one compressed frame into the bitstream at all times.
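
If frames are still being held back, the Media SDK documentation also describes draining: calling DecodeFrameAsync with a NULL bitstream pointer returns buffered frames until it reports MFX_ERR_MORE_DATA, which is normally done at end of stream. Whether that suits a live feed is something to verify for your case. The toy model below only illustrates the idea; `ToyDecoder` is a hypothetical class and does not reproduce the real H.264 output rules.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

namespace sketch {

// Toy model only: holds back up to `held_max` frames before emitting the
// oldest one. A held_max of 0 models a decoder that never buffers.
class ToyDecoder {
public:
    explicit ToyDecoder(std::size_t held_max) : held_max_(held_max) {}

    // Pass a frame id to decode it, or nullptr to drain (mirroring a NULL
    // bitstream). Returns true and writes *out when a frame is emitted;
    // false stands in for MFX_ERR_MORE_DATA.
    bool Decode(const int* frame, int* out) {
        assert(out != nullptr);
        if (frame != nullptr) {
            held_.push_back(*frame);
            if (held_.size() <= held_max_) {
                return false;  // still buffering
            }
        } else if (held_.empty()) {
            return false;      // nothing left to drain
        }
        *out = held_.front();
        held_.pop_front();
        return true;
    }

private:
    std::size_t held_max_;
    std::deque<int> held_;
};

}  // namespace sketch
```

In this model a decoder built with `held_max` of 2 returns nothing for the first frame until it is drained, while one built with `held_max` of 0 returns each frame on the same call.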

Regards,
Petter

tonny · ‎05-21-2013

Hi Petter:

Where can I find the Decoded Picture Buffer parameter?

Regards

Tonny

Petter_L_Intel · ‎05-22-2013

Hi Tonny,

Stream DPB is controlled by encoder configuration: "mfxExtCodingOption::MaxDecFrameBuffering = 1", also mentioned earlier in this thread.

Regards,
Petter