Media (Intel® Video Processing Library, Intel Media SDK)
Get community support for transcoding, decoding, and encoding in applications that use media tools such as the Intel® oneAPI Video Processing Library and the Intel® Media SDK.
Announcements
The Intel Media SDK project is no longer active. For continued support and access to new features, Intel Media SDK users are encouraged to read the transition guide on upgrading from Intel® Media SDK to Intel® Video Processing Library (VPL), and to move to VPL as soon as possible.
For more information, see the VPL website.

H.264 decoder gives two frames of latency while decoding a stream (Intel Media SDK)

Svyatoslav_Krasnitsk

Hi!
We are using the Intel Media SDK for HW decoding of an H.264 video stream, and zero latency is very important in this case.

We noticed that we always get a latency of two frames:
after we send bitstream frame #N to the decoder, we get back decoded frame #(N-2).
We are using the official samples (sample_decode) with d3d11 output and have already set AsyncDepth = 1.
The other thing is that this two-frame latency does not depend on whether HW decoding is used in the Intel Media SDK or not.

Note that this behaviour does not depend on system performance: even if we feed DecodeFrameAsync at 1 fps, we receive the first decoded frame with a two-second latency.


Can you please help us understand how to obtain real-time decoded frames with no latency?
Thank you very much.

Svyatoslav Krasnitskiy

Mark_L_Intel1
Moderator

Hi Svyatoslav,

What is your GOP structure? If your encoder generates B-frames, the latency is expected.

For sample_decode, did you try the "-low_latency" option? It forces the pipeline to complete one frame at a time.
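For example (the input file name here is just a placeholder):

sample_decode.exe h264 -i <your_stream.h264> -low_latency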

You might also want to confirm that the latency is actually introduced by the decoder. My understanding is that the input is a stream and the output is DirectX; these components could also introduce latency.

Mark

Svyatoslav_Krasnitsk

Hi Mark! Thank you for reply!
Of course, I tried -low_latency and got the same effect: two frames of latency. Note that I used a 1 fps video stream to evaluate the frame delay.
In our project we are using NVENC to encode the H.264 stream on the other side. The encoder is configured with a low-latency preset, with no B-frames and an infinite GOP. We are also not using I-frames, only IDRs. I have tried reducing the number of reference frames, setting the DPB to 1, etc., and got the same two-frame difference. Actually, I have tried lots of recommendations from this forum.

I figured out that after pushing a NULL bitstream to DecodeFrameAsync, I can catch these two frames from the buffer, but once I resume reading the bitstream, I have to repeat the DecodeFrameAsync call with an empty bitstream again to remove the latency. This causes big delays and is not a good approach.

 

We also use libav to decode and it works great; we do not have any buffered-frame delays. But the Intel decoder gives better performance, and if we can get zero latency, we will definitely use the Intel decoder.

Can you please help? Maybe there is a special scenario/approach for using the Intel decoder in zero-latency systems?

Thank you!

Mark L. (Intel) wrote:

What is your GOP? If your encoder generated B frame, the latency is expected.

For the sample_decode, did you try the option "-low_latency"? This option forces the pipeline to complete one frame at a time.

Mark_L_Intel1
Moderator

As I understand it, the pipeline looks like the following:

encoder-->buffer-->network-->buffer-->decoder-->rendering, is this correct?

I want to make sure the two-frame latency is measured from the input of the decoder to the output of the decoder.

Could you try "sample_decode.exe h264 -i <the file without b-frame> -calc_latency"? You want to know both the average latency and the frame rate; I would be interested in the stream without B-frames.

I did the same test on a 1080p stream, and I have some results which show that the frame rate is pretty fast:

1. The frame time I was referring to is the processing time; for example, if the fps number reported by sample_decode is 667, one frame's processing time is 1.49 ms.

2. The default async depth is 4.

3. For a 1080p H.264 stream, if I set the async depth to 1, I get an average latency of 2.8 ms, which is less than one frame's processing time (the reported fps is 328).

As you can see, the decoder's processing speed is very fast, so it should satisfy your needs; also, if you only count the processing time, the latency is less than one frame's processing time with async depth set to 1.

Mark

Svyatoslav_Krasnitsk

Mark, thank you very much for your reply!

This is correct:
encoder-->buffer-->network-->buffer-->decoder-->rendering
I get a constant two-frame latency here: -->decoder-->.

1. The decoding speed is just great, it gives more than 200fps.

2. With async depth = 4 I get more than 4 frames of async buffer latency (it is understood why, no problem).

3. There is no problem with the performance or the decode latency. The question is how to set up the decoder to get the real-time decoded frame, not a previously decoded or buffered one.

Into the decoder I put encoded frames:
start>------------ f1 ------------ f2 ------------ f3 ------------ f4 ----------- ...
From the decoder I get decoded frames with a two-frame (buffer?) latency:

start>-------------------------------------------- f1 ------------ f2 ------------ f3 ------------ f4 ----------- ...

Is there any way to set the decoder to work in real time? The -low_latency flag did not work.

Thank you

 

Svyatoslav_Krasnitsk
3,679 Views

Guys, any updates? Up...

Svyatoslav_Krasnitsk

Any updates?

Svyatoslav_Krasnitsk

Any updates? No fix in new version?

Ramashankar
New Contributor III

Hi Svyatoslav,

You may try calling DecodeFrameAsync again with a NULL bitstream whenever your first DecodeFrameAsync call returns MFX_ERR_MORE_DATA. With this, the decoder will flush out whatever decoded frame it has kept in its internal buffer. Sometimes the second DecodeFrameAsync call may return MFX_ERR_MORE_SURFACE since you are using AsyncDepth = 1; in that case, sync your surfaces and call DecodeFrameAsync again without reading the next input frame. It worked for me.

I am attaching the sample code below which I used.

// First decode call with the current portion of the bitstream.
sts = m_pmfxDEC->DecodeFrameAsync(pBitstream, &(m_pCurrentFreeSurface->frame), &pOutSurface, &(m_pCurrentFreeOutputSurface->syncp));
if (sts == MFX_ERR_MORE_DATA)
{
	// Call again with a NULL bitstream so the decoder flushes the frame it is holding internally.
	sts = m_pmfxDEC->DecodeFrameAsync(NULL, &(m_pCurrentFreeSurface->frame), &pOutSurface, &(m_pCurrentFreeOutputSurface->syncp));
	if (sts == MFX_ERR_MORE_SURFACE)
	{
		// With AsyncDepth = 1 the decoder may ask for another working surface:
		// hand the current one to the in-use pool, sync, take a fresh free surface and retry the flush.
		m_UsedSurfacesPool.AddSurface(m_pCurrentFreeSurface);
		m_pCurrentFreeSurface = NULL;
		SyncFrameSurfaces();
		m_pCurrentFreeSurface = m_FreeSurfacesPool.GetSurface();
		sts = m_pmfxDEC->DecodeFrameAsync(NULL, &(m_pCurrentFreeSurface->frame), &pOutSurface, &(m_pCurrentFreeOutputSurface->syncp));
	}
}

Let me know if this works in your case also.

Svyatoslav_Krasnitsk

Ramashankar wrote:

You may try calling DecodeFrameAsync again with NULL bitStream whenever your first DecodeFrameAsync call returns MFX_ERR_MORE_DATA. Let me know if this works in you case also.

Thank you for the reply! I used this approach, but I suppose it is kind of cheating the decoder. I got it working, but the decoder sometimes gave me artifacts and reduced performance. Maybe there is something wrong with my code; I will let you know later if I get a good result. For now I am using libav (codec) to get the best real-time (1 in, 1 out) performance.

Thank you!

Ramashankar
New Contributor III

Hi,

 I used this approach but I suppose it is kind of cheating the decoder.

Yes, I feel the same, as we are essentially notifying the decoder of a kind of 'end of stream' after every frame, but somehow this was the only way I could make it work for me :)

I didn't observe any side effects of this at my end, but yes, I think lower performance is expected in this scenario.

Artem_S_Intel
Employee

Just seeing this report - have you guys tried the following:

Set AsyncDepth to 1 and enable DecodedOrder - this will force immediate output without the NULL-bitstream hack, but if there is something in the bitstream that requires buffering, it can cause issues like corruption, etc.
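In terms of the API, a minimal sketch of those two settings might look like this (decParams and the surrounding setup are illustrative; the rest of the decoder initialization follows sample_decode):

mfxVideoParam decParams = {};
decParams.mfx.CodecId      = MFX_CODEC_AVC; // H.264 decode
decParams.AsyncDepth       = 1;             // no extra pipelining inside the SDK
decParams.mfx.DecodedOrder = 1;             // return surfaces in decoded order, no reordering delay

// ... DecodeHeader / IOPattern / allocator setup as in sample_decode ...

mfxStatus sts = m_pmfxDEC->Init(&decParams);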

 

Dmitry_E_Intel
Employee

Hi Svyatoslav,

There are three parts which can legitimately introduce latency in the Media SDK decoder:

  • Bitstream parsing: when the app calls DecodeFrameAsync with some amount of bitstream data, the decoder doesn't know whether it is a full frame or not, so it starts the real decode process only when the app sends the beginning of the next frame. If your app receives data from a splitter and you're sure you have exactly one full frame, then you should set the MFX_BITSTREAM_COMPLETE_FRAME flag in the mfxBitstream object (see the sketch after this list).
  • "AsyncDepth" latency - this has already been discussed. Just set AsyncDepth to 1.
  • Reordering: according to the AVC spec, a decoder doesn't have to return a decoded surface immediately for display. This is true even in the absence of B-frames - the decoder doesn't know in advance that it won't meet B-frames later, so reordering might be present. By default the decoder therefore returns the first decoded frame only after the DPB is full. It is possible to tune an encoder to produce a low-latency bitstream. Here are the rules:
    • sps.pic_order_cnt_type = 2
    • or if SPS.pic_order_cnt_type = 0 then
      • Set VUI.max_dec_frame_buffering >= Number of reference frames used by encoder (e.g. 1)
      • or SEI.pic_timing.dpb_output_delay = 0
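To illustrate the first two points, here is a rough sketch of the application side (assuming each received packet already holds exactly one complete encoded frame; frameData/frameSize are illustrative, and the pipeline variables follow the sample_decode naming used earlier in this thread):

// Init time: keep the SDK's internal pipelining at a minimum.
mfxVideoParam decParams = {};
decParams.mfx.CodecId = MFX_CODEC_AVC;
decParams.AsyncDepth  = 1;

// Per frame: mark the buffer as one complete frame so the decoder
// does not wait for the start of the next frame before decoding.
mfxBitstream bs = {};
bs.Data       = frameData;   // one full encoded frame from the network
bs.DataLength = frameSize;
bs.MaxLength  = frameSize;
bs.DataFlag   = MFX_BITSTREAM_COMPLETE_FRAME;

mfxSyncPoint syncp = NULL;
mfxStatus sts = m_pmfxDEC->DecodeFrameAsync(&bs, &(m_pCurrentFreeSurface->frame), &pOutSurface, &syncp);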

 

If you're sure the content you decode doesn't have re-ordered frames, you can use the DecodedOrder API from the Media SDK manual:

DecodedOrder - For AVC and HEVC, used to instruct the decoder to return output frames in the decoded order. Must be zero for all other decoders.

When enabled, the correctness of mfxFrameData::TimeStamp and FrameOrder for the output surface is not guaranteed; the application should ignore them.

 

This will force the decoder to avoid any surface buffering.

 

Regards,

Dmitry

Beese__Erin
Beginner

Hi Artem,

I've been able to successfully eliminate the internal frame caching by enabling DecodedOrder on some machines. However, the manual says this option is deprecated, and I'm finding that it doesn't always work (MFX_ERR_UNSUPPORTED while initializing the decoder). Are there any alternatives?

The following post seems to indicate there are no alternatives at this time:

https://software.intel.com/en-us/forums/intel-media-sdk/topic/706986
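For reference, this is roughly how one could probe support up front with Query before Init (just a sketch; the parameter set and variable names are illustrative, not my actual code):

mfxVideoParam in = {}, out = {};
in.mfx.CodecId      = MFX_CODEC_AVC;
in.AsyncDepth       = 1;
in.mfx.DecodedOrder = 1;
in.IOPattern        = MFX_IOPATTERN_OUT_VIDEO_MEMORY;

// Query validates the parameter set against the current implementation;
// anything other than MFX_ERR_NONE suggests Init would reject or adjust it.
mfxStatus sts = m_pmfxDEC->Query(&in, &out);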

 

Erin

Artem_S_Intel
Employee

Hi Erin, can you share more info about the cases in which you are seeing MFX_ERR_UNSUPPORTED?
