Media (Intel® Video Processing Library, Intel Media SDK)

Losing output frames after IDR on H.265 decode...

mmulh
Beginner
1,704 Views

I've got an H.265 video stream coming out of a CCTV camera, and I'm decoding it.  I'm getting unexpected errors following an intermediate IDR frame, and end up losing a few output frames.  I also think the output frames following the errors are marked somehow, because a further re-encode (transcode) results in fairly sizeable output NALs.  I'm not sure if I'm doing something wrong, or if it's a problem in the oneVPL stack.

Hardware: i5-6500 CPU

oneVPL: version: 1.35

I'm using the legacy interface in oneVPL since I need to support pre-gen11 hardware, and I'm using the D3D11/VA hardware buffers / acceleration.  I've been running this on Windows, haven't confirmed on Linux.

The first 3 frames that get fed into the decoder don't result in an output frame - as expected, and the 4th frame in results in the 1st frame coming as output.

  decoding NAL 0 - len 92 (VPS/SPS/PPS packets to initialize decoder)

    - results in MFX_ERR_MORE_DATA

  decoding  NAL 1 - len 58332 (initial IDR Frame)

    - results in MFX_ERR_MORE_SURFACE

    - decode again with new surface results in MFX_ERR_MORE_DATA

  decoding NAL 2 - len 49 (P-Frame)

    - results in MFX_ERR_MORE_SURFACE / MFX_ERR_MORE_DATA as above

  decoding NAL 3 - len 49 (P-Frame)

    - results in MFX_ERR_MORE_SURFACE / MFX_ERR_MORE_DATA as above

  decoding NAL 4 - len 49 (P-Frame)

    - get NAL 1 frame result

All of this is fine.  The problem comes after the next I-Frame (in this case, the camera has a GOP size of 60), once the previous surfaces have been flushed:

  decoding NAL 61 - len 58361 (subsequent I-Frame)

    - get NAL 58 surface

  decoding NAL 62 - len 55 (P-Frame)

    - get NAL 59 surface

  decoding NAL 63 - len 49 (P-Frame)

    - get NAL 60 surface

  decoding NAL 64 - len 49 (P-Frame)

    - results in MFX_ERR_MORE_SURFACE / MFX_ERR_MORE_DATA as above

    - I'M EXPECTING TO GET NAL 61 returned here (but nothing)

  decoding NAL 65 - len 49 (P-Frame)

    - results in MFX_ERR_MORE_SURFACE / MFX_ERR_MORE_DATA as above

    - I'M EXPECTING TO GET NAL 62 returned here (but nothing)

  decoding NAL 66 - len 51 (P-Frame)

    - get NAL 61 frame result (this was the subsequent I-Frame)

  decoding NAL 67 - len 49 (P-Frame)

    - get NAL 64 frame result (FRAMES 62/63 ARE LOST).

I believe the decode is fairly straightforward.

mfxFrameSurface1* decodeNal()
{
    printf("decoding %llu - len %zu\n", nalCount, nalLen);

    // Wrap the single NAL in a bitstream; the timestamp doubles as a frame counter.
    mfxBitstream bitstream = {};
    bitstream.DataFlag = MFX_BITSTREAM_COMPLETE_FRAME;
    bitstream.TimeStamp = nalCount++;
    bitstream.Data = (mfxU8*)nal;
    bitstream.DataLength = (mfxU32)nalLen;
    bitstream.MaxLength = (mfxU32)nalLen;

    mfxSyncPoint syncPoint;
    while (true)
    {
        mfxFrameSurface1* inputFrame = getDecodeFrame();
        mfxFrameSurface1* outputFrame = nullptr;

        mfxStatus sts = MFXVideoDECODE_DecodeFrameAsync(_decodeSession, &bitstream, inputFrame, &outputFrame, &syncPoint);
        switch (sts)
        {
        case MFX_ERR_NONE:
            assert(outputFrame);
            incLock(outputFrame);
            // Wait for the decode to finish and report a failed or timed-out sync.
            sts = MFXVideoCORE_SyncOperation(_decodeSession, syncPoint, 1000);
            if (sts != MFX_ERR_NONE)
                printf("sync error %i\n", (int)sts);
            return outputFrame;
        case MFX_WRN_DEVICE_BUSY:
            msecSleep(1); // wait if device is busy
            continue;  // go back to decode again
        case MFX_ERR_MORE_DATA: // fine - just didn't get an output frame
            printf("MFX_ERR_MORE_DATA\n");
            assert(!outputFrame);
            return nullptr;  // nothing output - wait for another NAL
        case MFX_ERR_MORE_SURFACE: // need to feed another inputFrame
            printf("MFX_ERR_MORE_SURFACE\n");
            break;  // go back to decode again
        default:
            printf("error %i\n", (int)sts);
            assert(false);
            return nullptr;  // don't loop forever when asserts are compiled out
        }
    }
}
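
One thing worth checking in a loop like the one above: MFXVideoDECODE_DecodeFrameAsync advances DataOffset and shrinks DataLength as it consumes input, and because the bitstream here is a local, any bytes still unconsumed when the function returns are discarded. The sketch below shows the usual carry-over bookkeeping. It uses a hypothetical stand-in struct, PendingBitstream, that mirrors only the Data / DataOffset / DataLength fields of mfxBitstream (it is not the real type), and the helpers appendNal and consume are made up for illustration. It may or may not be relevant to this particular stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the mfxBitstream fields that matter here.
// The vector's size plays the role of MaxLength.
struct PendingBitstream {
    std::vector<std::uint8_t> Data;
    std::uint32_t DataOffset = 0;  // index of the first unconsumed byte
    std::uint32_t DataLength = 0;  // number of unconsumed bytes
};

// Keep any unconsumed bytes, move them to the front of the buffer,
// and append the new NAL behind them. Returns false (leaving the
// buffer untouched) if leftover + new data would not fit.
bool appendNal(PendingBitstream& bs, const std::uint8_t* nal, std::uint32_t nalLen)
{
    if (static_cast<std::size_t>(bs.DataLength) + nalLen > bs.Data.size())
        return false;
    if (bs.DataOffset > 0 && bs.DataLength > 0)
        std::memmove(bs.Data.data(), bs.Data.data() + bs.DataOffset, bs.DataLength);
    bs.DataOffset = 0;
    if (nalLen > 0)
        std::memcpy(bs.Data.data() + bs.DataLength, nal, nalLen);
    bs.DataLength += nalLen;
    return true;
}

// Simulates what a decode call does to the offsets after consuming n bytes.
void consume(PendingBitstream& bs, std::uint32_t n)
{
    assert(n <= bs.DataLength);
    bs.DataOffset += n;
    bs.DataLength -= n;
}
```

With a long-lived buffer like this, a NAL that the decoder has not fully consumed stays queued for the next call instead of being dropped. Logging DataLength after each decode call in the real code would show whether that situation ever occurs.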

 

I've attached my test program and test input

16 Replies
RemyaP_Intel
Moderator
1,663 Views

Hi, 


Thank you for posting in Intel Communities. 


Could you please answer the questions below so that we can assist you better?


- Are you using a real-time video to decode?

- I have observed that you are using an old version of VPL. Could you please update VPL to the latest version and try again?

- Are you using the sample decode code available to decode your input video?

- The input video you have shared is getting stuck in between. Could you please try with a different video and let us know the results?



Regards,

Remya Premdas 



mmulh
Beginner
1,636 Views

I've updated everything.  I downloaded the binaries from https://www.intel.com/content/www/us/en/developer/tools/oneapi/onevpl.html and installed them - my SDK version still lists as 1.35, I'm assuming because I've got an older processor which is no longer being developed against.  I took my test to a newer machine - an i5-1135G7 running oneVPL version 2.5 - and it exhibits the same behavior.

I pulled the latest oneVPL SDK from github (2023.2.1), and compiled with this version.

This video is H.265 surveillance video, which was archived into an MP4 - I just extracted enough of the raw video to show the problem.  I have other H.265 surveillance video, and it does NOT seem to have this issue.  The NAL type on the problem video is 20 for the IDR frame, whereas the fine video has a NAL type of 19 for the IDR frame, though I doubt this makes a difference.
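
For anyone wanting to check the NAL types in their own stream: in the HEVC specification, types 19 and 20 are both IDR pictures (IDR_W_RADL and IDR_N_LP respectively), and the type sits in bits 1 to 6 of the first header byte. A minimal sketch of reading it follows. The function names hevcNalType and hevcNalTypeName are hypothetical helpers, not part of oneVPL, and the name table only covers a handful of types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns the HEVC nal_unit_type of a single NAL unit, or -1 if the
// buffer is too short. An Annex B start code (00 00 01 or 00 00 00 01)
// at the front is skipped if present.
int hevcNalType(const std::uint8_t* nal, std::size_t len)
{
    assert(nal != nullptr || len == 0);
    std::size_t i = 0;
    if (len >= 4 && nal[0] == 0 && nal[1] == 0 && nal[2] == 0 && nal[3] == 1)
        i = 4;
    else if (len >= 3 && nal[0] == 0 && nal[1] == 0 && nal[2] == 1)
        i = 3;
    if (len < i + 2)  // the NAL unit header is two bytes long
        return -1;
    // Header layout: forbidden_zero_bit (1 bit), nal_unit_type (6 bits), ...
    return (nal[i] >> 1) & 0x3F;
}

// Names for a few common types; everything else is reported as "other".
std::string hevcNalTypeName(int type)
{
    switch (type) {
    case 1:  return "TRAIL_R";
    case 19: return "IDR_W_RADL";
    case 20: return "IDR_N_LP";
    case 21: return "CRA_NUT";
    case 32: return "VPS";
    case 33: return "SPS";
    case 34: return "PPS";
    default: return "other";
    }
}
```

Running each extracted NAL through hevcNalType before it is fed to the decoder makes it easy to log exactly where the IDR pictures and parameter sets fall in the stream.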

The test code I provided is my own (extracted from my code base), not the sample decode code.

I'm not sure what you mean by the video getting stuck in between?  I only provided enough video to show the issue, and it seems like this particular "flavor" of H.265 is causing issues.  My "guess" is that it's something in this particular H.265 video that's causing issues.  It is valid H.265, it plays just fine in VLC, and it decodes fine with Nvidia's NVDEC.

RemyaP_Intel
Moderator
1,598 Views

Hi,


We also tried playing the video in the VLC player. Not sure why the video is not playing smoothly. 

But we were able to decode the video using the sample-decode code. Have you tried the same? If not, can you please try decoding the H265 input video using sample decode and let us know the result?


Regards,

Remya Premdas


mmulh
Beginner
1,585 Views

I've now tested with the legacy-decode from github, and modified it slightly to match what I'm doing - for ReadEncodedStream() I'm only filling in a single NAL unit, setting the MFX_BITSTREAM_COMPLETE_FRAME flag, and incrementing the TimeStamp value.  Modified code attached.

In the main program, I'm loading the first 3 NALs (VPS/SPS/PPS) before calling the DecodeHeader.  NOTE: I added loading the first IDR frame before calling DecodeHeader to make the sw implementation work - but it doesn't change the behavior of the hw implementation (which ONLY requires the first 3 NALs).

Running the code with the sw implementation, things work fine.  The frames start coming out a little later than with the hw implementation (I have to feed in more input frames first), but I get all of my output frames as expected (verified by the matching timestamps).

Running the hw implementation, it still acts wrong, but masks the errors that I get with my implementation which used D3D11 output buffers.

decoding frame #63 - len 58361 -- this is the intermediate IDR frame
decoded frame 60
decoding frame #64 - len 55
decoded frame 61
decoding frame #65 - len 104
decoded frame 62
decoding frame #66 - len 153
decoded frame 63                          -- decoded IDR frame (fine).
decoding frame #67 - len 49
decoded frame 66                          -- here I should be getting frame 64 !! (but no MFX_ERR_MORE_SURFACE / MFX_ERR_MORE_DATA errors).
decoding frame #68 - len 51
decoded frame 66                          -- not frame 65 !!
decoding frame #69 - len 49
decoded frame 66                          -- not frame 66 !!
decoding frame #70 - len 49
decoded frame 67
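
The gap in the log above (64 and 65 never returned, 66 returned three times) can be detected mechanically by comparing input and output timestamps. A minimal sketch, assuming each input NAL is submitted with a unique timestamp; the class TimestampAudit and its methods are hypothetical names, not part of legacy-decode or oneVPL.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Tracks which input timestamps have come back on an output frame.
class TimestampAudit {
public:
    // Call when a NAL with this timestamp is handed to the decoder.
    void submitted(std::uint64_t ts) { pending_.insert(ts); }

    // Call when an output frame carries this timestamp. Returns false if
    // the timestamp was never submitted or has already been returned.
    bool returned(std::uint64_t ts) { return pending_.erase(ts) == 1; }

    // Timestamps below upTo that were submitted but never returned.
    std::vector<std::uint64_t> missingBefore(std::uint64_t upTo) const
    {
        return std::vector<std::uint64_t>(pending_.begin(),
                                          pending_.lower_bound(upTo));
    }

private:
    std::set<std::uint64_t> pending_;
};
```

Feeding it the timestamps from the log, returned(66) reports false on the second and third occurrence, and missingBefore(67) yields 64 and 65.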

RemyaP_Intel
Moderator
1,532 Views

Hi,

 

Sorry for the delay. We were able to reproduce the issue which you have mentioned. 

 

On the other hand, we didn't observe the same behavior when we decoded cars_320x240.h265 (the oneVPL GitHub sample video) using either the original legacy-decode.cpp or the modified legacy-decode.cpp file which you shared. 

 

Also, we tried to decode the tmp.h265 file which you have shared with the actual legacy-decode.cpp and observed that it's throwing MFX_ERR_MORE_DATA at frame 66.

 

Could you please try decoding a different .h265 file and let us know if the same pattern is happening or not?

 

Regards,

Remya Premdas

 

mmulh
Beginner
1,524 Views

I had tried different .h265 files, and they worked just fine.  The issue seems to be with this particular "flavor" of .h265.  I extracted this small chunk of video from a 12 hour .MP4 of a CCTV camera recording (1.37 GB).  I've currently got a query out to see if we can get the brand of camera.

I know that there are lots of variants of the particular codec standards (different flags, etc. specified in the VPS/SPS/PPS) and I'm guessing that there is something in this particular .H265 flavor that is causing the issue.

 

RemyaP_Intel
Moderator
1,498 Views

Hi,


Thanks for sharing your observations. Have you observed any differences between the two .h265 files? Also, so that we can compare, could you please share with us the latest .h265 file you used?


Regards,

Remya Premdas


mmulh
Beginner
1,487 Views

I just used some random H265 files that I had, unrelated to the CCTV footage from this particular camera, and they worked fine.  I'm assuming that this camera will always create problem videos, since it probably defines a consistent VPS/SPS/PPS sequence (this camera isn't mine, it's from a potential customer).  The problematic MP4 that I have is too large (1.37GB) to share here, so do you have a mechanism for sending larger files?

RemyaP_Intel
Moderator
1,463 Views

Hi,


I will share the detailed steps separately in a private message on how to transfer the files. Please follow the steps and share the file with us.


Regards,

Remya


mmulh
Beginner
1,452 Views
RemyaP_Intel
Moderator
1,426 Views

Hi,


Thanks. We have received the video file. Our internal team is looking into this. We will get back to you with an update.


Regards,

Remya Premdas


RemyaP_Intel
Moderator
1,265 Views

Hi,


We have observed that "decoded frame 66" comes from the below print statement:

printf(" decoded frame %llu\n", decSurfaceOut->Data.TimeStamp);


So it is not the decoded frame number (framenum) but the frame timestamp, which is an integer type. The timestamp comes from the NAL unit PTS, which is converted from double to integer; that is why you see the repeated integer timestamp.


Hope this resolves your issue. Let us know if you need any further clarification.


--

Regards,

Remya Premdas





mmulh
Beginner
1,258 Views

The Timestamp should be copied from the original input compressed H.265 packet timestamp, which in the test is an incrementing integer as we input the frames [DTS] (in real life it's an RTP timestamp), and is not tied to the content of the H.265 packets.  Note - this is surveillance video, so DTS will match PTS since no B-Frames exist.

In the sample, frame 63 was an IDR frame, then frame 64 & 65 got lost.  Using the legacy-decode, frame 66 got substituted in for frames 64 and 65.  Using my test code (more complete hw implementation), it got the MFX_ERR_MORE_SURFACE / MFX_ERR_MORE_DATA errors on these 2 frames (so obviously the legacy-decode masked these errors somehow).  Using the sw implementation there were no issues.  The errors are consistent immediately following the IDR frame.

RemyaP_Intel
Moderator
1,219 Views

Hi,


Thanks for getting back to us. We have checked internally with our team and here is our observation.


The repetition of the 66th frame is expected behavior. There is no error in the whole decode pipeline.

The following is the detail:

  1. "legacy-decode.cpp" doesn't set the output order in the decode parameters during initialization, so it defaults to display order.
  2. Although this sample clip has no B-frames, its sps_max_dec_pic_buffering_minus1[0] and sps_max_num_reorder_pics[0] are both 3. The VPL/MSDK decoder therefore has to hold some frames (3) in the DPB before output, in case they need reordering into display order.
  3. After the 63rd frame (in fact the 60th frame, the IDR frame) is submitted, the subsequent P frames (64, 65, 66) aren't submitted to the HW immediately by MFXVideoDECODE_DecodeFrameAsync; that only happens once the 3 P frames preceding the IDR frame have been output. Those 3 calls to MFXVideoDECODE_DecodeFrameAsync just return MFX_ERR_NONE and the decoded output. However, after each call one more NAL unit/frame is appended to the end of the mfxBitstream data. The previous NAL unit isn't overwritten and is still valid, but the TimeStamp accumulates to 66. This is the root cause.
  4. The above behavior doesn't make the decode output wrong. It just produces confusing logs.
    1. With decoded order, the repeated "frame 66" log lines don't appear, because the decode output needn't be buffered.
    2. To enable it, place "mfxDecParams.mfx.DecodedOrder = 1;" before "sts = MFXVideoDECODE_Init(session, &mfxDecParams);".
  5. MFX_ERR_MORE_SURFACE / MFX_ERR_MORE_DATA aren't real errors. The bitstream buffer normally has a fixed size, e.g. 10MB. If the VPL runtime finds there isn't enough data to decode one frame, it returns MFX_ERR_MORE_DATA to ask the app for more input. On the other hand, if the current bs buffer contains multiple frames, it returns MFX_ERR_MORE_SURFACE to ask for another surface.
  6. For legacy-decode, no MFX_ERR_MORE_SURFACE is returned, for the following reasons:
    1. It runs in sync mode (the simplest mode), not async mode.
    2. Its bs buffer is aligned to NAL unit boundaries.
    3. It uses system memory surfaces, but the Intel GPU only decodes into GPU memory surfaces, so the VPL runtime maintains internal surfaces and copies the decoded output to the app-provided system surface.
    4. So no MFX_ERR_MORE_SURFACE is needed.
    5. If the app passes GPU memory surfaces to MFXVideoDECODE_DecodeFrameAsync, MFX_ERR_MORE_SURFACE will be returned, because some decoded frames can't be released immediately while they serve as reference frames.
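
The output delay described in points 1 to 4 can be pictured with a toy model. The sketch below is only an illustration of display-order buffering in a stream without B-frames, assuming the decoder holds back a fixed number of frames equal to the reorder depth. DisplayOrderQueue is a hypothetical class and does not reflect how the VPL runtime is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy model: frames are held until more than reorderDepth of them are
// queued, then the oldest one is released for display.
class DisplayOrderQueue {
public:
    explicit DisplayOrderQueue(std::size_t reorderDepth) : depth_(reorderDepth) {}

    // Queue one decoded frame, identified by its timestamp. Returns true
    // and writes the released timestamp to out when a frame is ready.
    bool push(std::uint64_t ts, std::uint64_t& out)
    {
        held_.push_back(ts);
        if (held_.size() <= depth_)
            return false;  // still filling the reorder window
        out = held_.front();
        held_.pop_front();
        return true;
    }

private:
    std::size_t depth_;
    std::deque<std::uint64_t> held_;
};
```

With a depth of 3, the first three frames produce no output and the fourth releases frame 1, which matches the startup pattern in the original trace. With a depth of 0 (the analogue of setting DecodedOrder = 1) every frame is released as soon as it is pushed.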


Let me know if you require more details.


Regards,

Remya Premdas


RemyaP_Intel
Moderator
1,146 Views

Hi,


We haven't received any update from your side. Is your issue resolved?



Regards,

Remya Premdas


RemyaP_Intel
Moderator
1,117 Views

Hi,


We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Regards,

Remya Premdas

