
DecodeFrameAsync seemingly starved of data by ReadNextFrame()

Robby_S
New Contributor I

Hello,

I just realized that my MSDK code is running slower than it should. My code is basically a transcoder with some object detection between the H.264 decoder and H.264 encoder. The decoder resolution is 1920x1080; the encoder 640x480.

Right now, it looks like the decoder is slowing things down. So I added some code to measure the performance, like this:

mfxStatus SurvlChannel::DecodeOneFrame(ExtendedSurface *pExtSurface)
{
    MSDK_CHECK_POINTER(pExtSurface,  MFX_ERR_NULL_PTR);

    timespec t0, t1;  //requires GLIBC_2.17+ 
    long tdiff;

    mfxStatus sts = MFX_ERR_MORE_SURFACE;
    mfxFrameSurface1    *pmfxSurface = NULL;
    mfxBitstream        *pInBitstream = &m_mfxDecBS;
    pExtSurface->pSurface = NULL;
    mfxU32 i = 0;

    msdk_printf(MSDK_STRING("Channel %u frame %d DecodeOneFrame() entering loop ... \n"), m_nChanID, m_nProcessedFramesNum);
    while (MFX_ERR_MORE_DATA == sts || MFX_ERR_MORE_SURFACE == sts || MFX_ERR_NONE < sts) {
        if (MFX_WRN_DEVICE_BUSY == sts) {
            MSDK_SLEEP(TIME_TO_SLEEP); // just wait and then repeat the same call to DecodeFrameAsync
        }
        else if (MFX_ERR_MORE_DATA == sts) {
            clock_gettime(CLOCK_MONOTONIC, &t0);
            sts = m_pFileReader->ReadNextFrame(pInBitstream); // read more data to input bit stream
            MSDK_BREAK_ON_ERROR(sts);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            tdiff = timespec_diff_ns(t0, t1);
            msdk_printf(MSDK_STRING("Channel %u frame %d decoder ReadNextFrame() %ld (ns)\n"), m_nChanID, m_nProcessedFramesNum, tdiff);
        }
        else if (MFX_ERR_MORE_SURFACE == sts) {
            // find new working surface
            clock_gettime(CLOCK_MONOTONIC, &t0);
            pmfxSurface = GetFreeSurfaceDec();
            while (NULL == pmfxSurface) {
                pmfxSurface = GetFreeSurfaceDec();
            }
            MSDK_CHECK_POINTER(pmfxSurface, MFX_ERR_MEMORY_ALLOC); // return an error if a free surface wasn't found
            clock_gettime(CLOCK_MONOTONIC, &t1);
            tdiff = timespec_diff_ns(t0, t1);
            msdk_printf(MSDK_STRING("Channel %u frame %d decoder GetFreeSurfaceDec() %ld (ns)\n"), m_nChanID, m_nProcessedFramesNum, tdiff);
        }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        sts = m_pmfxDEC->DecodeFrameAsync(pInBitstream, pmfxSurface, &pExtSurface->pSurface, &pExtSurface->Syncp);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        tdiff = timespec_diff_ns(t0, t1);
        msdk_printf(MSDK_STRING("Channel %u frame %d decoder DecodeFrameAsync() %ld (ns)\n"), m_nChanID, m_nProcessedFramesNum, tdiff);

        // ignore warnings if output is available,
        if (MFX_ERR_NONE < sts && pExtSurface->Syncp) {
            sts = MFX_ERR_NONE;
        }
    } //while processing

    return sts;
}
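The helper timespec_diff_ns() used above is not shown in the post. A minimal sketch of what it might look like, assuming it returns the elapsed time from t0 to t1 in nanoseconds as a long (matching the %ld format above):

```cpp
#include <ctime>

// Hypothetical helper (not part of the Media SDK samples): elapsed time
// from t0 to t1 in nanoseconds. Assumes t1 was taken after t0 on the same
// clock, and that long is 64 bits wide, as on 64-bit Linux.
long timespec_diff_ns(const timespec &t0, const timespec &t1)
{
    const long sec  = static_cast<long>(t1.tv_sec - t0.tv_sec);
    const long nsec = t1.tv_nsec - t0.tv_nsec; // may be negative; the sum is still correct
    return sec * 1000000000L + nsec;
}
```

Note that the msdk_printf calls themselves add to the time spent inside the loop, so the per-call figures are more reliable than any total measured around the whole function.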

And the output looks like this:

Channel 0 frame 51 DecodeOneFrame() entering loop ... 
Channel 0 frame 51 decoder GetFreeSurfaceDec() 701 (ns)
Channel 0 frame 51 decoder DecodeFrameAsync() 45474 (ns)
Channel 0 frame 51 decoder ReadNextFrame() 42505 (ns)
Channel 0 frame 51 decoder DecodeFrameAsync() 255923 (ns)
Channel 0 frame 51 decoder ReadNextFrame() 31038 (ns)
Channel 0 frame 51 decoder DecodeFrameAsync() 144357 (ns)
Channel 0 frame 51 decoder ReadNextFrame() 14646 (ns)
Channel 0 frame 51 decoder DecodeFrameAsync() 129694 (ns)

We can see that it takes several iterations of reading and decoding to really finish a frame. I have set the input bit-stream's MaxLength to 1024*1920*1080, but that does not solve the problem.

Why is that? Is there something else I can do to make each frame finish in one iteration?

Thanks,

Robby

Bjoern_B_Intel
Employee

Hi Robby,

I don’t see anything wrong with the code here. Did you try to debug the InitMfxBitstream() and ReadNextFrame() functions and check the number of bytes actually allocated and read? Your pInBitstream pointer refers to the data structure of interest here. Print out its fields and check them. Keep in mind that the size values are 32-bit and not 64-bit.
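As a rough sketch of that check: the struct below is a stand-in that mirrors only the DataOffset, DataLength and MaxLength fields of mfxBitstream (all mfxU32 in the SDK), so that the example compiles without the SDK headers. The names BitstreamState, DescribeBitstream and FitsInU32 are made up for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for the mfxBitstream fields of interest (assumption: in real
// code you would read these straight from pInBitstream).
struct BitstreamState {
    uint32_t DataOffset; // where unread data starts in the buffer
    uint32_t DataLength; // number of unread bytes
    uint32_t MaxLength;  // allocated buffer size
};

// Formats the buffer state so it can be printed after each ReadNextFrame().
std::string DescribeBitstream(const BitstreamState &bs)
{
    char buf[96];
    std::snprintf(buf, sizeof(buf), "offset=%u length=%u max=%u",
                  static_cast<unsigned>(bs.DataOffset),
                  static_cast<unsigned>(bs.DataLength),
                  static_cast<unsigned>(bs.MaxLength));
    return std::string(buf);
}

// True if a requested buffer size can be stored in a 32-bit size field.
bool FitsInU32(uint64_t bytes)
{
    return bytes <= UINT32_MAX;
}
```

For example, 1024*1920*1080 bytes still fits in a 32-bit field, while 4096*1920*1080 would not. If DataLength after each read is much smaller than a full frame, the reader is handing the decoder partial frames.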

Thanks & Best,

Bjoern

Alexey_F_Intel
Employee

There can be two reasons, or a combination of them, behind the decoding delay.

1. The splitter/file reader provides only part of a frame per call, so several iterations are required. It is best if your splitter can supply a full frame at a time. If the input is an elementary stream and the reader seeks to each slice start code, you may have a multi-slice coded input stream, and the decoder returns a "more data" error until it has the complete picture. The worst case is a data reader that provides small fixed-size chunks, such as 1 KB.

2. The decoder buffers frames in order to reorder output frames. If the sequence uses a B-pyramid, IdrPB1b2b3 (in decode order), then the decoder will decode 3 frames (P, B1 and b2) and only after that return the displayable b2.

Alexey

Robby_S
New Contributor I

Hi Bjoern, Alexey,

I re-used some code from the MSDK samples. The internal buffer size for CH264FrameReader was hard-coded, like this:

mfxStatus CH264FrameReader::Init(const msdk_char *strFileName)
{
    mfxStatus sts = MFX_ERR_NONE;

    sts = CSmplBitstreamReader::Init(strFileName);
    if (sts != MFX_ERR_NONE)
        return sts;

    m_isEndOfStream = false;
    m_processedBS = NULL;

    m_originalBS.reset(new mfxBitstream());
    sts = InitMfxBitstream(m_originalBS.get(), 1024 * 1024);  // <---- internal buffer size hard-coded
    if (sts != MFX_ERR_NONE)
        return sts;

    m_pNALSplitter.reset(new ProtectedLibrary::AVC_Spl());

    m_frame = 0;
    m_plainBuffer = 0;
    m_plainBufferSize = 0;

    return sts;
}

I changed the size to be 16*1920*1080, the same size as the mfxBitstream used in my code (I originally set it to 1024*1920*1080, and realized that was too much). The decoder performance got slightly better, but is still a bit slow. Maybe as Alexey suggested, something else is also happening.

Thanks,

Robby

Alexey_F_Intel
Employee

Your buffer is large. The problem might be in the NAL splitter, if it reads NAL by NAL and does not combine multiple slices into a single chunk of data when sending it to the decoder. Can you check how many slices there are in the input bitstream? You can use Video Pro Analyzer for this (the eval version is fine) in Coding Flow mode to see how each picture is split into slices: https://software.intel.com/en-us/intel-video-pro-analyzer
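If an analyzer is not at hand, the slice count can also be estimated with a few lines of code. The sketch below is not taken from the Media SDK samples; CountSlicesPerPicture is a hypothetical name. It assumes an Annex B elementary stream (NAL units separated by 00 00 01 start codes) and treats a slice whose first_mb_in_slice is 0 as the start of a new picture, which holds for ordinary streams but not for ones using arbitrary slice order or redundant slices.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns, for each coded picture found in an Annex B H.264 byte stream,
// the number of slice NAL units (nal_unit_type 1 or 5) it contains.
std::vector<int> CountSlicesPerPicture(const std::vector<uint8_t> &bs)
{
    std::vector<int> slices;
    const size_t n = bs.size();
    for (size_t i = 0; i + 3 < n; ++i) {
        // Match the 3-byte start code 00 00 01 (a 4-byte start code ends with it).
        if (bs[i] != 0 || bs[i + 1] != 0 || bs[i + 2] != 1)
            continue;
        const size_t hdr = i + 3;            // NAL header byte
        const int nalType = bs[hdr] & 0x1F;  // low 5 bits: nal_unit_type
        if ((nalType == 1 || nalType == 5) && hdr + 1 < n) {
            // first_mb_in_slice is ue(v) coded; the value 0 is the single
            // bit '1', so the top bit of the first payload byte is set.
            const bool firstSlice = (bs[hdr + 1] & 0x80) != 0;
            if (firstSlice || slices.empty())
                slices.push_back(0);         // start counting a new picture
            ++slices.back();
        }
        i += 2; // skip the rest of the start code just matched
    }
    return slices;
}
```

If most pictures report more than one slice, a reader that hands the decoder one NAL unit per call would explain the repeated "more data" returns seen in the log above.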

 
