Media (Intel® Video Processing Library, Intel Media SDK)

Surface sharing between OPENCL and DirectX

Manish_K_

I am working on a Decode-OpenCL-Encode pipeline on an Intel processor. Intel provides sample code for media interop.

If we look at the DecodeOneFrame() function below: 

    // decode next frame and put result to output surface
    mfxStatus CDecodingPipeline::DecodeOneFrame(int Width, int Height, IDirect3DSurface9 *pDstSurface, IDirect3DDevice9* pd3dDevice)

    {
        mfxStatus stsOut = MFX_ERR_NONE;
    // check if a previously submitted task exists
    if(m_Tasks[m_TaskIndex].m_DecodeSync || m_Tasks[m_TaskIndex].m_OCLSync)
    {// wait task is finished and copy result texture to back buffer
        mfxStatus   sts = MFX_ERR_NONE;
        mfxFrameSurface1_OCL*   pOutSurface = NULL; // output surface.
        //wait the previous submitted tasks
        if(m_Tasks[m_TaskIndex].m_DecodeSync)
        {
            sts = m_mfxSession.SyncOperation(m_Tasks[m_TaskIndex].m_DecodeSync, MSDK_DEC_WAIT_INTERVAL);
            MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);
            pOutSurface = m_Tasks[m_TaskIndex].m_pDecodeOutSurface;
        }
        if(m_Tasks[m_TaskIndex].m_OCLSync)
        {
            sts = m_mfxSession.SyncOperation(m_Tasks[m_TaskIndex].m_OCLSync, MSDK_VPP_WAIT_INTERVAL);
            MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);
            pOutSurface = m_Tasks[m_TaskIndex].m_pOCLOutSurface;
        }
        if(pOutSurface)
        {/* copy YUV texture to screen */

            HRESULT hr;

            IDirect3DSurface9* pSrcSurface = (IDirect3DSurface9*)(pOutSurface->Data.MemId);
            assert(pDstSurface && pSrcSurface);
            if(pSrcSurface && pDstSurface)
            {
                RECT    r;
                r.left = 0;
                r.top = 0;
                r.right = min(Width,m_mfxDecodeVideoParams.vpp.In.Width);
                r.bottom = min(Height,m_mfxDecodeVideoParams.vpp.In.Height);

                r.right -= r.right&1;
                r.bottom -= r.bottom&1;

                V(pd3dDevice->StretchRect(pSrcSurface, &r, pDstSurface, &r,D3DTEXF_POINT));
            }

        }
        if(m_Tasks[m_TaskIndex].m_pDecodeOutSurface && m_Tasks[m_TaskIndex].m_pDecodeOutSurface->Data.Locked)
          _InterlockedDecrement16((short*)&m_Tasks[m_TaskIndex].m_pDecodeOutSurface->Data.Locked);
        if(m_Tasks[m_TaskIndex].m_pOCLOutSurface && m_Tasks[m_TaskIndex].m_pOCLOutSurface->Data.Locked)
            _InterlockedDecrement16((short*)&m_Tasks[m_TaskIndex].m_pOCLOutSurface->Data.Locked);

    }

    // clear sync task for further using
    m_Tasks[m_TaskIndex].m_OCLSync = 0;
    m_Tasks[m_TaskIndex].m_pOCLOutSurface = 0;
    m_Tasks[m_TaskIndex].m_DecodeSync = 0;
    m_Tasks[m_TaskIndex].m_pDecodeOutSurface = 0;

 

    if(m_DECODEFlag)
    {// feed decoder
        mfxSyncPoint        DecodeSyncPoint = 0;
        static mfxU16      nDecoderSurfIndex = 0; // index of free surface
        mfxStatus   sts = MFX_ERR_NONE;
        m_pmfxDecodeSurfaceLast = NULL; // reset current decoder surface to get a new one from the decoder
        while(MFX_ERR_NONE <= sts || MFX_ERR_MORE_DATA == sts || MFX_ERR_MORE_SURFACE == sts || MFX_WRN_DEVICE_BUSY == sts)
        {// loop until decoder report that it get request for new frame
            if (MFX_WRN_DEVICE_BUSY == sts)
            {
                Sleep(1); // just wait and then repeat the same call to DecodeFrameAsync
            }
            else if (MFX_ERR_MORE_DATA == sts)
            { // read more data to input bit stream
                sts = m_FileReader.ReadNextFrame(&m_mfxBS);
                MSDK_BREAK_ON_ERROR(sts);
            }
            else if (MFX_ERR_MORE_SURFACE == sts || MFX_ERR_NONE == sts)
            {// find new working-output surface in m_pmfxDecodeSurfaces
                //nDecoderSurfIndex = 0;
                nDecoderSurfIndex = GetFreeSurfaceIndex(m_pmfxDecodeSurfaces, m_mfxDecoderResponse.NumFrameActual,nDecoderSurfIndex);
                if (MSDK_INVALID_SURF_IDX == nDecoderSurfIndex)
                {
                    return MFX_ERR_MEMORY_ALLOC;
                }
            }

            // send request to decoder
            sts = m_pmfxDEC->DecodeFrameAsync(
                &m_mfxBS,
                (mfxFrameSurface1*)&(m_pmfxDecodeSurfaces[nDecoderSurfIndex]),
                (mfxFrameSurface1**)&m_pmfxDecodeSurfaceLast,
                &DecodeSyncPoint);
            // ignore warnings if output is available,
            // if no output and no action required just repeat the same call
            if (MFX_ERR_NONE < sts && DecodeSyncPoint)
            {
                sts = MFX_ERR_NONE;
            }

            if (MFX_ERR_NONE == sts)
            {// decoder returned a sync point: fill the current task and switch to feeding the OCL plugin
                m_Tasks[m_TaskIndex].m_DecodeSync = DecodeSyncPoint;
                m_Tasks[m_TaskIndex].m_pDecodeOutSurface = m_pmfxDecodeSurfaceLast;
                // look for output process
                if(m_Tasks[m_TaskIndex].m_pDecodeOutSurface)
                    _InterlockedIncrement16((short*)&m_Tasks[m_TaskIndex].m_pDecodeOutSurface->Data.Locked);
                break;
            }
        }
        if(MFX_ERR_NONE != sts)
        {
            printf("ERROR: Decoder returns error %d!\n",sts);
            stsOut = sts;
        }
    }//if(m_DECODEFlag)

    if(m_pOCLPlugin && m_pOCLPlugin->m_OCLFlag && begin)
    {// OPENCL part
        mfxSyncPoint        OCLSyncPoint = 0;
        mfxStatus   sts = MFX_ERR_NONE;
        // get index for output surface for OCL plugin
        mfxU16 nOCLSurfIndex = GetFreeSurfaceIndex(m_pmfxOCLSurfaces, m_mfxOCLResponse.NumFrameActual);
        MSDK_CHECK_ERROR(nOCLSurfIndex, MSDK_INVALID_SURF_IDX, MFX_ERR_MEMORY_ALLOC);
        mfxHDL pOutSurf = &m_pmfxOCLSurfaces[nOCLSurfIndex];
        mfxHDL inp = m_pmfxDecodeSurfaceLast;

        // OCL filter
        for(;;)
        {
            sts = MFXVideoUSER_ProcessFrameAsync(m_mfxSession, &inp, 1, &pOutSurf, 1, &OCLSyncPoint);

            if (MFX_WRN_DEVICE_BUSY == sts)
            {
                Sleep(1); // just wait and then repeat the same call
            }
            else
            {
                break;
            }
        }

        // ignore warnings if output is available,
        if (MFX_ERR_NONE < sts && OCLSyncPoint)
        {
            sts = MFX_ERR_NONE;
        }

        if(MFX_ERR_NONE!=sts)
        {
            printf("ERROR: OpenCL filter return error %d!\n",sts);
            return sts;
        }

        {
            m_Tasks[m_TaskIndex].m_OCLSync = OCLSyncPoint;
            m_Tasks[m_TaskIndex].m_pOCLOutSurface = &m_pmfxOCLSurfaces[nOCLSurfIndex];
            // look for output process
            _InterlockedIncrement16((short*)&m_Tasks[m_TaskIndex].m_pOCLOutSurface->Data.Locked);
        }
    }

    // increase task index to point to next task.
    m_TaskIndex = (m_TaskIndex+1)%SYNC_BUF_SIZE;
    return stsOut;
    }//CDecodingPipeline::DecodeOneFrame
On the first call, the decoder is handed the first frame and control moves straight on to the OpenCL plugin, even though the decoder has not yet completed its task.
In the OpenCL plugin section,

      if(m_pOCLPlugin && m_pOCLPlugin->m_OCLFlag)

the OpenCL plugin takes the pointer to the decoded surface and starts working on it?

At this point both the decoder and OpenCL have started working on their respective surfaces, but we do not know whether either of them has completed.

My question: how can OpenCL start working if the task submitted to the decoder has not completed yet? Shouldn't there be a delay of one whole task? That is, we should start the OpenCL processing only after the first task submitted to the decoder has completed.

My assumption: after decoding the first frame, the decoder decodes a couple of frames in parallel, while OpenCL filters the frames one by one.
Based on that understanding, I would modify the code as below:

    // decode next frame and put result to output surface
    mfxStatus CDecodingPipeline::DecodeOneFrame(int Width, int Height, IDirect3DSurface9 *pDstSurface, IDirect3DDevice9* pd3dDevice)

    {
        mfxStatus stsOut = MFX_ERR_NONE;
        int begin =0; //flag to indicate start of opencl
    // check if a previously submitted task exists
    if(m_Tasks[m_TaskIndex].m_DecodeSync || m_Tasks[m_TaskIndex].m_OCLSync)
    {// wait task is finished and copy result texture to back buffer
        mfxStatus   sts = MFX_ERR_NONE;
        mfxFrameSurface1_OCL*   pOutSurface = NULL; // output surface.
        //wait the previous submitted tasks
        if(m_Tasks[m_TaskIndex].m_DecodeSync)
        {
            sts = m_mfxSession.SyncOperation(m_Tasks[m_TaskIndex].m_DecodeSync, MSDK_DEC_WAIT_INTERVAL);
            MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);
            pOutSurface = m_Tasks[m_TaskIndex].m_pDecodeOutSurface;
        }
        if(m_Tasks[m_TaskIndex].m_OCLSync)
        {
            sts = m_mfxSession.SyncOperation(m_Tasks[m_TaskIndex].m_OCLSync, MSDK_VPP_WAIT_INTERVAL);
            MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);
            pOutSurface = m_Tasks[m_TaskIndex].m_pOCLOutSurface;
            begin =1; // start opencl processing. now decoder has completed the submitted task.
        }
        if(pOutSurface)
        {/* copy YUV texture to screen */

            HRESULT hr;

            IDirect3DSurface9* pSrcSurface = (IDirect3DSurface9*)(pOutSurface->Data.MemId);
            assert(pDstSurface && pSrcSurface);
            if(pSrcSurface && pDstSurface)
            {
                RECT    r;
                r.left = 0;
                r.top = 0;
                r.right = min(Width,m_mfxDecodeVideoParams.vpp.In.Width);
                r.bottom = min(Height,m_mfxDecodeVideoParams.vpp.In.Height);

                r.right -= r.right&1;
                r.bottom -= r.bottom&1;

                V(pd3dDevice->StretchRect(pSrcSurface, &r, pDstSurface, &r,D3DTEXF_POINT));
            }

        }
        if(m_Tasks[m_TaskIndex].m_pDecodeOutSurface && m_Tasks[m_TaskIndex].m_pDecodeOutSurface->Data.Locked)
          _InterlockedDecrement16((short*)&m_Tasks[m_TaskIndex].m_pDecodeOutSurface->Data.Locked);
        if(m_Tasks[m_TaskIndex].m_pOCLOutSurface && m_Tasks[m_TaskIndex].m_pOCLOutSurface->Data.Locked)
            _InterlockedDecrement16((short*)&m_Tasks[m_TaskIndex].m_pOCLOutSurface->Data.Locked);

    }

    // clear sync task for further using
    m_Tasks[m_TaskIndex].m_OCLSync = 0;
    m_Tasks[m_TaskIndex].m_pOCLOutSurface = 0;
    m_Tasks[m_TaskIndex].m_DecodeSync = 0;
    m_Tasks[m_TaskIndex].m_pDecodeOutSurface = 0;

 

    if(m_DECODEFlag)
    {// feed decoder
        mfxSyncPoint        DecodeSyncPoint = 0;
        static mfxU16      nDecoderSurfIndex = 0; // index of free surface
        mfxStatus   sts = MFX_ERR_NONE;
        m_pmfxDecodeSurfaceLast = NULL; // reset current decoder surface to get a new one from the decoder
        while(MFX_ERR_NONE <= sts || MFX_ERR_MORE_DATA == sts || MFX_ERR_MORE_SURFACE == sts || MFX_WRN_DEVICE_BUSY == sts)
        {// loop until decoder report that it get request for new frame
            if (MFX_WRN_DEVICE_BUSY == sts)
            {
                Sleep(1); // just wait and then repeat the same call to DecodeFrameAsync
            }
            else if (MFX_ERR_MORE_DATA == sts)
            { // read more data to input bit stream
                sts = m_FileReader.ReadNextFrame(&m_mfxBS);
                MSDK_BREAK_ON_ERROR(sts);
            }
            else if (MFX_ERR_MORE_SURFACE == sts || MFX_ERR_NONE == sts)
            {// find new working-output surface in m_pmfxDecodeSurfaces
                //nDecoderSurfIndex = 0;
                nDecoderSurfIndex = GetFreeSurfaceIndex(m_pmfxDecodeSurfaces, m_mfxDecoderResponse.NumFrameActual,nDecoderSurfIndex);
                if (MSDK_INVALID_SURF_IDX == nDecoderSurfIndex)
                {
                    return MFX_ERR_MEMORY_ALLOC;
                }
            }

            // send request to decoder
            sts = m_pmfxDEC->DecodeFrameAsync(
                &m_mfxBS,
                (mfxFrameSurface1*)&(m_pmfxDecodeSurfaces[nDecoderSurfIndex]),
                (mfxFrameSurface1**)&m_pmfxDecodeSurfaceLast,
                &DecodeSyncPoint);
            // ignore warnings if output is available,
            // if no output and no action required just repeat the same call
            if (MFX_ERR_NONE < sts && DecodeSyncPoint)
            {
                sts = MFX_ERR_NONE;
            }

            if (MFX_ERR_NONE == sts)
            {// decoder returned a sync point: fill the current task and switch to feeding the OCL plugin
                m_Tasks[m_TaskIndex].m_DecodeSync = DecodeSyncPoint;
                m_Tasks[m_TaskIndex].m_pDecodeOutSurface = m_pmfxDecodeSurfaceLast;
                // look for output process
                if(m_Tasks[m_TaskIndex].m_pDecodeOutSurface)
                    _InterlockedIncrement16((short*)&m_Tasks[m_TaskIndex].m_pDecodeOutSurface->Data.Locked);
                break;
            }
        }
        if(MFX_ERR_NONE != sts)
        {
            printf("ERROR: Decoder returns error %d!\n",sts);
            stsOut = sts;
        }
    }//if(m_DECODEFlag)

    if(m_pOCLPlugin && m_pOCLPlugin->m_OCLFlag)
    {// OPENCL part
        mfxSyncPoint        OCLSyncPoint = 0;
        mfxStatus   sts = MFX_ERR_NONE;
        // get index for output surface for OCL plugin
        mfxU16 nOCLSurfIndex = GetFreeSurfaceIndex(m_pmfxOCLSurfaces, m_mfxOCLResponse.NumFrameActual);
        MSDK_CHECK_ERROR(nOCLSurfIndex, MSDK_INVALID_SURF_IDX, MFX_ERR_MEMORY_ALLOC);
        mfxHDL pOutSurf = &m_pmfxOCLSurfaces[nOCLSurfIndex];
        mfxHDL inp = m_pmfxDecodeSurfaceLast;

        // OCL filter
        for(;;)
        {
            sts = MFXVideoUSER_ProcessFrameAsync(m_mfxSession, &inp, 1, &pOutSurf, 1, &OCLSyncPoint);

            if (MFX_WRN_DEVICE_BUSY == sts)
            {
                Sleep(1); // just wait and then repeat the same call
            }
            else
            {
                break;
            }
        }

        // ignore warnings if output is available,
        if (MFX_ERR_NONE < sts && OCLSyncPoint)
        {
            sts = MFX_ERR_NONE;
        }

        if(MFX_ERR_NONE!=sts)
        {
            printf("ERROR: OpenCL filter return error %d!\n",sts);
            return sts;
        }

        {
            m_Tasks[m_TaskIndex].m_OCLSync = OCLSyncPoint;
            m_Tasks[m_TaskIndex].m_pOCLOutSurface = &m_pmfxOCLSurfaces[nOCLSurfIndex];
            // look for output process
            _InterlockedIncrement16((short*)&m_Tasks[m_TaskIndex].m_pOCLOutSurface->Data.Locked);
        }
    }

    // increase task index to point to next task.
    m_TaskIndex = (m_TaskIndex+1)%SYNC_BUF_SIZE;
    return stsOut;
}//CDecodingPipeline::DecodeOneFrame

What mistakes or misunderstandings have I made by changing the code in this manner?

Surbhi_M_Intel

Hi there, 

Sorry, it took us a while to get back to you. 
Regarding how the decode and OpenCL stages work in parallel: refer to page 64 of the Media SDK manual, which illustrates how the asynchronous pipeline and SyncOperation work. The decode overview in the development guide can also help in understanding the framework.
If you haven't already, please look at the other tutorials to understand the asynchronous calls and the SyncOperation call.

I noticed that in the second listing you added a flag, begin, which is set once the OpenCL SyncOperation has completed, but the flag is only tested in the first listing, where it is never declared or initialized.
1st code you provided - 
    if(m_pOCLPlugin && m_pOCLPlugin->m_OCLFlag && begin)
    {// OPENCL part
        mfxSyncPoint        OCLSyncPoint = 0;
        mfxStatus   sts = MFX_ERR_NONE;

2nd code you have provided - 
    if(m_pOCLPlugin && m_pOCLPlugin->m_OCLFlag)
    {// OPENCL part
        mfxSyncPoint        OCLSyncPoint = 0;
        mfxStatus   sts = MFX_ERR_NONE;

Is that a mistake? Also, can you explain what output your modified code produced?

Thanks,
-Surbhi

 

Surbhi_M_Intel

Here is how User_ProcessFrameAsync (MFXVideoUSER_ProcessFrameAsync in the code above) gives asynchronous execution while staying synchronized with DecodeFrameAsync. This is a fundamental concept of the Media SDK asynchronous API: DecodeFrameAsync and User_ProcessFrameAsync only submit tasks to the Media SDK scheduler; no actual execution is done in these calls. The scheduler then executes the tasks according to their dependencies, so it is the scheduler that makes sure the OCL task runs on the decode output surface only after the decode task has finished. That is the benefit of wrapping the OCL code in a Media SDK plugin (USER component): you get asynchronous execution, and the data synchronization is handled by the Media SDK.

Hope it helps!

-Surbhi
