Ankush_W_
New Contributor I
108 Views

Decoding performance issue

Hi guys,

I am working on hardware decoding. The decoder initializes fine with MFX_IMPL_HARDWARE2 | MFX_IMPL_VIA_D3D9, and I am using VPP for the colour space conversion NV12 -> RGB4. My system is an i7-2600K and I am using the 2013 R2 SDK. Everything works fine except for the CPU usage, which stays at about 32% throughout the decoding process. I am decoding 1920x1080 at 25 fps.

Here is how I am setting up the VPP structure:

    m_VPPParams.vpp.In.FourCC = MFX_FOURCC_NV12;
    m_VPPParams.vpp.In.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
    m_VPPParams.vpp.In.CropX = 0;
    m_VPPParams.vpp.In.CropY = 0;
    m_VPPParams.vpp.In.CropW = 1920;
    m_VPPParams.vpp.In.CropH = 1080;

    m_VPPParams.vpp.In.PicStruct = MFX_PICSTRUCT_FIELD_TFF;

    m_VPPParams.vpp.In.FrameRateExtN = 240000;
    m_VPPParams.vpp.In.FrameRateExtD = 9600;

    // width must be a multiple of 16
    // height must be a multiple of 16 in case of frame picture and a multiple of 32 in case of field picture
    m_VPPParams.vpp.In.Width = MSDK_ALIGN16(m_VPPParams.vpp.In.CropW);
    m_VPPParams.vpp.In.Height =
        (MFX_PICSTRUCT_PROGRESSIVE == m_VPPParams.vpp.In.PicStruct) ?
        MSDK_ALIGN16(m_VPPParams.vpp.In.CropH) :
        MSDK_ALIGN32(m_VPPParams.vpp.In.CropH);
    // Output data
    m_VPPParams.vpp.Out.FourCC = MFX_FOURCC_RGB4;
    m_VPPParams.vpp.Out.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
    m_VPPParams.vpp.Out.CropX = 0;
    m_VPPParams.vpp.Out.CropY = 0;
    m_VPPParams.vpp.Out.CropW = m_VPPParams.vpp.In.CropW ;   
    m_VPPParams.vpp.Out.CropH = m_VPPParams.vpp.In.CropH ;

    m_VPPParams.vpp.Out.PicStruct = MFX_PICSTRUCT_FIELD_TFF;

    m_VPPParams.vpp.Out.FrameRateExtN = 240000;
    m_VPPParams.vpp.Out.FrameRateExtD = 9600;

    m_VPPParams.vpp.Out.Width = MSDK_ALIGN16(m_VPPParams.vpp.Out.CropW);
    m_VPPParams.vpp.Out.Height =(MFX_PICSTRUCT_PROGRESSIVE == m_VPPParams.vpp.Out.PicStruct) ?
                                    MSDK_ALIGN16(m_VPPParams.vpp.Out.CropH) :
                                    MSDK_ALIGN32(m_VPPParams.vpp.Out.CropH);

    m_VPPParams.IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY | MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
    m_VPPParams.AsyncDepth = 1;
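
As a quick check of the arithmetic in the setup above, here is a minimal sketch of the alignment and frame rate calculations. The helper names (alignUp, surfaceHeight, frameRate) are hypothetical stand-ins, and it assumes MSDK_ALIGN16 / MSDK_ALIGN32 round up to the next multiple of 16 or 32, as the comments in the code suggest.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for MSDK_ALIGN16 / MSDK_ALIGN32: round value up to
// the next multiple of alignment (assumed behaviour of the sample macros).
constexpr std::uint32_t alignUp(std::uint32_t value, std::uint32_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Surface height for a given crop height: 16-aligned for progressive
// content, 32-aligned for field (interlaced) content.
constexpr std::uint32_t surfaceHeight(std::uint32_t cropH, bool progressive) {
    return alignUp(cropH, progressive ? 16u : 32u);
}

// Frames per second from a FrameRateExtN / FrameRateExtD pair.
constexpr double frameRate(std::uint32_t extN, std::uint32_t extD) {
    return static_cast<double>(extN) / static_cast<double>(extD);
}
```

With these assumptions, a 1920x1080 crop gives a 1920x1088 surface for both progressive and field pictures, and 240000/9600 works out to 25 frames per second. A 720-line crop would differ between the two cases (720 versus 736).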

5 Replies
Surbhi_M_Intel
Employee
108 Views

Hi Ankush,

You can use video memory surfaces instead of system memory surfaces, which avoids copying images to system memory:
VPPParams.IOPattern = MFX_IOPATTERN_IN_VIDEO_MEMORY | MFX_IOPATTERN_OUT_VIDEO_MEMORY;
This can reduce CPU usage. Let us know if that works. If you still encounter the issue, can you also let us know the driver version on your system?

Thanks,
-Surbhi

Ankush_W_
New Contributor I
108 Views

Hi Surbhi,

Thank you for replying. Yes, I did try using video memory, but somehow the RunFrameVPPAsync() function keeps returning NV12 data instead of RGB4 when working with video memory.

Another thing is that when I call QueryIOSurf() for the VPP request, it returns MFX_WRN_PARTIAL_ACCELERATION irrespective of the memory type I use, from which I understand that the conversion won't be carried out on the hardware. And if I try m_pmfxDEC->Query(&m_VPPParams, &m_VPPParams); it returns MFX_ERR_UNSUPPORTED.

I would also like to know whether VPP actually supports a hardware-based NV12 to RGB4 conversion.

Regards,
Ankush

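
To make the difference between the two status codes above concrete, here is a minimal sketch of how they can be classified. It assumes the usual Media SDK convention that errors are negative and warnings are positive. The enum values are local stand-ins that are assumed to mirror mfxStatus (0, -3 and 4), and classify / InitResult are hypothetical names; a real application should use the SDK header.

```cpp
#include <cassert>

// Local stand-ins, assumed to mirror the corresponding mfxStatus values.
enum Status {
    ERR_NONE = 0,
    ERR_UNSUPPORTED = -3,
    WRN_PARTIAL_ACCELERATION = 4
};

// Assumed convention: negative codes are errors, positive codes are warnings.
constexpr bool isError(int sts) { return sts < 0; }
constexpr bool isWarning(int sts) { return sts > 0; }

// Hypothetical result type: the call may continue (ok), but the caller
// should remember that a software fallback was reported.
struct InitResult {
    bool ok;
    bool softwareFallback;
};

constexpr InitResult classify(int sts) {
    return InitResult{ !isError(sts), sts == WRN_PARTIAL_ACCELERATION };
}
```

Under these assumptions, MFX_WRN_PARTIAL_ACCELERATION lets the pipeline continue but flags that part of the work may run in software, while MFX_ERR_UNSUPPORTED is a hard failure for the parameters passed in.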
Surbhi_M_Intel
Employee
108 Views

Hi Ankush,

It is strange that RunFrameVPPAsync() returns NV12 with video memory. The behavior should be the same irrespective of the memory used. To investigate further I need the params, the code and the input you are using. If you can reproduce the behavior with the existing tutorial simple_6_decode_vpp_postproc (which seems to match your pipeline), that would help us replicate the issue.
 
QueryIOSurf() for the VPP Request it returns MFX_WRN_PARTIAL_ACCELERATION irrespective of the memory I use
This issue shouldn't depend on the memory used. It depends on your HW capabilities, that is, whether the HW supports the operation or it falls back to SW. I tried to reproduce this issue on my system but didn't encounter this warning. It would be good to try the same code on another system and see if you get the same problem. Also, if you can send your HW capabilities using the Media SDK sys analyzer (details here), I can probably dig in and find out whether there is any limitation on the system you are using.

if i try
m_pmfxDEC->Query(&m_VPPParams, &m_VPPParams);
it returns MFX_ERR_UNSUPPORTED.
You are passing VPP params to the decoder to query for the number of surfaces required for decode. The right call is mfxDEC.QueryIOSurf(&mfxVideoParams, &DecRequest); You can find more details in the existing tutorial simple_6_decode_vpp_postproc (can be downloaded from here).


-Surbhi

Ankush_W_
New Contributor I
108 Views

In my application, I am splitting an MTS file using FFmpeg, and the video stream is then decoded using IQSV.

When I decode a single MTS file, it decodes properly, but CPU utilization goes up to 25%.

Also, when I start a second instance of the IQSV decoder, all the init functions return MFX_WRN_PARTIAL_ACCELERATION.

Please find my sample code, my system analyzer output and trace log.

Here is my code:

    CSmplBitstreamReader*         m_FileReader;
    std::shared_ptr<FFMPEGReader> m_ffmpegFR;
    mfxU32                        m_nFrameIndex;    // index of processed frame
    mfxBitstream                  m_mfxBS;          // contains encoded data
    MFXVideoSession               m_mfxSession;
    MFXVideoDECODE*               m_pmfxDEC;
    mfxFrameSurface1**            m_pmfxSurfaces;   // frames array
    mfxFrameSurface1**            m_pVppSurfaces;   // frames array for vpp output
    mfxFrameAllocResponse         m_mfxResponse;    // memory allocation response for decoder
    mfxFrameAllocResponse         m_VppResponse;    // memory allocation response for vpp
    mfxU8*                        m_surfaceBuffers;

    mfxVideoParam                 m_mfxVideoParams;
    mfxVideoParam                 m_VPPParams;

mfxStatus Init()
{
  
    MSDK_CHECK_POINTER(pParams, MFX_ERR_NULL_PTR);
    m_nAsyncDepth = 1;

   // =========== ffmpeg splitter ============

    MSDK_CHECK_POINTER(m_ffmpegFR, MFX_ERR_MEMORY_ALLOC);
    m_FileReader = dynamic_cast<CSmplBitstreamReader*>(m_ffmpegFR.get());
    
    sts = m_ffmpegFR->Init(pParams->strSrcFile, pParams->videoType);
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);

    m_width = m_ffmpegFR->m_pFormatCtx->streams[m_ffmpegFR->m_videoStreamIdx]->codec->width;
    m_height = m_ffmpegFR->m_pFormatCtx->streams[m_ffmpegFR->m_videoStreamIdx]->codec->height; 
    // =========== ffmpeg splitter ============

    // API version   
    mfxVersion version;
    sts = DetermineMinimumRequiredVersion(*pParams, version);
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);

    mfxIMPL impl = MFX_IMPL_HARDWARE_ANY;
    mfxVersion ver = {0, 1};

    sts = m_mfxSession.Init(impl, &ver);
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);

    // Create Media SDK decoder
    m_pmfxDEC = new MFXVideoDECODE(m_mfxSession);
    // Create Media SDK VPP component
    m_mfxVPP = new MFXVideoVPP(m_mfxSession); 


    memset(&m_mfxVideoParams, 0, sizeof(m_mfxVideoParams));
    m_mfxVideoParams.mfx.CodecId = MFX_CODEC_AVC;
    m_mfxVideoParams.IOPattern = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
    m_mfxVideoParams.AsyncDepth = m_nAsyncDepth;
    // Prepare Media SDK bit stream buffer
    // - Arbitrary buffer size for this example

    memset(&m_mfxBS, 0, sizeof(m_mfxBS));
    m_mfxBS.MaxLength = 1024 * 1024;
    m_mfxBS.Data = new mfxU8[m_mfxBS.MaxLength];
    MSDK_CHECK_POINTER(m_mfxBS.Data, MFX_ERR_MEMORY_ALLOC);

       // try to find a sequence header in the stream

    // if header is not found this function exits with error (e.g. if device was lost and there's no header in the remaining stream)
    for(;;)
    {
        // trying to find PicStruct information in AVI headers
        if ( m_mfxVideoParams.mfx.CodecId == MFX_CODEC_JPEG )
            MJPEG_AVI_ParsePicStruct(&m_mfxBS);

        // parse bit stream and fill mfx params
        sts = m_pmfxDEC->DecodeHeader(&m_mfxBS, &m_mfxVideoParams);

        if (MFX_ERR_MORE_DATA == sts)
        {
            if (m_mfxBS.MaxLength == m_mfxBS.DataLength)
            {
                sts = ExtendMfxBitstream(&m_mfxBS, m_mfxBS.MaxLength * 2); 
                MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);
            }
            // read a portion of data             
            sts = m_FileReader->ReadNextFrame(&m_mfxBS);
            MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);

            continue;
        }
        else
        {
            // if input is interlaced JPEG stream
            if ( m_mfxBS.PicStruct == MFX_PICSTRUCT_FIELD_TFF || m_mfxBS.PicStruct == MFX_PICSTRUCT_FIELD_BFF)
            {
                m_mfxVideoParams.mfx.FrameInfo.CropH *= 2;
                m_mfxVideoParams.mfx.FrameInfo.Height = MSDK_ALIGN16(m_mfxVideoParams.mfx.FrameInfo.CropH);
                m_mfxVideoParams.mfx.FrameInfo.PicStruct = m_mfxBS.PicStruct;
            }

            break;
        }
    }
    
    MSDK_IGNORE_MFX_STS(sts, MFX_WRN_PARTIAL_ACCELERATION);
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);


    // Initialize VPP parameters
    memset(&m_VPPParams, 0, sizeof(m_VPPParams)); // zero all fields before filling the struct
    m_VPPParams.vpp.In.FourCC         = MFX_FOURCC_NV12;
    m_VPPParams.vpp.In.ChromaFormat   = MFX_CHROMAFORMAT_YUV420;  
    m_VPPParams.vpp.In.CropX          = 0;
    m_VPPParams.vpp.In.CropY          = 0; 
    m_VPPParams.vpp.In.CropW          = m_mfxVideoParams.mfx.FrameInfo.CropW;
    m_VPPParams.vpp.In.CropH          = m_mfxVideoParams.mfx.FrameInfo.CropH;
    m_VPPParams.vpp.In.PicStruct      = /*MFX_PICSTRUCT_FIELD_TFF*/MFX_PICSTRUCT_PROGRESSIVE;
    m_VPPParams.vpp.In.FrameRateExtN  = 25;
    m_VPPParams.vpp.In.FrameRateExtD  = 1;
    // width must be a multiple of 16 
    // height must be a multiple of 16 in case of frame picture and a multiple of 32 in case of field picture  
    m_VPPParams.vpp.In.Width  = MSDK_ALIGN16(m_VPPParams.vpp.In.CropW);
    m_VPPParams.vpp.In.Height = (MFX_PICSTRUCT_PROGRESSIVE == m_VPPParams.vpp.In.PicStruct)?
                                 MSDK_ALIGN16(m_VPPParams.vpp.In.CropH) : MSDK_ALIGN32(m_VPPParams.vpp.In.CropH);
    // Output data
    m_VPPParams.vpp.Out.FourCC        = MFX_FOURCC_RGB4/*MFX_FOURCC_NV12*/;     
    m_VPPParams.vpp.Out.ChromaFormat  = MFX_CHROMAFORMAT_YUV420;             
    m_VPPParams.vpp.Out.CropX         = 0;
    m_VPPParams.vpp.Out.CropY         = 0; 
    m_VPPParams.vpp.Out.CropW         = m_VPPParams.vpp.In.CropW/*/2*/;  // Resize to half size resolution
    m_VPPParams.vpp.Out.CropH         = m_VPPParams.vpp.In.CropH/*/2*/;
    m_VPPParams.vpp.Out.PicStruct     = /*MFX_PICSTRUCT_FIELD_TFF*/MFX_PICSTRUCT_PROGRESSIVE;
    m_VPPParams.vpp.Out.FrameRateExtN = 25;
    m_VPPParams.vpp.Out.FrameRateExtD = 1;
    // width must be a multiple of 16 
    // height must be a multiple of 16 in case of frame picture and a multiple of 32 in case of field picture  
    m_VPPParams.vpp.Out.Width  = MSDK_ALIGN16(m_VPPParams.vpp.Out.CropW); 
    m_VPPParams.vpp.Out.Height = (MFX_PICSTRUCT_PROGRESSIVE == m_VPPParams.vpp.Out.PicStruct)?
                                    MSDK_ALIGN16(m_VPPParams.vpp.Out.CropH) : MSDK_ALIGN32(m_VPPParams.vpp.Out.CropH);

    m_VPPParams.IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY | MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
    m_VPPParams.AsyncDepth = m_nAsyncDepth;    
    // Query number of required surfaces for decoder
    mfxFrameAllocRequest DecRequest;
    memset(&DecRequest, 0, sizeof(DecRequest));
    sts = m_pmfxDEC->QueryIOSurf(&m_mfxVideoParams, &DecRequest);
    MSDK_IGNORE_MFX_STS(sts, MFX_WRN_PARTIAL_ACCELERATION);
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);

    // Query number of required surfaces for VPP
    mfxFrameAllocRequest VPPRequest[2];// [0] - in, [1] - out
    memset(&VPPRequest, 0, sizeof(mfxFrameAllocRequest)*2);
    sts = m_mfxVPP->QueryIOSurf(&m_VPPParams, VPPRequest);
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);       


    // Determine the required number of surfaces for decoder output (VPP input) and for VPP output 
    nSurfNumDecVPP = DecRequest.NumFrameSuggested + VPPRequest[0].NumFrameSuggested;
     nSurfNumVPPOut = VPPRequest[1].NumFrameSuggested;


    // Allocate surfaces for decoder and VPP In
    // - Width and height of buffer must be aligned, a multiple of 32 
    // - Frame surface array keeps pointers all surface planes and general frame info
    mfxU16 width = (mfxU16)MSDK_ALIGN32(DecRequest.Info.Width);
    mfxU16 height = (mfxU16)MSDK_ALIGN32(DecRequest.Info.Height);
    mfxU8  bitsPerPixel = 12;  // NV12 format is a 12 bits per pixel format
    mfxU32 surfaceSize = width * height * bitsPerPixel / 8;
    m_surfaceBuffers = (mfxU8 *)new mfxU8[surfaceSize * nSurfNumDecVPP];

    m_pmfxSurfaces = new mfxFrameSurface1*[nSurfNumDecVPP];
    MSDK_CHECK_POINTER(m_pmfxSurfaces, MFX_ERR_MEMORY_ALLOC);
    for (int i = 0; i < nSurfNumDecVPP; i++)
    {
        // one surface per array slot; NV12 keeps Y first, then interleaved UV
        m_pmfxSurfaces[i] = new mfxFrameSurface1;
        memset(m_pmfxSurfaces[i], 0, sizeof(mfxFrameSurface1));
        memcpy(&(m_pmfxSurfaces[i]->Info), &(m_mfxVideoParams.mfx.FrameInfo), sizeof(mfxFrameInfo));
        m_pmfxSurfaces[i]->Data.Y = &m_surfaceBuffers[surfaceSize * i];
        m_pmfxSurfaces[i]->Data.U = m_pmfxSurfaces[i]->Data.Y + width * height;
        m_pmfxSurfaces[i]->Data.V = m_pmfxSurfaces[i]->Data.U + 1;
        m_pmfxSurfaces[i]->Data.Pitch = width;
    }

    // Allocate surfaces for VPP Out
    // - Width and height of buffer must be aligned, a multiple of 32 
    // - Frame surface array keeps pointers all surface planes and general frame info
    width = (mfxU16)MSDK_ALIGN32(VPPRequest[1].Info.Width);
    height = (mfxU16)MSDK_ALIGN32(VPPRequest[1].Info.Height);
    bitsPerPixel = 32;  // RGB4 format is a 32 bits per pixel format
    surfaceSize = width * height * bitsPerPixel / 8;
    m_surfaceBuffers2 = (mfxU8 *)new mfxU8[surfaceSize * nSurfNumVPPOut];

    m_pVppSurfaces = new mfxFrameSurface1*[nSurfNumVPPOut];
    MSDK_CHECK_POINTER(m_pVppSurfaces, MFX_ERR_MEMORY_ALLOC);
    for (int i = 0; i < nSurfNumVPPOut; i++)
    {
        // one surface per array slot; RGB4 is packed as B, G, R, A bytes
        m_pVppSurfaces[i] = new mfxFrameSurface1;
        memset(m_pVppSurfaces[i], 0, sizeof(mfxFrameSurface1));
        memcpy(&(m_pVppSurfaces[i]->Info), &(m_VPPParams.vpp.Out), sizeof(mfxFrameInfo));
        m_pVppSurfaces[i]->Data.B = &m_surfaceBuffers2[surfaceSize * i];
        m_pVppSurfaces[i]->Data.G = m_pVppSurfaces[i]->Data.B + 1;
        m_pVppSurfaces[i]->Data.R = m_pVppSurfaces[i]->Data.B + 2;
        m_pVppSurfaces[i]->Data.A = m_pVppSurfaces[i]->Data.B + 3;
        m_pVppSurfaces[i]->Data.Pitch = width * 4;
    }

    // Initialize the Media SDK decoder
    sts = m_pmfxDEC->Init(&m_mfxVideoParams);
    MSDK_IGNORE_MFX_STS(sts, MFX_WRN_PARTIAL_ACCELERATION);
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);

    // Initialize Media SDK VPP
    sts = m_mfxVPP->Init(&m_VPPParams);
    MSDK_IGNORE_MFX_STS(sts, MFX_WRN_PARTIAL_ACCELERATION);
    MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, sts);

    return MFX_ERR_NONE;
}
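
For a rough sense of how much data the system memory path above touches, here is a minimal sketch of the buffer size arithmetic used in the allocation code. The helper names are hypothetical, and the per-second figure is only an estimate that assumes each frame is written once as NV12 and once as RGB4; it says nothing about how many copies the driver actually performs.

```cpp
#include <cassert>
#include <cstdint>

// Round up to a multiple of 32, as the allocation code does with MSDK_ALIGN32.
constexpr std::uint64_t alignUp32(std::uint64_t v) {
    return (v + 31) / 32 * 32;
}

// Bytes in one system memory surface: aligned width * aligned height * bpp / 8.
constexpr std::uint64_t surfaceBytes(std::uint64_t width, std::uint64_t height,
                                     std::uint64_t bitsPerPixel) {
    return alignUp32(width) * alignUp32(height) * bitsPerPixel / 8;
}

// Estimated bytes written per second when every frame exists once as NV12
// (12 bits per pixel) and once as RGB4 (32 bits per pixel) in system memory.
constexpr std::uint64_t bytesPerSecond(std::uint64_t width, std::uint64_t height,
                                       std::uint64_t fps) {
    return (surfaceBytes(width, height, 12) + surfaceBytes(width, height, 32)) * fps;
}
```

For 1920x1080 at 25 fps this gives 3,133,440 bytes per NV12 surface, 8,355,840 bytes per RGB4 surface, and roughly 287 MB per second in total under the stated assumption.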

Regards,

Ankush

Surbhi_M_Intel
Employee
108 Views

Hi Ankush,

One thing to consider in your code is to use video memory instead of system memory, so that no extra copy is involved; this reduces CPU utilization. Your decoder is currently set to system memory:
m_mfxVideoParams.IOPattern = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
The same goes for the VPP IOPattern:
m_VPPParams.IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY | MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
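
When changing these two settings together, one thing worth checking is that the decoder output and the VPP input refer to the same memory type, since they share one surface pool in this pipeline. The sketch below illustrates that check. The flag values are local stand-ins assumed to match the MFX_IOPATTERN_* constants, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins, assumed to match the MFX_IOPATTERN_* flag values.
constexpr std::uint16_t IN_VIDEO   = 0x01;
constexpr std::uint16_t IN_SYSTEM  = 0x02;
constexpr std::uint16_t OUT_VIDEO  = 0x10;
constexpr std::uint16_t OUT_SYSTEM = 0x20;

// True when the decoder output memory type matches the VPP input memory
// type, so the same surfaces can be handed from decode to VPP.
constexpr bool decodeFeedsVpp(std::uint16_t decPattern, std::uint16_t vppPattern) {
    return ((decPattern & OUT_VIDEO) != 0 && (vppPattern & IN_VIDEO) != 0) ||
           ((decPattern & OUT_SYSTEM) != 0 && (vppPattern & IN_SYSTEM) != 0);
}

// True when any stage of the pipeline reads or writes system memory.
constexpr bool touchesSystemMemory(std::uint16_t decPattern, std::uint16_t vppPattern) {
    return ((decPattern | vppPattern) & (IN_SYSTEM | OUT_SYSTEM)) != 0;
}
```

With these stand-ins, the posted configuration (system memory everywhere) is consistent but touches system memory at every stage, while switching only the decoder to video memory and leaving VPP input on system memory would be a mismatch.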

Another important point from the sys analyzer output is to update the driver to the latest version; there have been a lot of fixes in the latest drivers, so it is best to keep the system updated. You can check for and download the driver from downloadcenter.intel.com. I am hoping the partial acceleration warning will not be seen after that.
Also, it would be best to use the Media SDK 2014 R2 release. In the past, such issues have been fixed by upgrading to the latest driver and Media SDK release.

If the problem still exists, then please send us the complete code in a file with the input and the directions how to run it. 

Thanks,
-Surbhi
