ramya_s_
Beginner
124 Views

Difference in behaviour of Intel MSS HW encoder on Skylake and Haswell

Hey,

I am using the following command line to run the HW encoder on a Skylake i7 6700K under Windows 10, with the Intel Media SDK Professional version:

sample_encode.exe h265 -hw -i <input file> -w 1920 -h 1080 -f 30 -o <outputfile.hevc> -u speed -b 1000 -async 3 -p 6fadc791a0c2eb479ab6dc5ea9da347

The encoder is run on the same video at bitrates ranging from 1000 kbps to about 10000 kbps. The encode time is measured using the timer constructs and example given in the sample_encode application:

CTimer timer;
timer.Start();
pipeline->Run();
elapsed_time = timer.GetTime();

The fps of the encode is calculated as number of frames / encode time. While all this works fine, I observed that the fps increases as the target bitrate increases. This is very counter-intuitive, considering that the quality of the encode increases with the bitrate. Run-to-run variation was averaged out by encoding the same video at the same bitrate multiple times.

Importantly, there was the expected drop in fps as the bitrate increases for the same video and a similar command line on a Haswell i5 4440. Please shed some light on why this is happening.
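For reference, the measurement described above can be sketched in standalone C++. This is only an illustration, not code from sample_encode: `FramesPerSecond` and `TimeSeconds` are hypothetical helper names, and the `encode` callable stands in for whatever actually runs the pipeline.

```cpp
#include <chrono>
#include <functional>

// Frames per second from a frame count and elapsed wall-clock seconds.
// Returns 0 for a non-positive elapsed time to avoid dividing by zero.
constexpr double FramesPerSecond(unsigned long long frames, double seconds) {
    return seconds > 0.0 ? static_cast<double>(frames) / seconds : 0.0;
}

// Runs `encode` once (a stand-in for the pipeline run) and returns the
// elapsed wall-clock time in seconds, measured with a monotonic clock.
inline double TimeSeconds(const std::function<void()>& encode) {
    const auto start = std::chrono::steady_clock::now();
    encode();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A run would then be reported as `FramesPerSecond(frameCount, TimeSeconds(runEncode))`, repeated several times and averaged. Note that this timing includes file I/O as well as the encode itself.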

 

11 Replies
124 Views

Hi Ramya,

The main reason for the difference between Skylake and Haswell processors for HEVC encode is that the platforms support different implementations. Skylake is the new 6th generation processor, which supports HW acceleration for HEVC encode. Haswell is a 4th generation processor where HEVC encode is a hybrid solution, meaning that at any given time a workload may be running on either the GPU or the CPU.

"I observed that the fps increases as input bitrate increases. This is very counter-intuitive considering that as the bit rate increases quality of the encode increases." -> Can you please share more details on your observation, including the input clip used, the driver version, and whether this is on Skylake or Haswell (or both)? In general, say you are using 1024 KBps (kilobytes per second): at 60 fps this gives lower quality than at 24 fps, because at 24 fps there are fewer frames over which the 1024 KB is distributed, that is 1024/24 = 42.67 KB per frame. At 60 fps, it is only 1024/60 = 17.07 KB per frame.

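The per-frame budget arithmetic above can be written as a tiny helper. This is a minimal sketch; `KilobytesPerFrame` is a hypothetical name used only for illustration.

```cpp
// Average number of kilobytes available per frame when a fixed data rate
// (kilobytes per second) is spread evenly over `fps` frames per second.
// Returns 0 for a non-positive frame rate to avoid dividing by zero.
constexpr double KilobytesPerFrame(double kilobytesPerSecond, double fps) {
    return fps > 0.0 ? kilobytesPerSecond / fps : 0.0;
}
```

With 1024 KB/s this gives about 42.67 KB per frame at 24 fps and about 17.07 KB per frame at 60 fps, matching the figures above.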
Thanks,

Surbhi_M_Intel
Employee
124 Views

Hi Ramya, 

Thanks for raising the concern. Do you have the fps values collected on SKL for each bitrate?
Please note that on HSW, HEVC uses a SW or hybrid implementation, whereas on SKL, HEVC is hardware accelerated, so it might not be correct to compare the two. Your general understanding is correct, though: encoding at a lower bitrate should be faster than at a higher bitrate. If you can provide some of the results you have collected on SKL, that will help us debug further.

Thanks,
Surbhi

 

 

Pradeep_R_
Beginner
124 Views

Hi,

My understanding of the HEVC GAcc encoder from the Server Studio was that it is a SW encoder where some parts are hardware accelerated. So in both HSW and SKL, the same parts are HW accelerated. Is that correct? Or are different parts accelerated based on which HW we run the encoder on?

Of course, given that the SKL GPU has more flops than the HSW GPU, the acceleration may be better on SKL than on HSW.

Thanks,

Pradeep.

ramya_s_
Beginner
124 Views

Thanks Surbhi and Harsh,

@Harsh - FPS is observed to increase with bitrate only on Skylake and not on Haswell for the HW encoder. That's the crux of the issue here, I suppose.

@Surbhi - The increase in fps with increasing bitrate on Skylake is observed for a range of videos. I will provide data for a particular video called Apple tree.yuv (1920x1080, approx. 1 GB file). The command line is mentioned in my previous post.

https://drive.google.com/a/multicorewareinc.com/file/d/0BzmXxStP8S_JVWg2cFRveEhCRU0/view?usp=sharing -- link to the video!

Achieved bitrate (kbps)    Average FPS (10 iterations)
914.7401183                107.2488716
1872.108639                112.027561
2752.934201                115.9276433
3811.953373                125.0394553
4983.904615                128.5753722
6008.022249                145.0404309
7041.873373                145.2851891
8040.933728                150.7325236
9092.351006                160.0731806
10084.31574                161.3264041
11023.66793                165.3117648
12045.07882                167.5525849
13085.86935                170.5256126
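
To put a number on the trend in these measurements, a relative-change helper is enough. This is a small sketch; `PercentChange` is a hypothetical name, not part of any SDK.

```cpp
// Relative change of `value` with respect to `baseline`, in percent.
// Returns 0 when the baseline is zero to avoid dividing by zero.
constexpr double PercentChange(double baseline, double value) {
    return baseline != 0.0 ? (value - baseline) / baseline * 100.0 : 0.0;
}
```

Applied to the first and last rows (107.2488716 fps at about 915 kbps versus 170.5256126 fps at about 13086 kbps), it reports an FPS increase of roughly 59%.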
Jeffrey_M_Intel1
Employee
124 Views

 

As Harsh and Surbhi mentioned, the SW and GPU accelerated plugins are very different implementations from the fully HW based HEVC in SKL.  There are a few other variables to consider as well.

  1. Our decode/encode-only samples and tutorials aren't currently ideal for benchmarking.  Times are distorted by a lot of single byte serial file I/O for raw frames.  A better start for benchmarking is sample_multi_transcode or the transcode tutorials, which can test encode without distortions from that overhead.
  2. Bitrate control for HEVC is still developing.  To get a better understanding of core codec performance you may want to start with CQP mode. (A simple way to do this with sample_multi_transcode is below.)

I've been able to reproduce the behavior you're seeing with crowdrun (from https://media.xiph.org/video/derf/) -- and I believe your statement that this behavior can be seen across many inputs.  In the bitrate range you're testing, CBR and VBR must work increasingly harder as bitrate decreases, dropping the FPS.  For higher bitrates BRC becomes easy so the BRC overhead drops.

If you test with CQP you should see results that more closely match your expectations.

  • For SKL HW HEVC, you should see a relatively minor increase in FPS (~10%) as quality decreases from high (QP16) to low (QP44).  
  • For the SW and GPU accelerated plugins, FPS increases dramatically (can be 2x or more) as QP increases/quality decreases.

BRC is an important part of overall codec performance.  Please watch for more updates on this topic in this forum and in other documentation.  

Regards, Jeff

 

 

A quick way to add CQP to sample_multi_transcode:

Change a few lines in  CTranscodingPipeline::InitEncMfxParams (in pipeline_transcode.cpp)

from this

    if (pInParams->nBitRate == 0)
    {
        pInParams->nBitRate = CalculateDefaultBitrate(pInParams->EncodeId,
            pInParams->nTargetUsage, m_mfxEncParams.mfx.FrameInfo.Width, m_mfxEncParams.mfx.FrameInfo.Height,
            1.0 * m_mfxEncParams.mfx.FrameInfo.FrameRateExtN / m_mfxEncParams.mfx.FrameInfo.FrameRateExtD);
    }

to this

    if (pInParams->nBitRate == 0)
    {
        // no bitrate selected, generate a default
        pInParams->nBitRate = CalculateDefaultBitrate(pInParams->EncodeId,
            pInParams->nTargetUsage, m_mfxEncParams.mfx.FrameInfo.Width, m_mfxEncParams.mfx.FrameInfo.Height,
            1.0 * m_mfxEncParams.mfx.FrameInfo.FrameRateExtN / m_mfxEncParams.mfx.FrameInfo.FrameRateExtD);
        m_mfxEncParams.mfx.TargetKbps = (mfxU16)(pInParams->nBitRate); // in Kbps

    } else if (pInParams->nBitRate <= 51) 
    {
        //Bitrate lower than usual range, assume TargetKbps specifies constant QP instead of bitrate
        m_mfxEncParams.mfx.RateControlMethod       = MFX_RATECONTROL_CQP;
        m_mfxEncParams.mfx.QPI=(mfxU16)pInParams->nBitRate;
        m_mfxEncParams.mfx.QPP=m_mfxEncParams.mfx.QPI+1;
        m_mfxEncParams.mfx.QPB=m_mfxEncParams.mfx.QPI+2;
    } else 
    {
        // use the specified bitrate in CBR or LA (VBR) mode as selected by other params
        m_mfxEncParams.mfx.TargetKbps = (mfxU16)(pInParams->nBitRate); // in Kbps
    }

    switch (m_mfxEncParams.mfx.RateControlMethod) {
    case MFX_RATECONTROL_CQP: printf("using CQP BRC. QPI=%u, QPP=%u, QPB=%u\n", m_mfxEncParams.mfx.QPI,m_mfxEncParams.mfx.QPP,m_mfxEncParams.mfx.QPB); break;
    case MFX_RATECONTROL_CBR: printf("using CBR BRC. targetKbps=%u\n",m_mfxEncParams.mfx.TargetKbps); break;
    case MFX_RATECONTROL_LA:  printf("using LA(VBR) BRC. targetKbps=%u LADepth=%u\n",m_mfxEncParams.mfx.TargetKbps,pInParams->nLADepth); break; 
    case MFX_RATECONTROL_LA_EXT: printf("using LA_EXT BRC (multi pipeline)\n"); break;
    default:
        puts("BRC mode: other");
    }
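
The decision logic in the change above can also be looked at in isolation, without the Media SDK headers. The following is only a sketch under stated assumptions: `RateControl`, `EncodeRateChoice`, `ClampQp` and `ChooseRateControl` are hypothetical names, and clamping QPP/QPB to 51 is an addition of this sketch (the snippet above sets QPI+1 and QPI+2 without clamping).

```cpp
#include <cstdint>

// Which branch of the bitrate/QP decision was taken.
enum class RateControl { DefaultBitrate, ConstantQP, TargetBitrate };

struct EncodeRateChoice {
    RateControl mode;
    std::uint16_t qpI, qpP, qpB;   // only meaningful for ConstantQP
    std::uint16_t targetKbps;      // only meaningful for TargetBitrate
};

constexpr unsigned kMaxQp = 51;    // values up to this are treated as a QP

// Assumption of this sketch: keep derived QPs inside the 0..51 range.
constexpr std::uint16_t ClampQp(unsigned qp) {
    return static_cast<std::uint16_t>(qp > kMaxQp ? kMaxQp : qp);
}

// 0       -> no bitrate given, caller computes a default bitrate
// 1..51   -> interpret the value as a constant QP (I, P = I+1, B = I+2)
// above   -> interpret the value as a target bitrate in Kbps
constexpr EncodeRateChoice ChooseRateControl(unsigned bitrateArg) {
    if (bitrateArg == 0) {
        return {RateControl::DefaultBitrate, 0, 0, 0, 0};
    }
    if (bitrateArg <= kMaxQp) {
        return {RateControl::ConstantQP, ClampQp(bitrateArg),
                ClampQp(bitrateArg + 1), ClampQp(bitrateArg + 2), 0};
    }
    // Like the (mfxU16) cast above, this truncates values above 65535.
    return {RateControl::TargetBitrate, 0, 0, 0,
            static_cast<std::uint16_t>(bitrateArg)};
}
```

For example, `ChooseRateControl(30)` selects constant QP with I/P/B values of 30/31/32, while `ChooseRateControl(1000)` selects a 1000 Kbps target.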

  

Jeffrey_M_Intel1
Employee
124 Views

Just wanted to add: checking with the team on what the options are for SKL HEVC HW's CBR performance.  Will update here with more info next week.

Jeffrey_M_Intel1
Employee
124 Views

Status update: escalated as a bug.

Pradeep_R_
Beginner
124 Views

Hi,

Any updates here? Is there a root-cause for this bug and a fix that we can try out?

 

Pradeep.

Jeffrey_M_Intel1
Employee
124 Views

A fix is in the Beta 15.40.18.4380 driver, now publicly available from downloadcenter.intel.com. As validated on my system, there is no longer a significant slowdown at lower bitrates, and CQP performance is closer to that of the other BRC modes. Please let us know if any related issues remain.

Regards, Jeff

 

 

ramya_s_
Beginner
124 Views

Thanks Jeff,

The driver update solves the problem. Does the Intel hardware accelerated encoder in the trial version of Intel Media SDK Professional support encoding 4K videos? The resolution is 4096x2048.

Jeffrey_M_Intel1
Employee
124 Views

Maximum resolutions are in the release notes. https://software.intel.com/sites/default/files/managed/de/2e/media_server_studio_sdk_release_notes_l...

For H264, max resolution is 4096x2304.

 
