GPU Loads on Skylake and Haswell

Mr_Anderson · 11-09-2017

I'm using MSS 2017R3 on a Skylake CPU and MSS 2016 on a Haswell CPU with FFmpeg to transcode streams from MPEG2 to H264.

When I check the GPU load with the metrics_monitor tool, the two systems show different usage patterns. On Haswell, Render usage and Video usage are about equally loaded. However, on Skylake, Render usage is much higher than Video usage.

SKYLAKE:

RENDER usage: 90.91, VIDEO usage: 19.19, VIDEO_E usage: 0.00 GT Freq: 1100.00

RENDER usage: 85.86, VIDEO usage: 20.20, VIDEO_E usage: 0.00 GT Freq: 1100.00

RENDER usage: 78.00, VIDEO usage: 15.00, VIDEO_E usage: 0.00 GT Freq: 1100.00

RENDER usage: 79.17, VIDEO usage: 20.83, VIDEO_E usage: 0.00 GT Freq: 1100.00

RENDER usage: 80.61, VIDEO usage: 15.31, VIDEO_E usage: 0.00 GT Freq: 1100.00

RENDER usage: 75.76, VIDEO usage: 20.20, VIDEO_E usage: 0.00 GT Freq: 1100.00

HASWELL:

RENDER usage: 58.59, VIDEO usage: 56.57, VIDEO_E usage: 0.00

RENDER usage: 58.76, VIDEO usage: 55.67, VIDEO_E usage: 0.00

RENDER usage: 52.58, VIDEO usage: 50.52, VIDEO_E usage: 0.00

RENDER usage: 52.04, VIDEO usage: 57.14, VIDEO_E usage: 0.00

RENDER usage: 67.71, VIDEO usage: 65.62, VIDEO_E usage: 0.00

RENDER usage: 57.73, VIDEO usage: 57.73, VIDEO_E usage: 0.00

RENDER usage: 66.67, VIDEO usage: 65.62, VIDEO_E usage: 0.00

RENDER usage: 55.56, VIDEO usage: 59.60, VIDEO_E usage: 0.00

Is there any explanation for this?
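
To put numbers on the difference, the samples above can be averaged per engine. This is a minimal sketch in Python, assuming the metrics_monitor lines have exactly the format quoted above; the helper name average_usage is made up for illustration and is not part of MSS.

```python
import re
from statistics import mean

# Matches fields such as "RENDER usage: 90.91" in the lines quoted above.
# VIDEO_E is listed before VIDEO so the longer name is tried first.
FIELD = re.compile(r"(RENDER|VIDEO_E|VIDEO) usage:\s*([0-9.]+)")

def average_usage(lines):
    """Return the mean usage per engine over all parsed sample lines."""
    samples = {}
    for line in lines:
        for engine, value in FIELD.findall(line):
            samples.setdefault(engine, []).append(float(value))
    return {engine: mean(values) for engine, values in samples.items()}

skylake = [
    "RENDER usage: 90.91, VIDEO usage: 19.19, VIDEO_E usage: 0.00 GT Freq: 1100.00",
    "RENDER usage: 85.86, VIDEO usage: 20.20, VIDEO_E usage: 0.00 GT Freq: 1100.00",
    "RENDER usage: 78.00, VIDEO usage: 15.00, VIDEO_E usage: 0.00 GT Freq: 1100.00",
    "RENDER usage: 79.17, VIDEO usage: 20.83, VIDEO_E usage: 0.00 GT Freq: 1100.00",
    "RENDER usage: 80.61, VIDEO usage: 15.31, VIDEO_E usage: 0.00 GT Freq: 1100.00",
    "RENDER usage: 75.76, VIDEO usage: 20.20, VIDEO_E usage: 0.00 GT Freq: 1100.00",
]
haswell = [
    "RENDER usage: 58.59, VIDEO usage: 56.57, VIDEO_E usage: 0.00",
    "RENDER usage: 58.76, VIDEO usage: 55.67, VIDEO_E usage: 0.00",
    "RENDER usage: 52.58, VIDEO usage: 50.52, VIDEO_E usage: 0.00",
    "RENDER usage: 52.04, VIDEO usage: 57.14, VIDEO_E usage: 0.00",
    "RENDER usage: 67.71, VIDEO usage: 65.62, VIDEO_E usage: 0.00",
    "RENDER usage: 57.73, VIDEO usage: 57.73, VIDEO_E usage: 0.00",
    "RENDER usage: 66.67, VIDEO usage: 65.62, VIDEO_E usage: 0.00",
    "RENDER usage: 55.56, VIDEO usage: 59.60, VIDEO_E usage: 0.00",
]

skylake_avg = average_usage(skylake)
haswell_avg = average_usage(haswell)
print("Skylake:", skylake_avg)
print("Haswell:", haswell_avg)
```

On these samples the averages come out at roughly 81.7 RENDER and 18.5 VIDEO on Skylake, against roughly 58.7 RENDER and 58.6 VIDEO on Haswell.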

Mark_L_Intel1 · 11-09-2017

Hi Mr. Anderson,

This depends on a lot of factors. To simplify the testing, could you try the same transcode with our sample application "sample_multi_transcode"?

You can download the sample package from the following page:

https://software.intel.com/en-us/intel-media-server-studio-support/code-samples

It contains the prebuilt binary along with test streams, so you can then run:

./sample_multi_transcode -i::mpeg2 <input file> -o::h264 <output file>

Mark

DMITRY_R_Intel4 · 11-15-2017

A change in the GPU usage pattern is expected between the last release of the MSS 2016 product line and MSS 2017 R1. The reason is the introduction of the i915 kernel-mode driver scheduler, which tracks inter-task dependencies and can schedule tasks whose dependencies are already resolved out of FIFO order (MSS 2016 worked in FIFO mode). This significantly improves performance when there is a large number of inter-dependent tasks (for example, a non-FEI encoding operation submits 2 dependent tasks, one on the RENDER engine and another on the VIDEO codec engine).

As a side effect, you observe a change in the GPU usage pattern. On MSS 2016, an engine could be occupied by a task that was still waiting for its dependencies to be resolved. On MSS 2017 this is no longer the case, so the engines can actually execute something else in the meantime, which improves parallelism.
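
The accounting effect described above can be illustrated with a toy model. This is only a sketch under strong assumptions and not the i915 scheduler's actual algorithm: each frame submits one RENDER task and one dependent VIDEO task, the task durations (4 and 1 time units) are invented, and the function name simulate is hypothetical. In the FIFO-like mode a queued VIDEO task is counted as occupying its engine while it waits for the RENDER task; in the dependency-aware mode only execution time is counted.

```python
def simulate(num_frames, render_time, video_time, count_waiting_as_busy):
    """Toy model: frame i submits a RENDER task and a dependent VIDEO task.

    Returns (render_usage, video_usage) as fractions of the total elapsed time.
    """
    render_free = 0.0  # time at which the RENDER engine becomes idle
    video_free = 0.0   # time at which the VIDEO engine becomes idle
    video_busy = 0.0   # time accounted to the VIDEO engine
    for _ in range(num_frames):
        # RENDER tasks run back to back.
        render_done = render_free + render_time
        render_free = render_done
        # The VIDEO task cannot start before its RENDER dependency completes.
        start = max(video_free, render_done)
        if count_waiting_as_busy:
            # FIFO-like model: the queued task occupies the engine while waiting.
            video_busy += (start - video_free) + video_time
        else:
            # Dependency-aware model: only actual execution counts as busy.
            video_busy += video_time
        video_free = start + video_time
    total = max(render_free, video_free)
    render_busy = num_frames * render_time
    return render_busy / total, video_busy / total

# Assumed durations: RENDER work takes 4 time units per frame, VIDEO work takes 1.
fifo = simulate(100, 4.0, 1.0, count_waiting_as_busy=True)
aware = simulate(100, 4.0, 1.0, count_waiting_as_busy=False)
print("FIFO-like        RENDER %.2f  VIDEO %.2f" % fifo)
print("dependency-aware RENDER %.2f  VIDEO %.2f" % aware)
```

In this model both engines look about equally loaded in the FIFO-like mode, while the dependency-aware mode shows RENDER loaded far more than VIDEO, which is qualitatively the pattern reported in the question. The sketch only models how engine time is accounted; it does not reproduce the measured percentages or the throughput gain from running other ready tasks in the freed time.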

So the behavior you observe is normal, at least based on the description provided so far. Are you seeing any associated performance issues when comparing HSW with SKL?

Dmitry.

Mr_Anderson · 11-17-2017

Thanks for the replies.

Both servers are in production now, so I'm not able to run tests on them. Thanks for the explanations.