wenquan__po

Why is the hardware encoder slower with video memory than with system memory?


Hi!

 

  • I run sample_encode using system memory with these options:

sample_encode.exe h264 -i D:\crowd_run_2160p50.yuv -o D:\crowd_run_2160p50.h264 -w 3840 -h 2160 -r 1 -async 1 -gpb:off -g 600 -x 1 -b 4000 -cbr -hw

The result is:

Frame number: 500
Encoding fps: 17

Encode one frame latency:  36.57ms

 

  • And then I run sample_encode using video memory with these options:

sample_encode.exe h264 -i D:\crowd_run_2160p50.yuv -o D:\crowd_run_2160p50.h264 -w 3840 -h 2160 -r 1 -async 1 -gpb:off -g 600 -x 1 -b 4000 -cbr -d3d11 -hw

The result is:

Frame number: 500
Encoding fps: 8

Encode one frame latency: 65.25 ms

 

The docs say that using video memory with the hardware encoder gives the best performance, but the results listed above don't match that.

 

  • My computer's info:

Windows 10 1903

Intel Core i7-7700 CPU @ 3.60 GHz

 

Could anyone explain this for me?

 

Thanks!

 

1 Solution
Dmitry_E_Intel
Employee

Hi!

 

Let me explain. The underlying HW always uses video memory; it simply doesn't know about system memory. So a system->video memory conversion is always present somewhere in the pipeline:

- When you run sample_encode with video memory, the sample itself calls Lock/Map (via the D3D9/D3D11 interfaces) on the input surface, writes data to it (e.g. from the YUV file on disk), and then calls Unlock/Unmap. The system->video copy/conversion happens here, serially with everything else.

- When you run sample_encode with system memory, you fill a surface in system memory and pass it to the Media SDK encoder. The SDK then makes the system->video copy internally, and it may use internal optimizations (such as GPUCopy). Also note that you are reading a pretty big YUV file from disk. With system memory you indirectly introduce some asynchronous processing: while you load the current frame from the YUV file into system memory, the SDK can perform the system->video copy for the previous frame. This can also give some performance benefit.
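To see why that overlap matters, here is a toy timing model (not Media SDK code; all stage durations in ms are made-up assumptions for a 4K frame) comparing the fully serial read->copy->encode path against a pipeline where the disk read of frame i overlaps the copy/encode of frame i-1:

```python
# Toy model: why overlapping the disk read with the system->video copy
# and encode can make the "system memory" path faster overall.
# All durations are invented for illustration, not measured values.
READ_MS = 20.0    # read one raw YUV frame from disk (assumed)
COPY_MS = 10.0    # system->video memory copy (assumed)
ENCODE_MS = 15.0  # HW encode of one frame (assumed)

def serial_total(n_frames):
    """Video-memory path in sample_encode: the app maps the surface, reads
    the frame into it, unmaps, then encodes -- every stage back to back."""
    return n_frames * (READ_MS + COPY_MS + ENCODE_MS)

def pipelined_total(n_frames):
    """System-memory path: while the app reads frame i from disk, the SDK
    copies/encodes frame i-1, so steady-state cost is the slowest stage."""
    bottleneck = max(READ_MS, COPY_MS + ENCODE_MS)
    # First frame fills the pipeline; the rest proceed at the bottleneck rate.
    return READ_MS + COPY_MS + ENCODE_MS + (n_frames - 1) * bottleneck

if __name__ == "__main__":
    n = 500
    print(f"serial:    {serial_total(n):.0f} ms")
    print(f"pipelined: {pipelined_total(n):.0f} ms")
```

With these assumed numbers the pipelined path finishes 500 frames in roughly half the serial time; the real ratio depends entirely on how long the disk read is relative to copy+encode.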

 

BTW, you can double-check the impact of YUV file reading with the "-perf_opt <value> -n <N>" options. They preload a number (<value>) of frames from the YUV file into video memory and encode them N times, taking disk I/O out of the measurement. I'm not sure whether these options made it into the last Windows Media SDK release, but they are available in sample_encode from GitHub: https://github.com/Intel-Media-SDK/MediaSDK/tree/master/samples/sample_encode
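For example, adding those options to the video-memory command from the question might look like this (the preload count of 50 is an arbitrary choice for illustration):

sample_encode.exe h264 -i D:\crowd_run_2160p50.yuv -o D:\crowd_run_2160p50.h264 -w 3840 -h 2160 -b 4000 -cbr -d3d11 -hw -perf_opt 50 -n 500

If the video-memory fps then rises toward or above the system-memory fps, the gap you measured was dominated by the serialized disk reads rather than by the encoder itself.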

 

