Hi,
I have a question about per-LCU QP control: what performance should I expect from it on HEVC?
I compared H264 encoding with and without EnableMBQP at 1280x720 and measured 339 fps without EnableMBQP and 595 fps with it, so enabling EnableMBQP increases throughput.
With H265, on the other hand, we get 180 fps without EnableMBQP, and it drops to 71 fps with EnableMBQP enabled.
In short, is it normal to see a significant performance drop in HEVC when enabling EnableMBQP, while H264 gains throughput?
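To put the two results on the same scale, the change can be expressed as a percentage of the baseline. This is only a small sketch of that arithmetic using the four figures quoted above; the function name `relative_change` is illustrative.

```python
def relative_change(baseline_fps: float, new_fps: float) -> float:
    """Return the change from baseline_fps to new_fps as a percentage."""
    return (new_fps - baseline_fps) / baseline_fps * 100.0

# 1280x720 figures quoted above: (fps without EnableMBQP, fps with EnableMBQP)
measurements = {
    "H264": (339, 595),
    "H265": (180, 71),
}

for codec, (without_mbqp, with_mbqp) in measurements.items():
    change = relative_change(without_mbqp, with_mbqp)
    print(f"{codec}: {without_mbqp} -> {with_mbqp} fps ({change:+.1f}%)")
```

That works out to roughly +75.5% for H264 and -60.6% for H265.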
Environment information
OS: Ubuntu 20.04
CPU: Intel Core i9-9900K @ 3.60 GHz (Coffee Lake)
SDK and drivers:
- Intel Media SDK Version 20.2.1 (also tested on 20.4.pre)
- Media Driver 20.2.0
- Gmmlib 20.2.2
- libva 2.8.0
- libva-utils 2.8.0
Thanks
The following functions appear to take significantly longer to execute during encoding (roughly 10x) when "EnableMBQP" is enabled:
DDI_VA::SubmitTask
TaskManager::SubmitTask
VAPacker::SubmitTask
Thanks Tran,
This is very important, and I want to report it to the dev team.
Do you have the measurement numbers and the application you used?
I would also like to reproduce it, if you can give me the command line.
Mark
Hi Mark,
We tried two different applications: 1) a modified sample_encode and 2) sample_multi_transcode.
1) Modified sample_encode
We took the functions that configure the parameters and fill the MB/CU QP values from sample_multi_transcode (CTranscodingPipeline::SetEncCtrlRT and CTranscodingPipeline::FillMBQPBuffer) and imported them into sample_encode. Then we ran the following commands:
HD HEVC encoding:
sample_encode h265 -hw -i input.yuv -o output.265 -f 60 -h 1920 -w 1080 -cqp -mbqp
Results:
libva info: VA-API version 1.8.0
libva info: User environment variable requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64//iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_8
libva info: va_openDriver() returns 0
Encoding Sample Version 20.2.0
Input file format YUV420
Output video HEVC
Source picture:
Resolution 1088x1920
Crop X,Y,W,H 0,0,1080,1920
Destination picture:
Resolution 1088x1920
Crop X,Y,W,H 0,0,1080,1920
Frame rate 60.00
QPI 26
QPP 28
QPB 30
Gop size 65535
Ref dist 8
Ref number 4
Idr Interval 0
Target usage balanced
Memory type system
Media SDK impl hw
Media SDK version 1.33
Processing started
Frame number: 207
Encoding fps: 47
H264 HD encoding:
sample_encode h264 -hw -i input.yuv -o output.264 -f 60 -h 1920 -w 1080 -cqp -mbqp
We obtained:
libva info: VA-API version 1.8.0
libva info: User environment variable requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64//iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_8
libva info: va_openDriver() returns 0
Encoding Sample Version 20.2.0
Input file format YUV420
Output video AVC
Source picture:
Resolution 1088x1920
Crop X,Y,W,H 0,0,1080,1920
Destination picture:
Resolution 1088x1920
Crop X,Y,W,H 0,0,1080,1920
Frame rate 60.00
QPI 0
QPP 0
QPB 0
Gop size 256
Ref dist 4
Ref number 3
Idr Interval 0
Target usage balanced
Memory type system
Media SDK impl hw
Media SDK version 1.33
Processing started
Frame number: 207
Encoding fps: 319
Processing finished
In short, for an HD video, we get a throughput of 319 fps when encoding with H264, and it drops to 47 fps with HEVC.
For comparison, we added frame-level QP control to the sample code, and the H265 throughput is then about 115 fps.
Another strange observation occurs when we run multiple encodes simultaneously:
A single stream has reduced total throughput (47 fps). However, when running 2 or more parallel HEVC encoding processes, the total throughput is about 120 fps.
So we suspect an issue in the LCU QP control when running a single stream.
2) sample_multi_transcode
We first ran a transcode from H264 to H265 with:
sample_multi_transcode -i::h264 test_1080.264 -o::h265 test_1080_trans.265 -hw -cqp -extmbqp
Results:
Multi Transcoding Sample Version 20.2.0
libva info: VA-API version 1.8.0
libva info: User environment variable requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64//iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_8
libva info: va_openDriver() returns 0
Session 0:
[WARNING] Configuration changed on the Query() call
mfx.LowPower:0 changed to mfx.LowPower:32
ext.265P.PicHeightInLumaSamples:1080 changed to ext.265P.PicHeightInLumaSamples:1088
Pipeline surfaces number (DecPool): 12
MFX HARDWARE Session 0 API ver 1.33 parameters:
Input video: AVC
Output video: HEVC
Session 0 was NOT joined with other sessions
Transcoding started
..
Transcoding finished
Common transcoding time is 6.19848 sec
-------------------------------------------------------------------------------
*** session 0 [0x555555705cc8] PASSED (MFX_ERR_NONE) 6.19812 sec, 215 frames, 34.688 fps
-i::h264 test_1080.264 -o::h265 test_1080_trans.265 -hw -cqp -extmbqp
-------------------------------------------------------------------------------
The test PASSED
Then we ran a transcode from H265 to H264:
sample_multi_transcode -i::h265 test_1080.265 -o::h264 test_1080_trans.264 -hw -cqp -extmbqp
We obtained the following:
Multi Transcoding Sample Version 20.2.0
libva info: VA-API version 1.8.0
libva info: User environment variable requested driver 'iHD'
libva info: Trying to open /opt/intel/mediasdk/lib64//iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_8
libva info: va_openDriver() returns 0
Session 0:
Pipeline surfaces number (DecPool): 10
MFX HARDWARE Session 0 API ver 1.33 parameters:
Input video: HEVC
Output video: AVC
Session 0 was NOT joined with other sessions
Transcoding started
..
Transcoding finished
Common transcoding time is 1.67442 sec
-------------------------------------------------------------------------------
*** session 0 [0x555555705cc8] PASSED (MFX_ERR_NONE) 1.6741 sec, 211 frames, 126.038 fps
-i::h265 test_1080.265 -o::h264 test_1080_trans.264 -hw -cqp -extmbqp
-------------------------------------------------------------------------------
The test PASSED
We observe here a drop in performance when transcoding to H265. We have 126 fps and 35 fps for H264 and H265 respectively.
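For repeated runs, the fps figure can be pulled out of the session summary line instead of being read by eye. The sketch below assumes the summary line keeps the "sec, frames, fps" layout shown in the two logs above; the regular expression and the helper name `parse_summary` are illustrative and not part of the sample.

```python
import re

# Pattern for the per-session summary line printed in the logs above,
# e.g. "... 6.19812 sec, 215 frames, 34.688 fps".
SUMMARY_RE = re.compile(r"([\d.]+) sec, (\d+) frames, ([\d.]+) fps")

def parse_summary(log_text: str):
    """Return (seconds, frames, fps) from the first summary line, or None."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2)), float(match.group(3))

# Summary lines copied from the two runs above.
hevc_log = "*** session 0 [0x555555705cc8] PASSED (MFX_ERR_NONE) 6.19812 sec, 215 frames, 34.688 fps"
avc_log = "*** session 0 [0x555555705cc8] PASSED (MFX_ERR_NONE) 1.6741 sec, 211 frames, 126.038 fps"

hevc = parse_summary(hevc_log)
avc = parse_summary(avc_log)
print(f"HEVC output: {hevc[2]} fps, AVC output: {avc[2]} fps")
print(f"AVC/HEVC throughput ratio: {avc[2] / hevc[2]:.2f}x")
```

The "Common transcoding time" line is not matched because it carries no frame count, so only the per-session line is parsed.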
Thanks
Hi Tran,
Sorry for the late response and thanks for the detailed report.
I have reported this issue to the dev team and will update you with the results of the investigation.
Mark
Hi Tran,
I am working with the developers on your request. They are using sample_multi_transcode (SMT) to test the performance. Could you run it as well? Since your original test used a modified sample_encode, these tests will be more comparable to theirs:
./sample_multi_transcode -i::h265 test.h265 -o::h264 out.h264 -cqp
./sample_multi_transcode -i::h265 test.h265 -o::h264 out.h264 -cqp -extmbqp
./sample_multi_transcode -i::h265 test.h265 -o::h265 out.h265 -cqp
./sample_multi_transcode -i::h265 test.h265 -o::h265 out.h265 -cqp -extmbqp
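The four runs above form a small matrix: two output codecs, each with and without -extmbqp. As a sketch, the command lines can be generated rather than typed by hand. The binary path and file names are the ones used in the commands above, and the helper name `build_commands` is illustrative.

```python
import shlex

def build_commands(binary="./sample_multi_transcode", source="test.h265"):
    """Build the four test command lines: each output codec with and without -extmbqp."""
    commands = []
    for codec in ("h264", "h265"):
        for extra in ([], ["-extmbqp"]):
            commands.append(
                [binary, "-i::h265", source, f"-o::{codec}", f"out.{codec}", "-cqp", *extra]
            )
    return commands

# Print the commands in shell form; nothing is executed here.
for cmd in build_commands():
    print(shlex.join(cmd))
```

Each entry is an argument list, so it can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where the sample binary and input file exist.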
Mark
Hi Tran,
I got a conclusion from the developers and hope it answers your question:
The cause is that AVC sets the QP parameters to 0 by default, which makes encoding slower than with the "extmbqp" parameter, which sets a higher QP and therefore gives a higher encoding rate.
For HEVC, the default QP is 26.
The expected performance with the "extmbqp" parameter depends on which QP values the MBQP map sets.
If it sets higher QP values than the ones we set ourselves (qpi, qpp, qpb) or the defaults, performance improves; if it sets lower values, performance decreases.
If we set the same QP parameters (qpi, qpp, qpb) for AVC and HEVC when encoding, we should see the same performance difference for both codecs.
- For AVC:
sample_multi_transcode -i::h265 test.h265 -o::h264 out.h264 -cqp -qpi 26 -qpp 26 -qpb 26
sample_multi_transcode -i::h265 test.h265 -o::h264 out.h264 -cqp -extmbqp
- For HEVC:
sample_multi_transcode -i::h265 test.h265 -o::h265 out.h265 -cqp -qpi 26 -qpp 26 -qpb 26
sample_multi_transcode -i::h265 test.h265 -o::h265 out.h265 -cqp -extmbqp
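The rule described above can be written down as a comparison between the QP values in the per-block map and the frame-level QP. The sketch below is only an illustration of that reasoning and is not Media SDK code. Comparing the mean of the map against the frame QP is a simplifying assumption, and the map contents (every block at QP 20) are hypothetical, since the thread does not state which values the sample writes.

```python
from statistics import mean

def expected_speed_change(block_qps, frame_qp):
    """Classify the expected throughput change per the explanation above.

    block_qps: per-block QP values an application would place in the MBQP map
               (hypothetical input; how the map is filled is application-specific).
    frame_qp:  the frame-level QP used when the map is disabled.
    """
    average_qp = mean(block_qps)
    if average_qp > frame_qp:
        return "faster"   # coarser quantization than the baseline
    if average_qp < frame_qp:
        return "slower"   # finer quantization than the baseline
    return "similar"

# Baselines reported in this thread: AVC defaulted to QP 0, HEVC to QP 26.
uniform_map = [20] * 8  # hypothetical map with every block at QP 20
print("AVC :", expected_speed_change(uniform_map, frame_qp=0))
print("HEVC:", expected_speed_change(uniform_map, frame_qp=26))
```

With these assumed values the same map sits above the AVC baseline and below the HEVC baseline, which is the direction of the throughput changes reported earlier in the thread.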
Please let me know if you can confirm this.
Mark
Hi Tran,
Do you have more questions?
If not, I will close the request with the dev team.
Mark