Media (Intel® Video Processing Library, Intel Media SDK)
Access community support with transcoding, decoding, and encoding in applications using media tools like Intel® oneAPI Video Processing Library and Intel® Media SDK
Announcements
The Intel Media SDK project is no longer active. For continued support and access to new features, Intel Media SDK users are encouraged to read the transition guide on upgrading from Intel® Media SDK to Intel® Video Processing Library (VPL), and to move to VPL as soon as possible.
For more information, see the VPL website.

Quick Sync is slower than software encoder in broadcast encoding

cglee
Beginner

Hi,

First, I am sorry for my poor English.

I am testing video conferencing with ffmpeg. I believed that a HW encoder must be faster than a SW encoder, so I tried to use QSV to reduce the processing time of H.264 encoding.

I tested 2 cases.

The first test is encoding raw video data from the file system to an H.264 file.

- RawVideo(yuv422p) -> H264

- Total 500 frames

- SW encoder (libx264) takes 2.483 sec to encode all frames.

- HW encoder(h264_qsv) takes 0.682 sec.

The result is as good as I expected: the HW encoder is much faster (about 4x).
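As a sanity check on that figure, the timings above imply these throughputs (a tiny sketch; the numbers are copied from the measurements above):

```python
# Throughput implied by the 500-frame file-encode test above.
frames = 500
sw_seconds = 2.483  # libx264 total encode time
hw_seconds = 0.682  # h264_qsv total encode time

sw_fps = frames / sw_seconds
hw_fps = frames / hw_seconds
speedup = sw_seconds / hw_seconds

print(f"SW: {sw_fps:.0f} fps, HW: {hw_fps:.0f} fps, speedup: {speedup:.2f}x")
```

Note that this measures throughput; the second test is about per-frame latency, which is a different quantity.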

Following are the ffmpeg options I used.

SW encoder : time ~/ffmpeg_build/bin/ffmpeg -loglevel verbose -pix_fmt yuyv422 -video_size 640x480 -f rawvideo -i ./640x480dump.raw -f avi -c:v h264 ./640x480dump.avi

HW encoder : time ~/ffmpeg_build/bin/ffmpeg -loglevel verbose -pix_fmt yuyv422 -video_size 640x480 -f rawvideo -i ./640x480dump.raw -f avi -c:v h264_qsv ./640x480dump.avi

 

The second test is streaming a webcam to UDP packets.

- SW encoder (libx264): 210 ms latency from the webcam to display on the receiver's monitor.

- HW encoder (h264_qsv): 260 ms latency.

In this case, the HW encoder is slower than the SW encoder. I wonder whether the Quick Sync HW encoder is unsuitable for video conferencing or anything else that needs low latency. Following are the options I used and the log output.

SW encoder(libx264) : ffmpeg -loglevel verbose -input_format yuv422p -video_size 640x480 -framerate 30 -f v4l2 -i /dev/video2 -preset ultrafast -tune zerolatency -f h264 -c:v libx264 udp://127.0.0.1:20001

HW encoder : ffmpeg -loglevel verbose -input_format yuv422p -video_size 640x480 -framerate 30 -f v4l2 -i /dev/video2 -preset veryfast -scenario videoconference -async_depth 1 -int_ref_cycle_dist 1 -f h264 -c:v h264_qsv udp://127.0.0.1:20001

 

Following is the log output for the HW encoder:

-------------------------------------------------------------------------------------------------

ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/home/highvolt/ffmpeg_build pkg-config-flags=--static --extra-libs='-lpthread -lm' --ld=g++ --enable-gpl --enable-gnutls --enable-libfreetype --enable-libx264 --enable-libvpl --enable-nonfree
libavutil 58. 2.100 / 58. 2.100
libavcodec 60. 3.100 / 60. 3.100
libavformat 60. 3.100 / 60. 3.100
libavdevice 60. 1.100 / 60. 1.100
libavfilter 9. 3.100 / 9. 3.100
libswscale 7. 1.100 / 7. 1.100
libswresample 4. 10.100 / 4. 10.100
libpostproc 57. 1.100 / 57. 1.100
[video4linux2,v4l2 @ 0x5604a3b50100] fd:3 capabilities:84a00001
Input #0, video4linux2,v4l2, from '/dev/video2':
Duration: N/A, start: 42298.683949, bitrate: 147456 kb/s
Stream #0:0: Video: rawvideo, 1 reference frame (YUY2 / 0x32595559), yuyv422, 640x480, 147456 kb/s, 30 fps, 30 tbr, 1000k tbn
Stream mapping:
Stream #0:0 -> #0:0 (rawvideo (native) -> h264 (h264_qsv))
Press [q] to stop, [?] for help
[graph 0 input from stream 0:0 @ 0x5604a3b55bc0] w:640 h:480 pixfmt:yuyv422 tb:1/1000000 fr:30/1 sar:0/1
[auto_scale_0 @ 0x5604a3b6ca80] w:iw h:ih flags:'' interl:0
[format @ 0x5604a3b6a7c0] auto-inserting filter 'auto_scale_0' between the filter 'Parsed_null_0' and the filter 'format'
[auto_scale_0 @ 0x5604a3b6ca80] w:640 h:480 fmt:yuyv422 sar:0/1 -> w:640 h:480 fmt:nv12 sar:0/1 flags:0x00000004
[h264_qsv @ 0x5604a3b54940] Encoder: input is system memory surface
[h264_qsv @ 0x5604a3b54940] Use Intel(R) oneVPL to create MFX session, the required implementation version is 1.1
[AVHWDeviceContext @ 0x5604a3d8bc00] Trying to use DRM render node for device 0.
[AVHWDeviceContext @ 0x5604a3d8bc00] libva: VA-API version 1.18.0
[AVHWDeviceContext @ 0x5604a3d8bc00] libva: User requested driver 'iHD'
[AVHWDeviceContext @ 0x5604a3d8bc00] libva: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
[AVHWDeviceContext @ 0x5604a3d8bc00] libva: Found init function __vaDriverInit_1_18
[AVHWDeviceContext @ 0x5604a3d8bc00] libva: va_openDriver() returns 0
[AVHWDeviceContext @ 0x5604a3d8bc00] Initialised VAAPI connection: version 1.18
[AVHWDeviceContext @ 0x5604a3d8bc00] VAAPI driver: Intel iHD driver for Intel(R) Gen Graphics - 23.1.6 ().
[AVHWDeviceContext @ 0x5604a3d8bc00] Driver not found in known nonstandard list, using standard behaviour.
[h264_qsv @ 0x5604a3b54940] Initialized an internal MFX session using hardware accelerated implementation
[h264_qsv @ 0x5604a3b54940] Using the variable bitrate (VBR) ratecontrol method
[h264_qsv @ 0x5604a3b54940] profile: avc high; level: 30
[h264_qsv @ 0x5604a3b54940] GopPicSize: 256; GopRefDist: 3; GopOptFlag: closed; IdrInterval: 0
[h264_qsv @ 0x5604a3b54940] TargetUsage: 7; RateControlMethod: VBR
[h264_qsv @ 0x5604a3b54940] BufferSizeInKB: 375; InitialDelayInKB: 187; TargetKbps: 1000; MaxKbps: 1500; BRCParamMultiplier: 1
[h264_qsv @ 0x5604a3b54940] NumSlice: 1; NumRefFrame: 2
[h264_qsv @ 0x5604a3b54940] RateDistortionOpt: OFF
[h264_qsv @ 0x5604a3b54940] RecoveryPointSEI: OFF
[h264_qsv @ 0x5604a3b54940] VDENC: OFF
[h264_qsv @ 0x5604a3b54940] Entropy coding: CABAC; MaxDecFrameBuffering: 2
[h264_qsv @ 0x5604a3b54940] NalHrdConformance: ON; SingleSeiNalUnit: ON; VuiVclHrdParameters: OFF VuiNalHrdParameters: ON
[h264_qsv @ 0x5604a3b54940] FrameRateExtD: 1; FrameRateExtN: 30

[h264_qsv @ 0x5604a3b54940] IntRefType: 0; IntRefCycleSize: 0; IntRefQPDelta: 0
[h264_qsv @ 0x5604a3b54940] MaxFrameSize: 230400; MaxSliceSize: 0
[h264_qsv @ 0x5604a3b54940] BitrateLimit: ON; MBBRC: OFF; ExtBRC: OFF
[h264_qsv @ 0x5604a3b54940] Trellis: auto
[h264_qsv @ 0x5604a3b54940] RepeatPPS: OFF; NumMbPerSlice: 0; LookAheadDS: 2x
[h264_qsv @ 0x5604a3b54940] AdaptiveI: OFF; AdaptiveB: OFF; BRefType:off
[h264_qsv @ 0x5604a3b54940] MinQPI: 0; MaxQPI: 0; MinQPP: 0; MaxQPP: 0; MinQPB: 0; MaxQPB: 0
[h264_qsv @ 0x5604a3b54940] DisableDeblockingIdc: 0
[h264_qsv @ 0x5604a3b54940] SkipFrame: no_skip
[h264_qsv @ 0x5604a3b54940] PRefType: default
[h264_qsv @ 0x5604a3b54940] TransformSkip: unknown
[h264_qsv @ 0x5604a3b54940] IntRefCycleDist: 1
[h264_qsv @ 0x5604a3b54940] LowDelayBRC: OFF
[h264_qsv @ 0x5604a3b54940] MaxFrameSizeI: 0; MaxFrameSizeP: 0
[h264_qsv @ 0x5604a3b54940] ScenarioInfo: 2
Output #0, h264, to 'udp://127.0.0.1:20001':
Metadata:
encoder : Lavf60.3.100
Stream #0:0: Video: h264, 1 reference frame, nv12(tv, progressive), 640x480 (0x0), q=2-31, 1000 kb/s, 30 fps, 30 tbn
Metadata:
encoder : Lavc60.3.100 h264_qsv
Side data:
cpb: bitrate max/min/avg: 0/0/1000000 buffer size: 0 vbv_delay: N/A
frame= 1835 fps= 30 q=15.0 size= 7223kB time=00:01:01.13 bitrate= 968.0kbits/s speed=0.999x

-------------------------------------------------------------------------------------------------

 

 

 

 

 

AlekhyaV_Intel
Moderator

Hi,


Thank you for posting in Intel Communities. We are working on this internally and we will get back to you with an update.


Thanks,

Alekhya




AlekhyaV_Intel
Moderator

Hi,

 

We apologize for the delay. We were able to encode a webcam video to UDP packets with and without Quick Sync, just as you did, as follows.

Software Encoder:

AlekhyaV_Intel_0-1692886526106.png

 

Hardware Encoder:

AlekhyaV_Intel_1-1692886539161.png

 

To understand your issue better, we would like to know how you're calculating the latency, along with some more information:

  1. Kernel version & processor details of the system where you're reproducing this issue.
  2. Could you please let us know the steps/formula you used to calculate the latency from the webcam to the display on the monitor?
  3. One more quick question: when you said "displaying on monitor", did you mean an external device (e.g., monitor, TV) or your computer/laptop screen?

 

Regards,

Alekhya

 

cglee
Beginner

Hi,

 

1. How I measured the latency

1.1 I ran a simple timer application that displays the current timestamp on my monitor.

1.2 I captured that timestamp with my webcam and displayed the captured picture on my monitor as well.

1.3 Then I took a photo with my phone camera showing both the live timestamp and the captured picture.

1.4 I took the difference between the live timestamp and the timestamp in the captured picture as the latency.

1.5 Please look at the test_way.jpg file.
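In other words, the latency is just the difference between the two timestamps visible in one phone photo. A minimal sketch of that subtraction (the timestamp values here are made up for illustration):

```python
from datetime import datetime

def glass_to_glass_latency_ms(live_ts: str, pictured_ts: str) -> float:
    """Latency = live timer reading minus the older reading shown in the
    webcam -> encode -> send -> decode -> display picture, where both
    readings were captured in the same phone photo."""
    fmt = "%H:%M:%S.%f"
    live = datetime.strptime(live_ts, fmt)
    pictured = datetime.strptime(pictured_ts, fmt)
    return (live - pictured).total_seconds() * 1000.0

# Hypothetical readings from one photo: ~210 ms
print(glass_to_glass_latency_ms("12:00:01.260", "12:00:01.050"))
```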

 

2. Player

I used ffplay to display the UDP packets. Following are the options I used.

~/ffmpeg_build/bin/ffplay -fflags nobuffer -flags low_delay -framedrop -vcodec h264 udp://127.0.0.1:20001
ffplay version 6.0 Copyright (c) 2003-2023 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/home/highvolt/ffmpeg_build pkg-config-flags=--static --extra-libs='-lpthread -lm' --ld=g++ --enable-gpl --enable-gnutls --enable-libfreetype --enable-libx264 --enable-libvpl --enable-libv4l2 --enable-nonfree
libavutil 58. 2.100 / 58. 2.100
libavcodec 60. 3.100 / 60. 3.100
libavformat 60. 3.100 / 60. 3.100
libavdevice 60. 1.100 / 60. 1.100
libavfilter 9. 3.100 / 9. 3.100
libswscale 7. 1.100 / 7. 1.100
libswresample 4. 10.100 / 4. 10.100
libpostproc 57. 1.100 / 57. 1.100

 

3. Information

3.1 My laptop

3.1.1 Model : Dell Inspiron 15 7570

3.1.2 CPU :  See cpu_info.txt

3.1.3 Memory : See mem_info.txt

3.1.4 GPU :

*-display
description: VGA compatible controller
product: UHD Graphics 620 (Whiskey Lake)
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 00
width: 64 bits
clock: 33MHz
capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:145 memory:a4000000-a4ffffff memory:80000000-8fffffff ioport:5000(size=64) memory:c0000-dffff
*-display
description: 3D controller
product: GP108M [GeForce MX150]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list rom
configuration: driver=nouveau latency=0
resources: irq:146 memory:a2000000-a2ffffff memory:90000000-9fffffff memory:a0000000-a1ffffff ioport:4000(size=128) memory:a3000000-a307ffff

 

3.2 Devices

3.2.1 Monitor : Dell S2340Lc

3.2.2 Webcam : Logitech logi HD1080p

 

3.3 OS

Ubuntu 20.04
Linux version 5.15.0-79-generic (buildd@lcy02-amd64-014) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #86~20.04.2-Ubuntu SMP Mon Jul 17 23:27:17 UTC 2023

 

Please tell me if there is anything else you need that I missed.

Thank you.

cglee

 

 

AlekhyaV_Intel
Moderator

Hi,


Thank you for sharing all the details we requested. We have contacted the admin team regarding this issue and will get back to you soon with an update.


Thanks,

Alekhya


AlekhyaV_Intel
Moderator

Hi,

 

We got an update from the admin team. According to it, the output format from your camera is YUY2/YUYV (yuyv422 in FFmpeg); however, Intel HW (VME mode) does not support H.264 encoding from the YUY2/YUYV format, only from NV12. Please refer to https://github.com/intel/media-driver/blob/master/docs/media_features.md#hardwarepak--shadermedia-kernelvme-encoding.

In addition, the HW encoder accepts data in graphics memory, but the data from your camera is in system memory.

Log excerpt:

Input #0, dshow, from 'video=USB Video Device': Duration: N/A, start: 31681.955346, bitrate: N/A Stream #0:0: Video: rawvideo, 1 reference frame (YUY2 / 0x32595559), yuyv422(tv, bt470bg/bt709/unknown, topleft), 640x480, 30 fps, 30 tbr, 10000k tbn

The command with the HW encoder does two more things than the one with the SW encoder:

  1. Convert YUY2 to NV12, which is done in FFmpeg.
  2. Upload the data from system memory to graphics memory, which is done in the oneVPL GPU runtime.

 

So it is possible that the command with the HW encoder is slower than the command with the SW encoder (note that this compares the two commands, not the HW encoder vs. the SW encoder).
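To give a feel for step 1's cost: converting packed YUY2 (4:2:2) to NV12 (4:2:0) means de-interleaving the luma and downsampling plus re-interleaving the chroma for every frame. A rough numpy sketch of the data movement involved (an illustration only, not FFmpeg's actual implementation):

```python
import numpy as np

def yuy2_to_nv12(frame: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """frame: (H, W*2) uint8 packed YUY2 (Y0 U Y1 V per 2 pixels).
    Returns (Y plane of shape (H, W), interleaved UV plane of shape (H//2, W))."""
    h, w2 = frame.shape
    w = w2 // 2
    groups = frame.reshape(h, w // 2, 4)       # each group: [Y0, U, Y1, V]
    y = groups[:, :, (0, 2)].reshape(h, w)     # de-interleave luma
    u = groups[:, :, 1].astype(np.uint16)      # 4:2:2 chroma rows
    v = groups[:, :, 3].astype(np.uint16)
    # Average vertical chroma pairs: 4:2:2 -> 4:2:0
    u420 = ((u[0::2] + u[1::2]) // 2).astype(np.uint8)
    v420 = ((v[0::2] + v[1::2]) // 2).astype(np.uint8)
    uv = np.empty((h // 2, w), dtype=np.uint8)  # NV12 interleaves U and V
    uv[:, 0::2] = u420
    uv[:, 1::2] = v420
    return y, uv

# 640x480 YUY2 frame is (480, 1280) bytes; NV12 output is 1.5 bytes/pixel.
y, uv = yuy2_to_nv12(np.zeros((480, 1280), dtype=np.uint8))
print(y.shape, uv.shape)
```

Every pixel of every frame is touched on the CPU before the encoder ever sees it, which is part of the overhead the HW-encoder command pays here.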

You may use hwupload and vpp_qsv to speed up the HW-encoder command; please refer to the command below:

ffmpeg -y -init_hw_device qsv -loglevel verbose -f lavfi -i yuvtestsrc=size=640x480,format=yuyv422 -vf "hwupload=extra_hw_frames=16,vpp_qsv=format=nv12" -preset veryfast -scenario videoconference -async_depth 1 -int_ref_cycle_dist 1 -f h264 -c:v h264_qsv qsv.mp4

 

If this resolves your issue, please accept this reply as the solution; this helps others with similar queries.

 

Thanks,

Alekhya

 

cglee
Beginner

Thank you Alekhya for your reply.

 

Over the last several weeks, I have tried to use QSV (Media SDK) directly, without ffmpeg or libav.

Finally, I found a way to reduce the latency. The library keeps a number of surface frames that subsequent frames refer to, and it must hold at least 2 frames while encoding. This imposes a mandatory latency of 2 frame times, usually 66 ms or more (at 30 fps). There are no official options to remove this mandatory latency, only some tricky combinations of settings.
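The arithmetic behind that latency floor is simple; a one-line sketch (assuming the encoder must hold N frames before producing its first output):

```python
def buffering_latency_ms(buffered_frames: int, fps: float) -> float:
    """Minimum latency added when an encoder holds `buffered_frames`
    frames before producing its first output."""
    return buffered_frames / fps * 1000.0

print(round(buffering_latency_ms(2, 30.0), 1))  # 66.7
```

At 30 fps, 2 buffered frames alone account for roughly 66.7 ms, consistent with the figure above.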

I referred to the thread below. I hope it helps someone facing this latency problem.

https://community.intel.com/t5/Media-Intel-oneAPI-Video/h-264-decoder-gives-two-frames-latency-while-decoding-a-stream/m-p/1099706

 

Thank you

cglee

AlekhyaV_Intel
Moderator

Hi,


Glad to know that your issue is resolved. If you need any further assistance, please post a new question as this thread will no longer be monitored by Intel.


Regards,

Alekhya

