Media (Intel® oneAPI Video Processing Library, Intel Media SDK)
Access community support with transcoding, decoding, and encoding in applications using media tools like Intel® oneAPI Video Processing Library and Intel® Media SDK

FFmpeg quicksync video scaling


CPU: Intel Xeon E3-1585 V5
GPU: Intel Iris Pro P580

I've successfully set up quicksync and ffmpeg and started working on using both for transcoding. When I run a simple transcode command with quicksync, it processes at around 12.3x speed. With libx264, the same command hovers around 9x.
The command in question: "ffmpeg -y -i test_file -init_hw_device qsv:hw -c:v h264_qsv -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -c:a aac -strict -2 -movflags faststart -b:v 3400k output.mp4"

All is fine up to this point, as quicksync should be faster than libx264. However, when I add scale and pad video filters, the difference between quicksync and libx264 becomes very small: transcoding speed with quicksync drops from 12.3x down to 1.84x. I should note that the input file is already in h264; perhaps the difference would be much bigger if the input were in a different codec. I'm not decoding with quicksync right now (it doesn't really make a difference for this input file anyway, since it's already in h264).
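If I did want hardware decode as well, I imagine it would look something like this (untested sketch; assumes this ffmpeg build includes the h264_qsv decoder):

```shell
# Sketch: hardware decode + hardware encode (no filters yet).
# -hwaccel qsv requests QuickSync decoding; h264_qsv handles the encode.
ffmpeg -y -hwaccel qsv -c:v h264_qsv -i test_file \
       -c:v h264_qsv -preset veryfast -profile:v baseline \
       -keyint_min 24 -g 48 -c:a aac -strict -2 -movflags faststart \
       -b:v 3400k output.mp4
```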
I've tested transcoding a single file into multiple outputs (each with a different scale and pad filter) using both quicksync and libx264 - this is the main reason I'm using ffmpeg with QSV. The difference is almost negligible: libx264 consumes around 10% more CPU but takes about 5 seconds less. The gap is halved again if I run ffmpeg as several parallel processes. I assume this is because the scaling calculations are done on the CPU instead of the GPU. Is there a way to utilize hardware acceleration further and make transcoding with scaling faster under quicksync? Otherwise I don't really see the point of using it with ffmpeg.

I did check the graphics utilization, just like the PDF suggested. The graphics card idles during a libx264 transcode and is in use when transcoding with quicksync (render usage is above 0, along with the other counters), so it does work.

The command I used for most of the testing (scaling and padding) is below; when not using quicksync, I only replaced the video codec with libx264 and removed init_hw_device:
ffmpeg -y -init_hw_device qsv:hw -i test_file -c:v h264_qsv -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -c:a aac -strict -2 -movflags faststart -vf "scale=iw*min(1920/iw\,1080/ih):ih*min(1920/iw\,1080/ih), pad=1920:1080:(1920-iw*min(1920/iw\,1080/ih))/2:(1080-ih*min(1920/iw\,1080/ih))/2" -b:v 3400k _01.mp4 \
-c:v h264_qsv -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -c:a aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -strict -2 -movflags faststart -vf "scale=iw*min(1280/iw\,720/ih):ih*min(1280/iw\,720/ih), pad=1280:720:(1280-iw*min(1280/iw\,720/ih))/2:(720-ih*min(1280/iw\,720/ih))/2" -b:v 1725k _02.mp4 \
-c:v h264_qsv -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -c:a aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -strict -2 -movflags faststart -vf "scale=iw*min(854/iw\,480/ih):ih*min(854/iw\,480/ih), pad=854:480:(854-iw*min(854/iw\,480/ih))/2:(480-ih*min(854/iw\,480/ih))/2" -b:v 960k _03.mp4 \
-c:v h264_qsv -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -c:a aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -strict -2 -movflags faststart -vf "scale=iw*min(640/iw\,360/ih):ih*min(640/iw\,360/ih), pad=640:360:(640-iw*min(640/iw\,360/ih))/2:(360-ih*min(640/iw\,360/ih))/2" -b:v 510k _04.mp4 \
-c:v h264_qsv -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -c:a aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -strict -2 -movflags faststart -vf "scale=iw*min(426/iw\,240/ih):ih*min(426/iw\,240/ih), pad=426:240:(426-iw*min(426/iw\,240/ih))/2:(240-ih*min(426/iw\,240/ih))/2" -b:v 320k _05.mp4 \
-c:v h264_qsv -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -c:a aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -strict -2 -movflags faststart -vf "scale=iw*min(284/iw\,160/ih):ih*min(284/iw\,160/ih), pad=284:160:(284-iw*min(284/iw\,160/ih))/2:(160-ih*min(284/iw\,160/ih))/2" -b:v 160k _06.mp4
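(The scale/pad expressions above just compute a centered letterbox fit. The arithmetic can be sanity-checked outside ffmpeg with a small shell helper - the sizes below are illustrative, and ffmpeg's own rounding may differ by a pixel:)

```shell
# Compute the scaled size and pad offsets that the filter expressions
# above produce, for a given input size and target canvas.
fit() {
  awk -v iw="$1" -v ih="$2" -v tw="$3" -v th="$4" 'BEGIN {
    r = (tw/iw < th/ih) ? tw/iw : th/ih      # min(tw/iw, th/ih)
    sw = int(iw*r); sh = int(ih*r)           # scaled dimensions
    px = int((tw-sw)/2); py = int((th-sh)/2) # pad offsets (centered)
    printf "%dx%d pad at %d,%d in %dx%d\n", sw, sh, px, py, tw, th
  }'
}
fit 1440 1080 1920 1080   # -> 1440x1080 pad at 240,0 in 1920x1080
```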

Thanks in advance,



What version of ffmpeg do you use for such runs?

Can you try without the pad filter, but with scale?


Yeah, I've thought about giving gstreamer a try; I'll keep it in mind if my ffmpeg trick doesn't work.

Tried to check the ffmpeg version with "ffmpeg -version" and got this string: N-86977-g5859b5b (a git snapshot build; no idea which release it corresponds to).

I've done some thinking since the original post and came up with an idea. My intention is to pad and scale one video into six outputs with different resolutions; the aspect ratio is kept the same and the padding is adjusted accordingly, so the outputs are identical except for the scaling. That means I could decode and pad the input file with FFmpeg, then pipe the raw video data to the sample_multi_transcode process, which would scale and encode the videos.

Any idea how I would make the sample_multi_transcode sample (which comes with the Intel Media Server) take a pipe as an input file to transcode? A pipe means no extra reading of large raw files from disk, and using raw as the input codec also means no decode work for quicksync.
I really want to use this sample (sample_multi_transcode), since it is blazingly fast compared to using quicksync through ffmpeg, and I don't feel like writing my own (mostly due to the time it would take and all the potential bugs).

Edit: Managed to figure out how to pipe from ffmpeg to the sample using named pipes. Two problems, though: sample_multi_transcode doesn't have an option for a raw input file; and I want to have 5 outputs - if I run all 5 at the same time, each takes 1/5 of the data streamed through the pipe, and I end up with the input video split into 5 parts, each only a fifth of the original duration. Any idea if this is the right approach? Or would it be better to simply transcode the input with ffmpeg at a very large bitrate and output to a file that feeds the quicksync process?
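One way around the 1/5-of-the-stream problem might be to decode once and duplicate the stream with ffmpeg's tee muxer, one named pipe per reader (untested sketch; pipe names and the yuv420p choice are illustrative):

```shell
# Sketch: decode once, then copy the raw stream into each named pipe via
# the tee muxer, so every reader gets a full copy rather than a slice.
mkfifo /tmp/raw1.yuv /tmp/raw2.yuv
ffmpeg -y -i test_file -map 0:v -c:v rawvideo -pix_fmt yuv420p \
       -f tee "[f=rawvideo]/tmp/raw1.yuv|[f=rawvideo]/tmp/raw2.yuv" &
# ...then start one reader per pipe while the writer runs.
```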


The problem is that you are using the software scaler 'scale'. If you want to keep the speed, you must use the scaler that uses quicksync: scale_qsv.
But I'm not sure that it supports the same syntax as the software scaler.
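Something along these lines, for one of your outputs (untested sketch; scale_qsv takes a fixed size here, and whether it accepts the min() expressions or has a pad equivalent in your build is an open question):

```shell
# Sketch: keep decode, scaling, and encode on the GPU with scale_qsv.
ffmpeg -y -hwaccel qsv -c:v h264_qsv -i test_file \
       -vf "scale_qsv=w=1280:h=720" \
       -c:v h264_qsv -preset veryfast -b:v 1725k _02.mp4
```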


Hi Tomaz

This is very interesting!

I have the exact same situation here, with the "scale" feature running on the CPU and not the GPU.

Have you found a solution to this problem, or a workaround?