I have been doing some performance testing with your encode and decode samples, and comparing results with the same operations done on the same system using FFMPEG.
Encoding is significantly faster - about 4x FFMPEG's speed - while using only 10% of the CPU.
Decoding - FFMPEG wins. Overall, a decode takes twice as long using the Intel GPU as it does using FFMPEG. Do you have any ideas why that might be? My guess is that the large amount of YUV data produced is bottlenecked returning from the GPU to the main data bus.
The system is a Supermicro X10SLH-F: Xeon E3-1285 v3 with P4700 GPU and C226 chipset, 8 virtual cores, 3.6 GHz, 12 GB memory. The open-source computer animation "Big Buck Bunny" was used: H.264, 720p, ~10 minutes in length. OS is CentOS 7.
Here's the decode: ./sample_decode_drm h264 -i big-buck-bunny_294_1280x720.h264 -o /dev/null -hw
The output is thrown away to minimize any I/O delays in writing out the YUV data.
To measure pure decoding performance, simply remove the "-o /dev/null". Even with /dev/null, if you look at the sample_decode output you will see a non-zero fwrite_fps, and the decode fps is usually the same as fwrite_fps (I/O bottlenecked). If you remove the "-o" option, fwrite_fps is 0 and the decode fps is much higher.
Try this command: ./sample_decode_drm h264 -i big-buck-bunny_294_1280x720.h264
Let us know what you see.
Yes, that makes the difference. Now the numbers are comparable to encode.
The elapsed GPU times are about 70% of the FFMPEG times, for runs of 1, 5, 10, 15, and 20 simultaneous decodes.
Host CPU use is 4, 8, 10, 12, and 15% for those decode runs; FFMPEG runs at 99%+ for 5 sessions and above.
Yes. This is the FFMPEG codec:
DEV.LS h264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (encoders: libx264 libx264rgb )
Just to have you do a sanity check on my GPU results, here's what I have. I'm using the well-known Big Buck Bunny video at 720p -- big-buck-bunny_294_1280x720.mp4: ISO Media, MPEG v4 system, version 1.
We are considering adding your GPU to our media server for use in live video conferencing. Thus, encoding/decoding need to be done in real time.
My test system is a Xeon - Intel(R) Xeon(R) CPU E3-1285 v3 @ 3.60GHz, 8 virtual cores, 12 GB memory.
I run a series of decode and then encode tests. Results below. (forgive the formatting, please)
INTEL GPU DECODE          Overall Elapsed Time (s)  Elapsed Time per Session (s)  Average Host CPU Use (%)
1 Session                 9                         9                             4
5 Simultaneous Sessions   49                        9.8                           8
10 Simultaneous Sessions  95                        9.5                           10.5
15 Simultaneous Sessions  144                       9.6                           12
20 Simultaneous Sessions  193                       9.65                          15
INTEL GPU ENCODE          Overall Elapsed Time (s)  Elapsed Time per Session (s)  Average Host CPU Use (%)
1 Session                 66                        66                            3.6
5 Simultaneous Sessions   167                       33.4                          7.9
10 Simultaneous Sessions  339                       33.9                          8.7
15 Simultaneous Sessions  508                       33.7                          9.4
20 Simultaneous Sessions  674                       33.7                          9.8
So if only encoding is done, it looks like 17 simultaneous sessions is the maximum that can complete within 600 seconds, the length of the video being encoded (10 minutes).
Decoding is better, with about 62 sessions potentially running in real time.
For both decoding input into a conference and then encoding the output (the real scenario), I figure the GPU can support about 13 simultaneous sessions.
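The capacity estimates above come from simple arithmetic on the per-session times in the tables. Here is a quick sketch of that calculation (my assumption: GPU work is effectively serialized across sessions, so a combined session costs decode time plus encode time):

```python
VIDEO_LEN_S = 600.0   # the test clip is 10 minutes long

# Per-session elapsed times at 20 simultaneous sessions, from the tables above
decode_s = 9.65       # seconds of GPU time per decode session
encode_s = 33.7       # seconds of GPU time per encode session

# Maximum sessions that still finish within the clip's duration (i.e. real time)
max_decode = int(VIDEO_LEN_S / decode_s)                  # 62
max_encode = int(VIDEO_LEN_S / encode_s)                  # 17
# A conferencing session needs one decode plus one encode
max_transcode = int(VIDEO_LEN_S / (decode_s + encode_s))  # 13

print(max_decode, max_encode, max_transcode)  # 62 17 13
```

This matches the figures quoted: ~62 decodes, ~17 encodes, ~13 combined decode+encode sessions in real time.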
Given the system, does this sound about right?
Hi John - The numbers for decode and encode are in line with what I am observing on my system (which is similar to yours). Yes, the decoder is much, much faster than the encoder (as expected). If you have more questions on performance, please send me a message.
This answer (http://stackoverflow.com/questions/20367326/which-lib-is-better-transcoder-for-live-camera-ffmpeg-vs...) says that there is a trade-off between quality and CPU usage.
How significant is the quality drop when transcoding in screen-capture applications?
Hi, if I use only the "h264_qsv" option with ffmpeg, which license do I need? I only use ffmpeg to capture HTTP streaming with h264_qsv enabled. And what happens after 30 days with the Media Server Studio Community Edition 2017?