Hi,
I have been doing some performance testing with your encode and decode samples, comparing the results with the same operations done on the same system using FFMPEG.
Encoding is significantly faster: about 4x FFMPEG's speed, while using only 10% of the CPU.
Decoding is another matter: FFMPEG wins. Overall, a decode on the Intel GPU takes twice as long as it does with FFMPEG. Would you have any idea why that might be? My guess is that the large amount of YUV data produced is bottlenecked coming back from the GPU to the main data bus.
The system is a Supero X10SLH-F: Xeon E3-1285 v3 with the P4700 GPU and C226 chipset, 8 virtual cores at 3.6 GHz, 12 GB of memory. The test clip is the open-source computer animation "Big Buck Bunny": H.264, 720p, about 10 minutes long. The OS is CentOS 7.
Here's the decode: ./sample_decode_drm h264 -i big-buck-bunny_294_1280x720.h264 -o /dev/null -hw
The output is thrown away to minimize any I/O delays in writing out the YUV data.
Thanks.
John
Hi John,
To measure pure decoding performance, simply remove the "-o /dev/null".
With "-o /dev/null", if you look at the sample_decode output you will see a non-zero fwrite_fps, and the decode fps is usually the same as the fwrite_fps (i.e., I/O bound). If you remove the "-o" option entirely, fwrite_fps is 0 and the decode fps is a much higher number.
Try this command: ./sample_decode_drm h264 -i big-buck-bunny_294_1280x720.h264
Let us know what you see.
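If it helps when you scale the test up, here is a minimal sketch of how several decode sessions could be launched in parallel and timed. This is only an illustration: the binary name and clip are taken from the command above, and both are assumed to be in the current directory.

```python
#!/usr/bin/env python3
"""Rough sketch: launch N simultaneous sample_decode_drm sessions and time them."""
import subprocess
import sys
import time

# Number of simultaneous sessions, e.g. "python3 run_decodes.py 5"
n_sessions = int(sys.argv[1]) if len(sys.argv) > 1 else 1

# Same command as above, with no "-o" so no YUV output is written.
cmd = ["./sample_decode_drm", "h264", "-i", "big-buck-bunny_294_1280x720.h264"]

start = time.time()
# Start every session at once so they all compete for the GPU concurrently.
procs = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
         for _ in range(n_sessions)]
for p in procs:
    p.wait()
elapsed = time.time() - start

print(f"{n_sessions} session(s) finished in {elapsed:.1f} s "
      f"({elapsed / n_sessions:.2f} s per session)")
```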
Yes, that makes the difference. Now the numbers are comparable to encode.
The elapsed GPU times are about 70% of the FFMPEG times for runs of 1, 5, 10, 15, and 20 simultaneous decodes.
Host CPU use is 4, 8, 10, 12, and 15% respectively for those runs; FFMPEG runs at 99%+ for 5 sessions and above.
Thank you.
Glad we got this sorted out. Keep us posted on your evaluation. Thanks!
Do you know which codec you were using when doing the benchmark? Was it x264?
Yes. This is the FFMPEG codec:
DEV.LS h264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (encoders: libx264 libx264rgb )
Just so you can do a sanity check on my GPU results, here's what I have. I'm using the well-known Big Buck Bunny video at 720p (big-buck-bunny_294_1280x720.mp4: ISO Media, MPEG v4 system, version 1).
We are considering adding your GPU to our media server for use in live video conferencing. Thus, encoding/decoding need to be done in real time.
My test system is an Intel(R) Xeon(R) CPU E3-1285 v3 @ 3.60GHz, 8 virtual cores, 12 GB memory.
I ran a series of decode tests and then encode tests. Results are below (forgive the formatting, please).
| INTEL GPU DECODE | Overall Elapsed Time (s) | Elapsed Time per Session (s) | Average Host CPU Use (%) |
|---|---|---|---|
| 1 Session | 9 | 9 | 4 |
| 5 Simultaneous Sessions | 49 | 9.8 | 8 |
| 10 Simultaneous Sessions | 95 | 9.5 | 10.5 |
| 15 Simultaneous Sessions | 144 | 9.6 | 12 |
| 20 Simultaneous Sessions | 193 | 9.65 | 15 |

| INTEL GPU ENCODE | Overall Elapsed Time (s) | Elapsed Time per Session (s) | Average Host CPU Use (%) |
|---|---|---|---|
| 1 Session | 66 | 66 | 3.6 |
| 5 Simultaneous Sessions | 167 | 33.4 | 7.9 |
| 10 Simultaneous Sessions | 339 | 33.9 | 8.7 |
| 15 Simultaneous Sessions | 508 | 33.7 | 9.4 |
| 20 Simultaneous Sessions | 674 | 33.7 | 9.8 |
So, if only encoding is done, it looks like 17 simultaneous sessions is the maximum that can complete within 600 seconds, the length of the video being encoded (10 minutes).
Decoding is better, with about 62 sessions potentially running in real time.
For both decoding the input into a conference and then encoding the output (the real scenario), I figure the GPU can support about 13 simultaneous sessions.
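For reference, here is the simple scaling behind those numbers, assuming the 20-session timings scale linearly and that a real-time session has to finish within the 600-second clip:

```python
VIDEO_LENGTH_S = 600   # the clip is ~10 minutes long

# Measured wall-clock times for 20 simultaneous sessions (seconds, from the tables above).
DECODE_20 = 193
ENCODE_20 = 674

# Sessions that can finish within real time, assuming linear scaling.
# Rounded down, since a partial session still has to finish.
decode_only = int(20 * VIDEO_LENGTH_S / DECODE_20)      # 62
encode_only = int(20 * VIDEO_LENGTH_S / ENCODE_20)      # 17

# A conferencing stream needs one decode plus one encode.
per_stream_s = (DECODE_20 + ENCODE_20) / 20             # ~43.4 s of GPU time per stream
decode_and_encode = int(VIDEO_LENGTH_S / per_stream_s)  # 13

print(decode_only, encode_only, decode_and_encode)
```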
Given the system, does this sound about right?
Thanks.
John
Hi John, the numbers for decode and encode are in line with what I am observing on my system (which is similar to yours). Yes, the decoder is much, much faster than the encoder, as expected. If you have more questions on performance, please send me a message.
This answer (http://stackoverflow.com/questions/20367326/which-lib-is-better-transcoder-for-live-camera-ffmpeg-vs-intel-media-sdk) says that there is a trade-off between quality and CPU usage.
How significant is the quality drop when transcoding in a screen capture application?
Hi, if I use only the "h264_qsv" option in ffmpeg, which license do I need? I only use ffmpeg to capture HTTP streams with h264_qsv enabled. What happens after 30 days with the Media Server Studio Community Edition 2017?
Thank you