Media (Intel® Video Processing Library, Intel Media SDK)
Access community support for transcoding, decoding, and encoding in applications built with media tools such as Intel® oneAPI Video Processing Library and Intel® Media SDK
Announcements
The Intel Media SDK project is no longer active. For continued support and access to new features, Intel Media SDK users are encouraged to read the transition guide on upgrading from Intel® Media SDK to Intel® Video Processing Library (VPL), and to move to VPL as soon as possible.
For more information, see the VPL website.

Media SDK Decode vs. FFMPEG Decode

John_H_7
Beginner

Hi,

I have been doing some performance testing with your encode and decode samples, and comparing results with the same operations done on the same system using FFMPEG.

Encoding is significantly faster - about 4x the speed of FFMPEG - while using only 10% of the CPU.

Decoding - FFMPEG wins. Overall, it takes twice as long to decode using the Intel GPU as it does using FFMPEG. Would you have any idea why that might be? My guess is that the large amount of YUV data produced is bottlenecked coming back from the GPU over the main data bus.

The system is a Supermicro X10SLH-F: Xeon E3-1285 v3 with the P4700 GPU and C226 chipset, 8 virtual cores, 3.6 GHz, 12 GB memory. The test clip is the open-source computer animation “Big Buck Bunny” - H.264, 720p, ~10 minutes in length. The OS is CentOS 7.

Here's the decode:  ./sample_decode_drm  h264 -i big-buck-bunny_294_1280x720.h264 -o /dev/null -hw

The output is thrown away to minimize any I/O delays in writing out the YUV data.
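For comparison, a decode-only FFMPEG run of the same clip can be done along these lines (illustrative only; not necessarily the exact command used in my tests):

    # decode only, discard the frames, and report timing statistics
    ffmpeg -benchmark -i big-buck-bunny_294_1280x720.h264 -f null -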

Thanks.

John


 

8 Replies
Sravanthi_K_Intel

Hi John,

To measure the pure decoding performance, simply remove the "-o /dev/null".

With "-o /dev/null", if you look at the sample_decode output, you will see a non-zero fwrite_fps, and the decode fps is usually the same as fwrite_fps (I/O bottlenecked). If you remove the "-o" option, fwrite_fps is 0 and the decode fps is a very high number.

Try this command:  ./sample_decode_drm  h264 -i big-buck-bunny_294_1280x720.h264

Let us know what you see.

John_H_7
Beginner

Yes, that makes the difference. Now the numbers are comparable to the encode results.

The elapsed GPU times are about 70% of the FFMPEG times, across runs of 1, 5, 10, 15, and 20 simultaneous decodes.

And the host CPU use is 4, 8, 10, 12, and 15% for those decode runs, while FFMPEG runs at 99%+ for 5 sessions and above.
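In case it helps to reproduce these runs, the simultaneous decodes can be launched with a simple shell loop like this (a sketch only, not my exact test script; N and the input path are placeholders):

    # launch N hardware decodes in parallel and time the whole batch
    N=5
    time (
      for i in $(seq 1 $N); do
        ./sample_decode_drm h264 -i big-buck-bunny_294_1280x720.h264 -hw &
      done
      wait
    )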

Thank you.

Sravanthi_K_Intel

Glad we got this sorted out. Keep us posted on your evaluation. Thanks!

Andrew_B_2
Beginner

Do you know which codec you were using when you did the benchmark? Was it x264?

John_H_7
Beginner

Yes. This is the FFMPEG codec:

DEV.LS h264                 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (encoders: libx264 libx264rgb )
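For reference, a libx264 encode benchmark of the same clip looks roughly like this (the preset and bitrate here are illustrative placeholders, not necessarily the settings I used):

    # software H.264 encode with libx264; discard the output to avoid I/O effects
    ffmpeg -benchmark -i big-buck-bunny_294_1280x720.mp4 -c:v libx264 -preset medium -b:v 2M -f null -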

Just so you can sanity-check my GPU results, here's what I have. I'm using the well-known Big Buck Bunny video at 720p - big-buck-bunny_294_1280x720.mp4: ISO Media, MPEG v4 system, version 1.

We are considering adding your GPU to our media server for use in live video conferencing. Thus, encoding/decoding need to be done in real time.

My test system is a Xeon -  Intel(R) Xeon(R) CPU E3-1285 v3 @ 3.60GHz, 8 virtual cores, 12 GB memory.

I ran a series of decode tests and then encode tests. Results are below.

INTEL GPU DECODE

Sessions                    Overall Elapsed Time (s)    Elapsed Time per Session (s)    Average Host CPU Use (%)
1 Session                   9                           9                               4
5 Simultaneous Sessions     49                          9.8                             8
10 Simultaneous Sessions    95                          9.5                             10.5
15 Simultaneous Sessions    144                         9.6                             12
20 Simultaneous Sessions    193                         9.65                            15

INTEL GPU ENCODE

Sessions                    Overall Elapsed Time (s)    Elapsed Time per Session (s)    Average Host CPU Use (%)
1 Session                   66                          66                              3.6
5 Simultaneous Sessions     167                         33.4                            7.9
10 Simultaneous Sessions    339                         33.9                            8.7
15 Simultaneous Sessions    508                         33.7                            9.4
20 Simultaneous Sessions    674                         33.7                            9.8

So, if only encoding is done, it looks like 17 simultaneous sessions is the maximum that can complete within 600 seconds, which is the length of the video being encoded (10 minutes).

Decoding is better, with about 62 sessions potentially being done in real time.

For both decoding the input into a conference and then encoding the output (the real scenario), I figure the GPU can support about 13 simultaneous sessions.
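For reference, those estimates come from dividing the 600-second clip length by the per-session times in the tables above (a rough check that assumes the sessions scale as measured):

    encode only:      600 / 33.7          ≈ 17.8  ->  ~17 sessions
    decode only:      600 / 9.65          ≈ 62    ->  ~62 sessions
    decode + encode:  600 / (9.65 + 33.7) ≈ 13.8  ->  ~13 sessions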

Given the system, does this sound about right?

Thanks.

John

 

Sravanthi_K_Intel

Hi John - The numbers for decode and encode are in line with what I am observing on my system (which is similar to yours). Yes, the decoder is much, much faster than the encoder (as expected). If you have more questions on performance, please send me a message.

Chintan_M_
Beginner

This answer (http://stackoverflow.com/questions/20367326/which-lib-is-better-transcoder-for-live-camera-ffmpeg-vs-intel-media-sdk) says there is a trade-off between quality and CPU usage.

How significant is the quality drop for a transcoding application in screen-capture use cases?

Fabrizio
Beginner

Hi, if I use only the "h264_qsv" option with ffmpeg, which license do I need? I only use ffmpeg to capture HTTP streams with h264_qsv enabled. What happens after 30 days with Media Server Studio Community Edition 2017?
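For context, the kind of command I mean is roughly the following (the stream URL and bitrate are placeholders):

    # capture an HTTP stream and encode the video with Quick Sync (h264_qsv)
    ffmpeg -i http://example.com/live/stream.m3u8 -c:v h264_qsv -b:v 4M -c:a copy capture.mp4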

Thank you
