IPP v.8 vs. IPP v.7 Codec Performance

Rohit_S_2 · 10-21-2013

I have encountered disappointing performance with the IPP v.8 libraries and the associated codecs contained in the example source code (w_ipp_8.0.0.005_legacy_samples.zip) relative to IPP v.7.

Here is some data that I gathered to quantify my observations:

H.264 Encoder: -9%

H.264 Decoder: -33%

MPEG4 Encoder/Decoder: 0% (no change)

MJPEG Encoder: -100%

MJPEG Decoder: +20%
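
For readers comparing figures like these, the sign convention matters. The post does not say how the percentages were computed, so the sketch below is only an assumption: it shows two common conventions, relative change in throughput and relative change in time per frame. The function names and the frame rates are hypothetical, illustrative values, not measurements from this thread. Note that under the throughput convention a value of -100% would mean zero throughput, whereas a doubling of run time corresponds to -50% throughput or +100% time.

```python
def throughput_change_pct(old_fps, new_fps):
    """Relative change in throughput, in percent (negative means slower)."""
    if old_fps <= 0:
        raise ValueError("old_fps must be positive")
    return (new_fps - old_fps) / old_fps * 100.0


def time_change_pct(old_fps, new_fps):
    """Relative change in time per frame, in percent (positive means slower)."""
    if old_fps <= 0 or new_fps <= 0:
        raise ValueError("frame rates must be positive")
    # time per frame is 1/fps, so new_time / old_time == old_fps / new_fps
    return (old_fps / new_fps - 1.0) * 100.0


# Illustrative values only (assumed, not taken from the thread).
old_fps, new_fps = 100.0, 50.0
print(f"throughput change: {throughput_change_pct(old_fps, new_fps):+.1f}%")
print(f"time change:       {time_change_pct(old_fps, new_fps):+.1f}%")
```

Stating which of the two conventions a table uses avoids ambiguity when a codec becomes twice as slow.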

The H.264 and MJPEG Encoder results are very disappointing, so I suspect that I might be doing something wrong here. Here are some details about my configuration:

The data was measured on a Release build (default compiler optimization settings for a release build) using Microsoft Visual Studio 2010.
The data was measured on a Sandy Bridge E (6 core/12 thread) workstation running 32-bit Windows 7 Professional.
I am using the MT IPP libraries.
I have configured the encoder/decoder for each codec to only use 1 thread.
I have disabled OpenMP.
I verified that the IPP library was initialized such that the Sandy Bridge specific primitives (G9) are called.
The project is configured to use CPU dispatch to choose the most appropriate version of an IPP primitive at runtime depending on the host CPU.
The IPP libraries are statically linked in.

If anyone can enlighten me as to why I am seeing such disappointing performance, please do so.

FWIW, when we upgraded from IPP v.6 to IPP v.7 we saw tremendous performance gains.

Robert_Jongbloed · 10-21-2013

I went through this exercise a few months back without resolution. I had serious performance deficiencies with both 7.1 and 8. A 1080p H.264 encoding was only managing around 4 frames/second.

With OpenMP turned off, one core was maxed out, which is as expected. You are just not going to get real-time performance in a single thread. With OpenMP on, you got 8 threads on my i7, but each was only running at approx. 8%. It ended up being slower than the single-threaded version! If it had used even 80% of the cores, I would have had the performance required.
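
The arithmetic behind that observation can be sketched roughly as follows. Summing per-thread utilization gives a "core-equivalents" figure. This is a back-of-the-envelope model that assumes utilization translates directly into useful work, which lock contention and synchronization overhead can violate. The function name is hypothetical, and the 80% case is read here as 80% on each of the 8 threads.

```python
def core_equivalents(num_threads, utilization_per_thread):
    """Total CPU work in units of one fully busy core (simplified model)."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    if not 0.0 <= utilization_per_thread <= 1.0:
        raise ValueError("utilization_per_thread must be between 0 and 1")
    return num_threads * utilization_per_thread


single_thread = core_equivalents(1, 1.00)  # OpenMP off: one core maxed out
openmp_actual = core_equivalents(8, 0.08)  # OpenMP on: 8 threads at ~8% each
openmp_hoped = core_equivalents(8, 0.80)   # the hoped-for 80% utilization

print(f"single thread: {single_thread:.2f} core-equivalents")
print(f"OpenMP actual: {openmp_actual:.2f} core-equivalents")
print(f"OpenMP hoped:  {openmp_hoped:.2f} core-equivalents")
```

Under this model the 8-thread run does less total work per second than the single maxed-out core, which is consistent with it ending up slower.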

I have a theory that there is a bug in the sample code that causes a serious thread contention issue. But, as Intel have basically deprecated all that code, do not expect a fix. Their solution is to switch to the Media SDK. We have gone with x264.

Rohit_S_2 · 10-21-2013

Thanks for the reply, Robert.

A few questions:

Why didn't you use Intel's Media SDK?
Did x264 give you the performance you were looking for?
Does x264 take advantage of the newer SIMD instruction set extensions of x86/x64?

j_miles · 10-25-2013

I can't confirm the performance degradations you are reporting, at least for the three decoders and the JPEG encoder. Upgrading IPP from v. 7 (7.1.1) to v. 8 (8.0.1) gives nearly identical performance in my measurements. There is a small improvement for AVX2 hardware but no practical degradation at any other optimization level (all numbers within realistic measurement limits, i.e. ±1%). This is also on Windows, using the static non-threaded libraries without any additional threading applied, and having confirmed that the proper dispatching occurs. This is not directly using the UMC samples, so my numbers perhaps indicate only that the primitives upgrade by itself does not add any degradation. I'm obviously using a specific test set, but it is the same test set that I have used previously.
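
As a rough illustration of treating small differences as measurement noise, here is a minimal sketch of a timing harness. It is not the harness used for the numbers above. The function names, the repeat count, and the use of the median are assumptions, and the 1% tolerance simply mirrors the ±1% limit mentioned.

```python
import statistics
import time


def median_runtime(func, repeats=5):
    """Median wall-clock time of func() over several runs, in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def classify_change(baseline_s, candidate_s, tolerance=0.01):
    """Label a run-time change; anything within the tolerance counts as noise."""
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    relative = (candidate_s - baseline_s) / baseline_s
    if abs(relative) <= tolerance:
        return "within noise"
    return "slower" if relative > 0 else "faster"


def workload():
    # Stand-in for a codec call; purely illustrative.
    sum(i * i for i in range(10000))


elapsed = median_runtime(workload)
print(f"median run time: {elapsed:.6f} s")
print(classify_change(1.000, 1.005))  # a 0.5% difference
print(classify_change(1.000, 1.500))  # a 50% difference
```

Using the median of several runs reduces the influence of outliers such as cache warm-up or background activity.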

Upgrading from v. 6 (6.1.5) to v. 7 was a whole other story, with serious performance degradations (some up to a factor of 2 slower!) seen on older hardware using SSE2/SSE3. This serious degradation has carried through into IPP v. 8 (Intel is aware of this). For more modern hardware, a decent performance improvement (not huge, though) was seen going up to v. 7.

Performance measurement is a tricky beast...

- Jay