I have encountered disappointing performance with the IPP v.8 libraries and the associated codecs contained in the example source code (w_ipp_8.0.0.005_legacy_samples.zip) relative to IPP v.7.
Here is some data that I gathered to quantify my observations:
H.264 Encoder: -9%
H.264 Decoder: -33%
MPEG4 Encoder/Decoder: 0% (no change)
MJPEG Encoder: -100%
MJPEG Decoder: +20%
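For readers comparing their own numbers, here is a minimal sketch of one common way to express such a comparison, assuming the percentages are relative changes in throughput (frames per second). The post does not say which convention was used, and the function name and sample values below are hypothetical.

```python
def percent_change(old_fps: float, new_fps: float) -> float:
    """Relative throughput change from old to new, in percent.

    Negative means the new version is slower. This is an assumed
    convention; the original post does not define its percentages.
    """
    if old_fps <= 0:
        raise ValueError("old_fps must be positive")
    return (new_fps - old_fps) / old_fps * 100.0


if __name__ == "__main__":
    # Hypothetical frame rates, not measurements from this thread.
    print(f"{percent_change(30.0, 20.0):+.1f}%")
```

Under this convention a figure of -100% would mean zero throughput, so a result like that is worth double-checking against the raw timings.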
The H.264 and MJPEG Encoder results are very disappointing so I suspect that I might be doing something wrong here. Here are some details about my configuration:
If anyone can enlighten me as to why I am seeing such disappointing performance, please do so.
FWIW, when we upgraded from IPP v.6 to IPP v.7 we saw tremendous performance gains.
I went through this exercise a few months back without resolution. I had serious performance deficiencies with both 7.1 and 8. A 1080p H.264 encoding was only managing around 4 frames/second.
If you turned off OpenMP, one core was maxed out, which is as expected. You are just not going to get real-time performance in a single thread. With OpenMP on, you got 8 threads on my i7, but each was only running at approx. 8%. It ended up being slower than the single-threaded version! If it had used even 80% of the cores, I would have had the performance required.
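The arithmetic behind that observation can be sketched as follows, assuming the 8% figure is each thread's share of one core (the post does not say how it was measured). The helper name is hypothetical.

```python
def core_equivalents(threads: int, per_thread_load: float) -> float:
    """Total CPU work in units of 'one fully busy core'.

    per_thread_load is the fraction of one core each thread keeps busy
    (assumed interpretation of the utilization figure in the post).
    """
    return threads * per_thread_load


if __name__ == "__main__":
    single = core_equivalents(1, 1.00)    # one thread, core maxed out
    threaded = core_equivalents(8, 0.08)  # 8 threads at about 8% each
    hoped = core_equivalents(8, 0.80)     # 8 threads at 80% each
    print(f"single-threaded: {single:.2f} core-equivalents")
    print(f"8 threads @ 8%:  {threaded:.2f} core-equivalents")
    print(f"8 threads @ 80%: {hoped:.2f} core-equivalents")
```

Under that assumption the eight threads together do less work than one saturated core, which is consistent with the threaded build being slower, and it points at time spent waiting (for example on locks or barriers) rather than computing.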
I have a theory that there is a bug in the sample code that causes a serious thread contention issue. But, as Intel have basically deprecated all that code, do not expect a fix. Their solution is to switch to the Media SDK. We have gone with x264.
Thanks for the reply, Robert.
A few questions:
I can't confirm the performance degradations you are reporting, at least for the three decoders and the JPEG encoder. In my measurements, upgrading IPP from v. 7 (7.1.1) to v. 8 (8.0.1) gives nearly identical performance. There is a small improvement on AVX2 hardware but no practical degradation at any other optimization level (all numbers are within realistic measurement limits, i.e. ±1%). This is also on Windows, using the static non-threaded libraries without any additional threading applied, and having confirmed that the proper dispatching occurs. I am not using the UMC samples directly, so my numbers perhaps indicate only that the primitives upgrade by itself does not add any degradation. I'm obviously using a specific test set, but it is the same test set that I have used previously.
Upgrading from v. 6 (6.1.5) to v. 7 was a whole other story, with serious performance degradations (some up to a factor of 2 slower!) seen on older hardware using SSE2/SSE3. This serious degradation has carried through into IPP v. 8 (Intel is aware of this). On more modern hardware, a decent performance improvement (not huge, though) was seen going up to v. 7.
Performance measurement is a tricky beast...
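As an illustration of why small differences are hard to trust, here is a minimal timing sketch that repeats a workload and reports the median together with the run-to-run spread. It is not the harness used by anyone in this thread; the function names and the stand-in workload are hypothetical.

```python
import statistics
import time


def benchmark(func, repeats=5, warmup=1):
    """Time func() several times and return the list of wall-clock durations."""
    for _ in range(warmup):
        func()  # warm caches before measuring
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (median seconds, spread between slowest and fastest run in % of median)."""
    med = statistics.median(samples)
    spread = (max(samples) - min(samples)) / med * 100.0 if med > 0 else 0.0
    return med, spread


if __name__ == "__main__":
    # Stand-in workload; replace with the encode/decode call under test.
    med, spread = summarize(benchmark(lambda: sum(range(200_000))))
    print(f"median {med * 1e3:.3f} ms, spread {spread:.1f}%")
```

If the spread between runs is larger than the difference between two library versions, the comparison cannot distinguish them.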