I ported my H.264 encoder to IPP v7.1 using the samples, which I built as dynamic multithreaded libraries, and now H.264 encoding performance has dropped with the same settings as in v7.0.
Are you aware of any issues with the H.264 encoder in 7.1?
I set m_iThreads to 0, initialized the encoder correctly, and selected the best IPP libraries for my CPU with ippInit(). All my cores run at 100% while I am encoding, but it is around 50-60% slower than the same encoding with the IPP v7.0 encoder. I am using the separate threaded libs (downloaded from the site). Note that with v7.0 core usage is only around 40%, yet it still performs much faster than v7.1.
I also tried the v7.1 single-threaded UMC libs, but they are much slower than the multithreaded ones.
Any help is really appreciated.
I tried building single-threaded libs and using them in the application. It works, but it is much slower than with the multithreaded ones. We tested with various video formats and resolutions. For every video, no matter how big, we get a difference of around 20 seconds between v7.0 and v7.1. Did you change something in the H.264 encoder itself that could cause this behavior?
The last H.264 issue we fixed was excessive CPU load during playback; we have not seen any encoding performance issues. Could you give us the specifics of your encoding parameters: resolution, profile, bitrate and, of course, CPU model? We'll run experiments locally. Also, since this is your own application: can you reproduce the low performance with "umc_video_enc_con"?
I am using IPP's static threaded libraries (not the dynamic ones), downloaded from the website. As far as I can see, umc_video_enc_con uses the dynamic libs. Maybe that's the problem? umc_video_enc_con also loads all four cores to 100% (CPU: i5-3570), but it encodes twice as fast as our application. I suspect something is wrong with the libraries somewhere along the line.
Do you have any advice on what I should try?
Oh, and the parameters we're using are:
rate control method: UMC::H264_RCM_CBR
First of all, you need to make sure that you use the proper optimized library. With static linking, this is done by calling the ippInit() function somewhere at the beginning of the application. With dynamic linking it does not matter, because ippInit() is called by the DllMain function. Then, as far as I can see, umc_video_enc_con sets the number of internal IPP threads to 1 with ippSetNumThreads. Some video encoding functions (as well as video pre-/post-processing functions) still use internal threading (via OpenMP), which does no good in an externally threaded application such as the H.264 encoder. That is why ippSetNumThreads(1) is used to limit internal threading to a single thread.
So, try calling ippInit() and ippSetNumThreads(1) during application initialization.
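A minimal sketch of that initialization step, assuming the IPP 7.x headers (ipp.h) are available. HAVE_IPP is a hypothetical macro used only so the sketch also compiles without IPP, and the helper name is likewise hypothetical. Both IPP calls return an IppStatus, where negative values are errors and positive values are warnings, so only negative values are treated as failures here.

```cpp
#include <cstdio>

#ifdef HAVE_IPP   // hypothetical macro: define it when building against IPP 7.x
#include <ipp.h>
#endif

// Hypothetical helper: call once at start-up, before the encoder is created.
// Returns false only if IPP reports an error (negative IppStatus).
bool InitIppForExternallyThreadedEncoder()
{
#ifdef HAVE_IPP
    // Static linking: select the code path optimized for this CPU.
    // With the dynamic IPP libraries this already happens in DllMain.
    IppStatus st = ippInit();
    if (st < ippStsNoErr) {
        std::fprintf(stderr, "ippInit failed: %s\n", ippGetStatusString(st));
        return false;
    }

    // The H.264 encoder parallelizes at codec level, so limit IPP's
    // internal (function-level) OpenMP threading to a single thread.
    st = ippSetNumThreads(1);
    if (st < ippStsNoErr) {
        std::fprintf(stderr, "ippSetNumThreads failed: %s\n", ippGetStatusString(st));
        return false;
    }
#endif
    return true;
}
```

Calling it first thing in main(), before any UMC object is constructed, keeps the dispatch choice and thread count fixed for the rest of the run.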
By the way, you can build umc_video_enc_con with any type of IPP library using the options in the IPP samples build script.
Also, I don't understand the difference between the dynamic and static sample libraries. When I build both, I get .lib files in both cases, even though for dynamic I expected to see DLLs (that is how it was in the previous version of IPP). So what exactly do "dynamic" and "static" mean in the build options for the IPP samples? I tried both versions, and they give the same results and the same encoding time. The only difference I managed to get was with the single-threaded versions, which performed much slower than the multithreaded ones.
In the samples, the terms "dynamic" and "static" refer to which IPP libraries are used at link time: dynamic (DLLs or .so) or static (.lib or .a). They do not apply to the intermediate sample libraries generated during the sample application build, so in both cases you get static libraries (codecs, muxers, and so on).
Next, in the IPP 7.1 UMC samples the H.264 encoder can run in parallel, and its parallelization is done with OpenMP. When you select the "mt" libraries during the sample build, the script does two things: it defines the USE_OPENMP macro (the OpenMP constructs in the codec are wrapped in #ifdef USE_OPENMP blocks), and it puts the multithreaded IPP libraries (the *_t ones) on the linker command line. So there are basically two levels of parallelization: codec-level and function-level. We have observed that function-level parallelization brings no additional performance benefit when external (upper-level) parallelization is active. You can manually switch the linker inputs from the *_t libraries to the *_l (lowercase L) ones and you will see no difference in performance. Thus, your goal should be to enable codec-level parallelization and disable function-level parallelization (set the number of threads to 1).
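The USE_OPENMP pattern can be sketched as follows. This is not the UMC source: EncodeSlice and EncodeFrameSlices are hypothetical stand-ins, and only the #ifdef USE_OPENMP guard around the OpenMP pragma mirrors what the "mt" build configuration switches on. Without USE_OPENMP (or without OpenMP support in the compiler) the same loop simply runs serially.

```cpp
#include <vector>

// Hypothetical stand-in for the per-slice encoding work.
static int EncodeSlice(int sliceIndex)
{
    return sliceIndex * sliceIndex;
}

// Encodes all slices of one frame. When the codec is built with
// USE_OPENMP (the "mt" sample configuration), slices are distributed
// across threads: this is the codec-level parallelism.
std::vector<int> EncodeFrameSlices(int numSlices, int numThreads)
{
    std::vector<int> results(numSlices > 0 ? numSlices : 0);
    const int threads = numThreads > 0 ? numThreads : 1;
    (void)threads;  // unused when USE_OPENMP is not defined

#ifdef USE_OPENMP
    #pragma omp parallel for num_threads(threads)
#endif
    for (int s = 0; s < numSlices; ++s)
        results[s] = EncodeSlice(s);  // each iteration writes its own element

    return results;
}
```

Function-level threading is controlled separately, inside IPP, by ippSetNumThreads; keeping it at 1 avoids stacking a second layer of threads on top of the codec-level ones.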
umc_video_enc_con has command-line options for both the codec-level (-t <num>) and the function-level (--ipp_threads <num>) number of threads, so you can use this sample to simulate your multithreaded encoding environment.
The command line should look like this:
umc_video_enc_con -c h264 -i <source>.yuv -o <dest>.h264 -b 3000000 -r 720 480 -t <num_external> --ipp_threads <num_internal>
I see extra CPU load even during single-threaded encoding. This needs to be investigated.
To lower the CPU load, add the following lines to umc_h264_core_enc.cpp at around line 2186:
if (core_enc->m_params.num_slices > nMB)
    core_enc->m_params.num_slices = (Ipp16s)IPP_MIN(nMB, 0x7FFF);
if (core_enc->m_params.num_slices < core_enc->m_params.m_iThreads)
    core_enc->m_params.m_iThreads = core_enc->m_params.num_slices;
// These lines should be added
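The clamping in that snippet can be exercised outside the encoder with the standalone sketch below. EncoderParams and ClampSlicesAndThreads are hypothetical names mirroring only the two fields the snippet touches, and nMB is assumed to be the number of macroblocks in the frame: for the 720x480 source in the command line above, that would be (720/16) × (480/16) = 45 × 30 = 1350.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical mirror of the two encoder parameters touched by the patch.
struct EncoderParams {
    std::int16_t num_slices;  // requested slices per frame
    int m_iThreads;           // codec-level threads (0 = automatic)
};

// Same logic as the snippet: no more slices than macroblocks (and within
// the 16-bit range), then no more threads than slices. Assumption: threads
// work on whole slices, so extra threads would have nothing to do.
void ClampSlicesAndThreads(EncoderParams& p, int nMB)
{
    if (p.num_slices > nMB)
        p.num_slices = static_cast<std::int16_t>(std::min(nMB, 0x7FFF));
    if (p.num_slices < p.m_iThreads)
        p.m_iThreads = p.num_slices;
}
```

For example, 4 slices with 8 requested threads ends up with 4 threads, while m_iThreads = 0 is left untouched by the second check.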
I tried this, but nothing changed. Even with m_iThreads set to 1 in the encoder, the load ranges from 80 to 100% on all four cores.
With m_iThreads set to 0, the load is between 70 and 90%, but the encoding time is the same as before I added the lines you suggested. I am testing with the dynamic_mt libraries, which I built from the modified source.
Any more suggestions?
I noticed that with the code reverted to v7.0 AND the libiomp5md.dll FROM v7.1, it is just as slow!! BUT with the libiomp5md.dll from v7.0, it works as expected! Do you have any idea about this? Maybe there's a bug in libiomp5md, or I am not using it correctly (in v7.1).
As far as I remember, there was a problem with this (though quite long ago), but the "Intel compiler" forum will know better; this is their area.
Still, it deserves a small test with #pragma omp. Thank you for finding this; we will check.
I have just spoken with the OpenMP support team. They are not aware of this problem. If we can create a small reproducer, we can get them moving :). Meanwhile, could you provide the version numbers of the "good" and "bad" libiomp5md.dll files? In Explorer, right-click the file and look at the Properties/Details tab. On my computer, for example, I see file version 5.0.2012.1207.
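A reproducer along those lines could start from something like the sketch below. It enters many short #pragma omp parallel regions, roughly the way a codec enters a region per frame or slice, and times them. The function names and workload sizes are hypothetical. The idea, assuming it is built with the Intel compiler's /Qopenmp so that it loads libiomp5md.dll, is to run the same executable once next to the v7.0 DLL and once next to the v7.1 DLL: the checksum must stay the same and only the time should differ.

```cpp
#include <chrono>
#include <vector>

// Runs `regions` short parallel loops over `n` elements, mimicking an
// encoder that enters a parallel region for every frame. Returns a
// checksum so the work cannot be optimized away.
long long RunParallelRegions(int regions, int n)
{
    std::vector<long long> buf(n > 0 ? n : 0);
    long long total = 0;
    for (int r = 0; r < regions; ++r) {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            buf[i] = static_cast<long long>(i) * (r + 1);

        for (int i = 0; i < n; ++i)  // serial sum keeps the result deterministic
            total += buf[i];
    }
    return total;
}

// Wall-clock time in seconds for one RunParallelRegions call;
// the checksum is returned through the optional out-parameter.
double SecondsForRegions(int regions, int n, long long* checksum)
{
    const auto start = std::chrono::steady_clock::now();
    const long long sum = RunParallelRegions(regions, n);
    const auto stop = std::chrono::steady_clock::now();
    if (checksum)
        *checksum = sum;
    return std::chrono::duration<double>(stop - start).count();
}
```

Many regions over a small array (for example 100000 regions of 1024 elements) emphasize per-region runtime overhead, which is where two OpenMP runtime versions would be most likely to differ.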