Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

h264 Encoder in 7.1 slower than in 7.0

manca1
Beginner
426 Views

Hi there,

I ported my H.264 encoder to IPP v7.1, using the samples that I built as dynamic multithreaded libraries, and performance has now dropped for H.264 encoding with the same settings as in v7.0.

Are you aware of any issues with the H.264 encoder in 7.1?

I set m_iThreads to 0, initialized the encoder correctly, and called ippInit() to select the best IPP libraries for my CPU. All my cores are at 100% while encoding, but it is around 50-60% slower than the same encoding with the IPP v7.0 encoder. I am using the separate threaded libs (which I downloaded from the site). It's important to note that with v7.0 core usage is around 40%, yet it still performs far faster than v7.1.

I tried the v7.1 single-threaded UMC libs as well, but they are way slower than the multithreaded ones.

Any help is really appreciated.

Thanks.

25 Replies
manca1
Beginner
353 Views

anyone? Intel?

Sergey_K_Intel
Employee
353 Views

Hi,

Could you try linking your application against the single-threaded IPP libraries? Thread oversubscription may be taking place.

Regards,
Sergey 

manca1
Beginner
353 Views

Hi,

I tried building the single-threaded libs and using them in the application. It works, but it's way slower than with the multithreaded ones. We tested with various video formats and resolutions. For every video, no matter how big it is, we seem to get around 20 seconds' difference between v7.0 and v7.1. Did you change something in the H.264 encoder itself that could cause this behavior?

Thanks

Sergey_K_Intel
Employee
353 Views

The last H.264 issue we resolved was excessive CPU load during playback; we haven't seen encoding performance issues. Could you provide the specifics of your encoding parameters: resolution, profile, bitrate and, of course, CPU model? We'll run experiments locally. By the way, since this is your own application, can you reproduce the low performance with "umc_video_enc_con"?

Regards,
Sergey 

manca1
Beginner
353 Views

I am using IPP's static threaded libraries (not the dynamic ones). I downloaded them from the website. As far as I can see, umc_video_enc_con uses dynamic libs. Maybe that's the problem? The umc_video_enc_con application drives all four cores to 100% (the CPU is an i5 3570), but it encodes twice as fast as our application. I suspect something is wrong with the libraries somewhere along the line.

Do you have any advice on what I should try?

Oh, and the parameters we're using are:

res: 480p
profile: UMC::H264_PROFILE_MAIN
bitrate: 3000kbps
rate control method: UMC::H264_RCM_CBR

Thanks...

 

Sergey_K_Intel
Employee
353 Views

First of all, you need to make sure that you use the proper optimized library. With static linking this is done by calling the ippInit() function somewhere at the beginning of the application. With dynamic linking it does not matter, because ippInit() is called by the DllMain function. Then, as far as I can see, umc_video_enc_con sets the number of IPP's internal threads to 1 with ippSetNumThreads. Some of the video encoding functions (as well as the video pre/post-processing functions) still use internal threading (via OpenMP), which does no good when they are called from an externally threaded application (such as the H.264 encoder). So ippSetNumThreads(1) is used to limit internal threading to a single thread.

So, try calling ippInit() and ippSetNumThreads(1) in your application's initialization phase.

Regards,
Sergey

By the way, you can build umc_video_enc_con with any type of IPP library using options in IPP samples build script.

manca1
Beginner
353 Views

I am using ippInit(), as well as ippSetNumThreads(1)... it's still the same.

manca1
Beginner
353 Views

Also, I don't understand the difference between the dynamic and static libraries of the samples. When I build both, I get .libs in both cases, even though for dynamic I expected to see DLLs. That is how it was in the previous version of IPP. So what exactly do "dynamic" and "static" mean in the build options for the IPP samples? I tried both versions, and they give the same results and the same encoding time. The only difference I managed to get is with the single-threaded versions, which performed way slower than the multithreaded ones.

manca1
Beginner
353 Views

So, any more suggestions?

Sergey_K_Intel
Employee
353 Views

The terms "dynamic" and "static" in the samples refer to which IPP libraries are used at link time: dynamic libs (DLLs or .so) or static libs (.lib or .a). These terms don't relate to the intermediate sample libraries that are generated during the sample application build. Thus, in both cases you will get static libraries (codecs, muxers, whatever).

Next, in the IPP 7.1 UMC samples the H.264 encoder can run in parallel, and its parallelization is done with OpenMP. When you select the "mt" libraries during the sample build, the script does two things: it defines the USE_OPENMP macro (which guards the OpenMP constructs in the codec: #ifdef USE_OPENMP etc.), and it puts the multithreaded IPP libraries (the *_t kind) on the linker command line. So, basically, there are two levels of parallelization: codec-level and function-level. We have seen that function-level parallelization brings no additional performance benefit when external (upper-level) parallelization is active. You can manually change the linker input files from the *_t libs to the *_l (lowercase L) libs and you will see no difference in performance. Thus, your goal should be to enable codec-level parallelization and disable function-level parallelization (set numthreads to 1).

umc_video_enc_con has command-line options for the number of threads at both the codec level (-t <num>) and the function level (--ipp_threads <num>). You can simulate your multithreaded encoding environment with this sample.
The command line should look like this:
umc_video_enc_con -c h264 -i <source>.yuv  -o <dest>.h264 -b 3000000 -r 720 480 -t <num_external> --ipp_threads <num_internal>
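
To make the two levels concrete, here is a rough sketch of the worst-case thread count when both are active. It is only an illustration, under the assumption that every codec-level thread starts its own function-level team; the function names are made up for this example and are not part of IPP or UMC.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only (not IPP/UMC API).
// Worst-case number of simultaneously runnable threads, assuming each
// codec-level (external) thread spawns its own function-level (internal) team.
int worst_case_threads(int external_threads, int internal_threads) {
    assert(external_threads >= 1 && internal_threads >= 1);
    return external_threads * internal_threads;
}

// True when the worst case exceeds the number of available cores.
bool is_oversubscribed(int external_threads, int internal_threads, int cores) {
    assert(cores >= 1);
    return worst_case_threads(external_threads, internal_threads) > cores;
}
```

On the 4-core i5 3570 mentioned earlier, is_oversubscribed(4, 4, 4) is true (up to 16 threads competing for 4 cores), while is_oversubscribed(4, 1, 4) is false, which corresponds to running with -t 4 --ipp_threads 1.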

I do see extra CPU load even during single-threaded encoding. This needs to be investigated.

Regards,
Sergey 

manca1
Beginner
353 Views

So, let me know whether there will be a fix for this soon, or should I revert to IPP v7.0?

Sergey_K_Intel
Employee
353 Views

Hi,

To lower the CPU load, add the lines marked below to the file umc_h264_core_enc.cpp at around line 2186:

[cpp]
    if (core_enc->m_params.num_slices > nMB)
        core_enc->m_params.num_slices = (Ipp16s)IPP_MIN(nMB, 0x7FFF);
    if (core_enc->m_params.num_slices < core_enc->m_params.m_iThreads)
        core_enc->m_params.m_iThreads = core_enc->m_params.num_slices;
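// Added: cap the OpenMP team size at the codec-level thread count.
// (omp_set_num_threads needs <omp.h>, if it is not already included.)
#ifdef USE_OPENMP
    omp_set_num_threads(core_enc->m_params.m_iThreads);
#endif
// End of added lines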
    switch (core_enc->m_params.level_idc)
[/cpp]
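
The clamping logic in the snippet above can be tried in isolation. The following is a stand-alone sketch, not the UMC source: the struct and function names are hypothetical, and the actual omp_set_num_threads call is left out so that it compiles without OpenMP.

```cpp
#include <algorithm>

// Hypothetical stand-in for the relevant encoder parameters (not the UMC type).
struct EncParams {
    int num_slices;  // slices per picture
    int threads;     // codec-level threads (m_iThreads in the snippet)
};

// Mirrors the two clamps from the snippet: slices cannot exceed the number of
// macroblocks (capped at 0x7FFF), and threads cannot exceed the slice count.
// Returns the value that would then be handed to omp_set_num_threads().
int clamp_threading(EncParams& p, long macroblocks) {
    if (p.num_slices > macroblocks)
        p.num_slices = static_cast<int>(std::min(macroblocks, 0x7FFFL));
    if (p.num_slices < p.threads)
        p.threads = p.num_slices;
    return p.threads;
}
```

For a 720x480 frame (45 x 30 = 1350 macroblocks of 16x16 pixels), a request for 4 threads with only 2 slices is reduced to 2 threads, so the slice count is the effective upper bound on codec-level parallelism in this sketch.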

Regards,
Sergey 

manca1
Beginner
353 Views

I tried this, but it's still the same. Even with m_iThreads set to 1 in the encoder, the load goes from 80 to 100% on all 4 cores.

With m_iThreads set to 0, the load is between 70 and 90%, but the time needed to encode the file is the same as it was before I added the lines you suggested. I am testing with the dynamic_mt libraries, which are the ones I built with the modified source.

Any more suggestions?

manca1
Beginner
353 Views

Sergey, should I go ahead and revert?

manca1
Beginner
353 Views

I noticed that with the code reverted to v7.0 AND the libiomp5md.dll FROM v7.1, it is equally slow!! BUT with the libiomp5md.dll from v7.0 it works as expected! Do you have any ideas about this? Maybe there's a bug in libiomp5md, or I am not using it correctly (in v7.1).

Sergey_K_Intel
Employee
353 Views

As far as I remember, there was a problem with this (though quite a long time ago), but the "Intel compiler" forum knows better. This is their area.
Still, it deserves a small test with #pragma omp. Thank you for finding this. We will check.

Regards,
Sergey 

manca1
Beginner
353 Views

The funny thing is that libiomp5md.dll from v7.0 works with the v7.1 libs as well :D Crazy...

Please report back once you fix it!

manca1
Beginner
353 Views

Any updates on this?

Sergey_K_Intel
Employee
353 Views

Hi,

I have just spoken with the OpenMP support guys. They know nothing about this problem. If we can create a small reproducer for the problem, we can make them move :). Meanwhile, could you provide the version numbers of the "good" and "bad" libiomp5md.dll files? Just right-click the file in Explorer and look at the Properties/Details tab. On my computer I see file version 5.0.2012.1207, for example.

Regards,
Sergey 

manca1
Beginner
328 Views

Hi, thanks for the update.

The one that works well has this version: 5.0.2011.606.

The one that doesn't work well has this version: 5.0.2012.914

Does it ring any bells?
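
For anyone comparing several copies of the DLL, a small sketch of ordering the version numbers quoted in this thread may help. It assumes the four-part dotted format shown in the Properties/Details tab; parse_version is a hypothetical helper written for this example, not a Windows or Intel API.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: splits "a.b.c.d" into four integers.
// Missing fields stay 0; non-numeric fields make std::stoi throw.
std::array<int, 4> parse_version(const std::string& text) {
    std::array<int, 4> parts{};  // zero-initialized
    std::istringstream in(text);
    std::string field;
    for (std::size_t i = 0; i < parts.size() && std::getline(in, field, '.'); ++i) {
        parts[i] = std::stoi(field);
    }
    return parts;
}

// std::array compares lexicographically, field by field, which gives the
// expected numeric ordering of versions.
bool is_older(const std::string& a, const std::string& b) {
    return parse_version(a) < parse_version(b);
}
```

The fields are compared as numbers because a plain string comparison would sort "5.0.2012.914" after "5.0.2012.1207". With this sketch, 5.0.2011.606 (the fast one) is older than 5.0.2012.914 (the slow one), which in turn is older than 5.0.2012.1207.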
