Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

h264 Encoder in 7.1 slower than in 7.0

manca1
Beginner

Hi there,

I ported my H.264 encoder to IPP v7.1 using the samples, which I built as dynamic multithreaded libraries, and H.264 encoding performance has dropped compared with v7.0 at the same settings.

Are you aware of any issues with the H.264 encoder in 7.1?

I set m_iThreads to 0 and initialized the encoder correctly, and I select the best IPP libs for my CPU using ippInit(). All my cores are at 100% while encoding, but it is around 50-60% slower than the same encode with the IPP v7.0 encoder. I am using the separate threaded libs (which I downloaded from the site). It's important to note that in v7.0 core usage is around 40%, yet it still performs way faster than v7.1.

I tried v7.1 single threaded umc libs, as well, but they are way slower than the multithreaded ones.

Any help is really appreciated.

Thanks.

manca1
Beginner

anyone? Intel?

Sergey_K_Intel
Employee

Hi,

Could you try linking your application against the single-threaded IPP libraries? Thread oversubscription may be taking place.

Regards,
Sergey 

manca1
Beginner

Hi,

I tried building the single-threaded libs and using them in the application. It works, but it's way slower than with the multithreaded ones. We tested it with various video formats and resolutions. For every video, no matter how big it is, we seem to get a difference of around 20 seconds between v7.0 and v7.1. Did you change something in the H.264 encoder itself that could cause this behavior?

Thanks

Sergey_K_Intel
Employee

The last H.264 issue that was solved was excessive CPU load during playback, so we haven't seen encoding performance issues. Could you give us the specifics of your encoding parameters: resolution, profile, bitrate and, of course, CPU model? We'll run experiments locally. By the way, this is your own application; can you reproduce the low performance with "umc_video_enc_con"?

Regards,
Sergey 

manca1
Beginner

I am using IPP's static threaded libraries (not the dynamic ones). I downloaded them from the website. umc_video_enc_con uses dynamic libs, as far as I can see. Maybe that's the problem? The umc_video_enc_con application drives all four cores to 100% (the CPU is an i5 3570), but its encoding speed is twice as fast as our application's. I suspect something is wrong with the libraries somewhere along the line.

Do you have any advice on what I should try?

Oh, and the parameters we're using are:

res: 480p
profile: UMC::H264_PROFILE_MAIN
bitrate: 3000kbps
rate control method: UMC::H264_RCM_CBR

Thanks...

 

Sergey_K_Intel
Employee

First of all, you need to make sure that you are using the properly optimized library. With static linking, this is done by calling the ippInit() function somewhere at the beginning of the application. With dynamic linking it does not matter, because ippInit() is called by the DllMain function. Then, as far as I can see from umc_video_enc_con, it sets the number of internal IPP threads to 1 with ippSetNumThreads. Some video encoding functions (as well as video post/pre-processing functions) still use internal threading (via OpenMP), which does no good when they are used in an externally threaded application (such as the H.264 encoder). So ippSetNumThreads(1) is used to limit internal threading to a single thread.

So, try calling ippInit() and ippSetNumThreads(1) in the application's initialization phase.

Regards,
Sergey

By the way, you can build umc_video_enc_con with any type of IPP library using options in IPP samples build script.
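
For readers following along, the initialization order suggested in this reply might be sketched roughly as below. This is only an illustration, not code from the samples: `HAVE_IPP` is a hypothetical macro you would define yourself when building against the real IPP headers, the stand-in functions exist only so the sketch compiles without IPP installed, and `init_ipp_for_external_threading` is a made-up helper name. The calls `ippInit`, `ippSetNumThreads` and `ippGetNumThreads` are the ones from the IPP 7.x core library.

```cpp
#include <cassert>

#ifdef HAVE_IPP   // hypothetical macro: define it when compiling against real IPP
#include <ipp.h>
#else
// Stand-ins so the sketch compiles without IPP. They only mimic the shape of
// the real calls and are NOT the library implementation.
typedef int IppStatus;
static const IppStatus ippStsNoErr = 0;
static int g_stub_threads = 4;   // pretend IPP starts with 4 internal threads
static IppStatus ippInit() { return ippStsNoErr; }
static IppStatus ippSetNumThreads(int n) {
    if (n < 1) return -1;        // negative status = error, as in IPP
    g_stub_threads = n;
    return ippStsNoErr;
}
static IppStatus ippGetNumThreads(int* p) { *p = g_stub_threads; return ippStsNoErr; }
#endif

// Hypothetical helper: call once at application start-up, before creating any
// encoder. Returns the internal IPP thread count in effect, or -1 on error.
int init_ipp_for_external_threading() {
    // Select the CPU-specific optimized code path (needed with static linking).
    if (ippInit() < ippStsNoErr) return -1;
    // Turn off function-level (internal) threading; the codec threads itself.
    if (ippSetNumThreads(1) < ippStsNoErr) return -1;
    int threads = 0;
    if (ippGetNumThreads(&threads) < ippStsNoErr) return -1;
    return threads;
}
```

Calling `init_ipp_for_external_threading()` at the top of `main` and checking that it returns 1 is one way to confirm the setting actually took effect before measuring anything.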

manca1
Beginner

I am using ippInit(), as well as ippSetNumThreads(1)... it's still the same.

manca1
Beginner

Also, I don't understand the difference between the dynamic and static libraries of the samples. When I build either one I get .libs, even though for dynamic I expected to see DLLs. That is how it was in the previous version of IPP. So what exactly do dynamic and static mean in the build options for the IPP samples? I tried both versions, and they give the same results and the same encoding time. The only difference I managed to get was with the single-threaded versions, which performed way slower than the multithreaded ones.

manca1
Beginner

So, any more suggestions?

Sergey_K_Intel
Employee

The terms "dynamic" and "static" in the samples refer to which IPP libraries are used at link time: dynamic libs (DLLs or .so) or static libs (.lib or .a). These terms don't relate to the intermediate sample libraries that are generated during the sample application build. Thus, in both cases you will get static libraries (codecs, muxers, whatever).

Then, in the IPP 7.1 UMC samples the H.264 encoder can run in parallel. Its parallelization is done with OpenMP. When you select the "mt" libraries during the sample build, the script does two things - it defines the USE_OPENMP macro (which guards the OpenMP constructs in the codec: #ifdef USE_OPENMP etc.) and it puts the multi-threaded IPP libraries (the *_t kind) on the linker command line. So, basically there are two levels of parallelization - codec-level and function-level. It has been seen that function-level parallelization brings no additional performance benefit when external (upper-level) parallelization is active. You can manually change the linker input files from the *_t libs to the *_l (lowercase L) libs and you will see no difference in performance. Thus, your goal should be to enable codec-level parallelization and disable function-level parallelization (set numthreads to 1).
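
To make the codec-level part concrete, here is a minimal sketch of how a USE_OPENMP guard around a per-slice loop can look. It is not taken from the UMC sources: `encode_slice` and `encode_frame` are hypothetical names, the per-slice "work" is a placeholder, and the extra `_OPENMP` check is my own addition so the file still builds when the compiler's OpenMP switch is off.

```cpp
#include <cassert>
#include <vector>

#if defined(USE_OPENMP) && defined(_OPENMP)
#include <omp.h>
#endif

// Hypothetical stand-in for encoding one slice; returns a fake byte count.
static int encode_slice(int slice_index) { return 100 + slice_index; }

// Codec-level parallelism: slices are independent, so each loop iteration can
// run on its own thread when the OpenMP constructs are compiled in.
int encode_frame(int num_slices, int codec_threads) {
    if (num_slices <= 0) return 0;
    std::vector<int> bytes(num_slices, 0);
#if defined(USE_OPENMP) && defined(_OPENMP)
    if (codec_threads > 0) omp_set_num_threads(codec_threads);
    #pragma omp parallel for
#else
    (void)codec_threads;   // serial build: the thread count is ignored
#endif
    for (int i = 0; i < num_slices; ++i) {
        bytes[i] = encode_slice(i);   // each iteration writes its own element
    }
    int total = 0;
    for (int i = 0; i < num_slices; ++i) total += bytes[i];
    return total;
}
```

Without USE_OPENMP the loop simply runs serially and gives the same result, which is the point of the guard: the same codec source serves both the "mt" and the single-threaded builds.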

The umc_video_enc_con command line has options for the number of threads at both levels: codec-level (-t <num>) and function-level (--ipp_threads <num>). You can simulate your multi-threaded encoding environment with this sample.
The command line should look like this:
umc_video_enc_con -c h264 -i <source>.yuv  -o <dest>.h264 -b 3000000 -r 720 480 -t <num_external> --ipp_threads <num_internal>
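
If you want to try several thread combinations, a small helper that assembles this command line can save some typing. The sketch below only builds the string from the options shown above; `build_enc_command` is a hypothetical name, and the bitrate and resolution are hard-coded to the values from this thread.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: assembles the umc_video_enc_con command line shown
// above for a given pair of thread counts. It does not run anything.
std::string build_enc_command(const std::string& source_yuv,
                              const std::string& dest_h264,
                              int codec_threads,   // value for -t
                              int ipp_threads) {   // value for --ipp_threads
    std::ostringstream cmd;
    cmd << "umc_video_enc_con -c h264"
        << " -i " << source_yuv
        << " -o " << dest_h264
        << " -b 3000000 -r 720 480"
        << " -t " << codec_threads
        << " --ipp_threads " << ipp_threads;
    return cmd.str();
}
```

One possible use is to loop over a few `-t` values with `--ipp_threads` fixed at 1, pass each string to `std::system`, and time the runs, so the codec-level scaling can be compared between the v7.0 and v7.1 builds.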

I see extra CPU load even during single-thread encoding. It needs to be investigated.

Regards,
Sergey 

manca1
Beginner

So, let me know whether there will be a fix for this soon, or whether I should revert to IPP v7.0.

Sergey_K_Intel
Employee

Hi,

To lower the CPU load, add the following lines to the file umc_h264_core_enc.cpp at around line 2186:

[cpp]
    if (core_enc->m_params.num_slices > nMB)
        core_enc->m_params.num_slices = (Ipp16s)IPP_MIN(nMB, 0x7FFF);
    if (core_enc->m_params.num_slices < core_enc->m_params.m_iThreads)
        core_enc->m_params.m_iThreads = core_enc->m_params.num_slices;
// These lines should be added
#ifdef USE_OPENMP
    omp_set_num_threads(core_enc->m_params.m_iThreads);
#endif
//
    switch (core_enc->m_params.level_idc)
[/cpp]
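
The effect of the two clamping checks in that excerpt can be tried in isolation with a small stand-alone sketch. Everything here is a simplification: `EncParams` is a hypothetical struct holding only the two fields involved, plain `int` and `std::min` replace `Ipp16s` and `IPP_MIN`, and I am assuming `nMB` is the number of macroblocks in a frame.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, reduced version of the encoder parameters: only the two
// fields touched by the checks in the excerpt above.
struct EncParams {
    int num_slices;
    int m_iThreads;
};

// Mirrors the two checks from the excerpt: there cannot be more slices than
// macroblocks, and there is no point in more threads than slices.
void clamp_slices_and_threads(EncParams& p, int nMB) {
    if (p.num_slices > nMB)
        p.num_slices = std::min(nMB, 0x7FFF);
    if (p.num_slices < p.m_iThreads)
        p.m_iThreads = p.num_slices;
}
```

For a 720x480 frame (45 x 30 = 1350 macroblocks of 16x16), asking for 8 threads with 2 slices leaves `m_iThreads` at 2. That clamped value is what the added `omp_set_num_threads` call would then receive in the patched source.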

Regards,
Sergey 

manca1
Beginner

I tried this, but it's still the same. Even with m_iThreads set to 1 in the encoder, the load goes from 80 to 100% on all 4 cores.

With m_iThreads set to 0, the load is between 70-90%, but the time needed to encode the file is the same as it was before adding the lines you suggested. I am testing with the dynamic_mt libraries - those are the ones I built with the modified source.

Any more suggestions?

manca1
Beginner

Sergey, should I go ahead and revert?

manca1
Beginner

I noticed that with the code reverted to v7.0 AND the libiomp5md.dll FROM v7.1, it is just as slow!! BUT with the libiomp5md.dll from v7.0 it works as expected! Do you have any ideas about this? Maybe there's a bug in libiomp5md, or I am not using it correctly (in v7.1).

Sergey_K_Intel
Employee

As far as I remember there was a problem with this (though quite long ago), but the "Intel compiler" forum knows better. This is their area.
Still, it deserves a small test with #pragma omp. Thank you for finding this. We will check.

Regards,
Sergey 

manca1
Beginner

The funny thing is that libiomp5md from v7.0 works with v7.1 libs as well :D Crazy...

Please report back once you fix it!

manca1
Beginner

Any updates on this?

Sergey_K_Intel
Employee

Hi,

I have just spoken with the OpenMP support guys. They know nothing about this problem. If we could create a small reproducer for the problem, we could get them moving :). Meanwhile, could you provide the version numbers of the "good" and "bad" libiomp5md.dll files? Just right-click the file in Explorer and look at the Properties/Details tab. On my computer I see file version 5.0.2012.1207, for example.

Regards,
Sergey 

manca1
Beginner

Hi, thanks for the update.

The one that works well has this version: 5.0.2011.606.

The one that doesn't work well has this version: 5.0.2012.914

Does it ring any bells?
