- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi there,
I ported my H264 Encoder to IPP v7.1 using the samples that I built as dynamic multithreaded libraries and now the performance is dropped on H264 encoding with the same settings as in v.7.0.
Are you aware of the issues with h264 enc in 7.1?
I set m_iThreads to 0, and initialized the encoder correctly as well as the best ipp libs for my CPU using ippInit(). All my cores are used 100% when I am encoding something, but it's around 50-60% slower than it is when I do the same encoding with IPP v7.0 encoder. I am using separate threaded libs (which I download from the site). It's important to note that in v7.0, the usage of cores is around 40%, but still performs way faster than v7.1.
I tried v7.1 single threaded umc libs, as well, but they are way slower than the multithreaded ones.
Any help is really appreciated.
Thanks.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
anyone? Intel?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Could you try to link your application to single thread IPP libraries? It may happen that thread oversubscription takes place.
Regards,
Sergey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I tried building single threaded libs and using them in the application. It works, but it's way slower than with multithreaded ones. We tested it with various video formats and different resolutions. We seem to get for every video, no matter how big it is, around 20 seconds difference between v7.0 and v7.1. Didn't you change something in h264 encoder itself that could cause this behavior?
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The last issue regarding H.264 that has been solved was excessive CPU load during playback. So, we haven't seen encoding performance issues. Could you provide us with specifics of your encoding parameters: resolution, profile, bitrate and, of course, CPU model? We'll make experiments locally. By the way, it is your own application. Can you reproduce the low performance results with "umc_video_enc_con" ?
Regards,
Sergey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am using IPP's static threaded libraries (not dynamic ones). I downloaded them from the website. umc_video_enc_con uses dynamic libs as I can see. Maybe that's the problem? umc_video_enc_con application utilizes CPU to the maximum 100% all four cores (cpu is: i5 3570) but the encoding speed is twice as fast as with our application. I suspect something is wrong with the libraries along the line.
Do you have any advice what should I try?
Oh, and the parameters we're using are:
res: 480p
profile: UMC::H264_PROFILE_MAIN
bitrate: 3000kbps
rate controls method: UMC::H264_RCM_CBR
Thanks...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
First of all, you need to make sure that you use the proper optimized library. With static linking it is done by calling ippInit() function somewhere at the beginning of application. With dynamic linking it does not matter, because ippInit() is called by DllMain function. Then, as far as I see from umc_video_enc_con it sets number of internal IPP's threads to 1 by ippSetNumThreads. Some of video encoding functions (as long as video post/pre-processing functions) still use internal threading (by OpenMP), which brings no good if are used in externally threaded application (as H.264 encoder). So, to limit internal threading to single ippSetNumThreads(1) is used.
So, try to call ippInit() and ippSetNumThreads(1) in application initialization phase.
Regards,
Sergey
By the way, you can build umc_video_enc_con with any type of IPP library using options in IPP samples build script.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am using ippInit(), as well as ippSetNumThreads(1)... it's still the same.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Also, I don't understand the difference between dynamic and static libraries of samples? When I build both I get .libs in both cases, even though for dynamic I expected to see dlls. It was like that in previous version of IPP. So what does exactly mean dynamic and static in build options for the IPP samples? I tried with both versions, and they give the same results and the same encoding time. The only difference I managed to get is with single threaded versions which performed way slower than the multithreaded ones.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
So, any more suggestions?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The terms "dynamic" and "static" in samples refer to which IPP libraries will be used during link. Dynamic libs (DLLs or .so) or static (.lib or .a). These terms don't relate to intermediate sample libraries which are generated during sample application build. Thus in both cases you will get static libraries (codecs, muxers, whatever).
Then, in IPP 7.1 UMC samples H.264 encoder can be parallel. Its parallelization is done by OpenMP. When you select "mt" libraries during sample build, the script does two things - it defines USE_OPENMP macro (which masks OpenMP constructs in codec. #ifdef USE_OPENMP etc.) and it puts multi-thread IPP libraries (*_t kind of them) to linker command line. So, basically there are two levels of parallelization - codec-level and function-level. It has been seen that function-level paralellization brings no additional performance benefit when external (upper-level) parallelization is active. You can manually modify linker input files from *_t libs to *_l (lowercase L) libs and will see no difference in performance. Thus, your goal should be enabling codec-level parallelization and disabling function-level (set numthreads to 1).
There are options in command line to umc_video_enc_con for both codec-level (-t <num>) and function-level (--ipp_threads <num>) number of threads. You can simulate your multi-thread encoding environment with this sample.
The command line should be like
umc_video_enc_con -c h264 -i <source>.yuv -o <dest>.h264 -b 3000000 -r 720 480 -t <num_external> --ipp_threads <num_internal>
I see extra CPU load even during single-thread encoding. It needs to be investigated.
Regards,
Sergey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
So, let me know whether there will be a fix for this soon, or should I revert to IPP v7.0?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
To lower CPU loading add the following lines to the file umc_h264_core_enc.cpp at line ~2186
[cpp]
if (core_enc->m_params.num_slices > nMB)
core_enc->m_params.num_slices = (Ipp16s)IPP_MIN(nMB, 0x7FFF);
if (core_enc->m_params.num_slices < core_enc->m_params.m_iThreads)
core_enc->m_params.m_iThreads = core_enc->m_params.num_slices;
// These lines should be added
#ifdef USE_OPENMP
omp_set_num_threads(core_enc->m_params.m_iThreads);
#endif
//
switch (core_enc->m_params.level_idc)
[/cpp]
Regards,
Sergey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I tried this, but it's still the same. Even with iThreads to 1 in encoder, the load goes from 80 to 100% on all 4 cores.
With iThreads to 0, the load is between 70-90%, but the time needed to encode the file is the same as it was before adding the lines you suggested. I am testing with dynamic_mt libraries - those are the ones I built with the modified source.
Any more suggestions?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sergey, should I go with revert?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I noticed that with reverted code to v7.0 AND the libiomp5md.dll FROM v7.1 it works equally slow!! BUT, with libiomp5md.dll from v7.0 it works as expected! Do you have any ideas about this? Maybe there's a bug in libiomp5md or I am not using it correctly (in v7.1).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
As far as i remember there was problem with this (though, quite long ago), but "Intel compiler" forum knows better. This is their area.
Though, it deserves small test with #pragma omp. Thank you for finding this. We will check.
Regards,
Sergey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The funny thing is that libiomp5md from v7.0 works with v7.1 libs as well :D Crazy...
Please report back once you fix it!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Any updates on this?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have just spoken with OpenMP support guys. They know nothing about this problem. If we could create a small reproducer for the problem, we can make them move :). Meanwhile, could you provide with version numbers of "good" and "bad" libiomp5md.dll files. Just in explorer right click on this file and look at Properties/Details tab. On my computer I see file version 5.0.2012.1207 for example.
Regards,
Sergey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, thanks for the update.
The one that works well has this version: 5.0.2011.606.
The one that doesn't work well has this version: 5.0.2012.914
Does it ring any bells?

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page