sheado
Beginner
240 Views

Slow H264 encoding - How to speed up?

Hello,

I know there are tons of posts on this subject, but I think I've read all of them and I'm still stuck and now wondering if this is even possible. I also tried the recommended settings in posts such as: http://software.intel.com/en-us/articles/setting-h264-encoding-parameters-in-intel-ipp-media-process...

My goal is to encode 1280x720 at 30fps at >= 1mbps for live conferencing. I need the encode to take no longer than 33ms per frame.
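For reference, the real-time budget follows directly from the frame rate: at 30 fps each frame must be encoded in 1000/30 ≈ 33.3 ms, and at 1 Mbps the rate control has on average about 33 kbit per frame. The small helper below (hypothetical names, illustration only) computes both, plus the speedup needed to get from a measured encode time down to the budget. In a real pipeline, capture, color conversion and network send share that same 33 ms.

```cpp
#include <cassert>

// Time and bit budget per frame for a live encoder.
struct FrameBudget {
    double ms_per_frame;     // wall-clock time available to encode one frame
    double kbits_per_frame;  // average bits per frame at the target bitrate
};

// fps: capture frame rate; bitrate_kbps: target bitrate in kbit/s.
FrameBudget frame_budget(double fps, double bitrate_kbps) {
    assert(fps > 0.0);
    return FrameBudget{1000.0 / fps, bitrate_kbps / fps};
}

// Factor by which a measured encode time must shrink to fit the budget.
double required_speedup(double measured_ms, double budget_ms) {
    assert(budget_ms > 0.0);
    return measured_ms / budget_ms;
}
```

For example, frame_budget(30, 1000) gives about 33.3 ms and 33.3 kbit per frame, and a 40 ms encode needs roughly a 1.2x speedup to fit.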

No matter what settings I use (on H264EncoderParams) I can't get the encode time lower than 40ms (unless I sacrifice bitrate).

First...
I've tested on a dual-core and a quad-core machine (~2 GHz) and the performance was about the same: >70% CPU usage on either machine (with all cores working), and an encode time of ~40-60ms.

Then...
I found a post suggesting removing OpenMP if you are using UMC from within a multithreaded environment (which I am). So I removed it, and now CPU usage is ~70% of just one core on either machine.

Problem:
Unfortunately though, the encode time is still ~40ms.

Is there anything that can be done here? Any changes to the UMC code perhaps (this almost seems like an artificial delay)? I have CPU cores just sitting there waiting to do the work - is it even possible to parallelize in a beneficial way at this point? Or is this the limit with this encoder?

Thanks.
8 Replies
Chao_Y_Intel
Employee

Hello,

For enabling encoder threading, the number of slices needs to be set to the number of threads. Has this parameter been changed?

2 1 4 /* num_ref_frames (2-16), minimum length of list1 for backward prediction (only 1 is supported!), number of slices. */
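The reason the slice count matters: within one slice, macroblocks are coded as one dependent sequence (intra prediction and entropy coding chain from macroblock to macroblock but do not cross slice boundaries), so a single slice gives the encoder nothing independent to hand to a second thread. A 1280x720 frame has 720/16 = 45 macroblock rows; the sketch below splits them into contiguous slices as evenly as possible. The helper name is hypothetical and the UMC encoder's own slice layout may differ. More slices also cost a little compression efficiency, since prediction cannot reach across slice boundaries.

```cpp
#include <cassert>
#include <vector>

// Split the macroblock rows of a frame into num_slices contiguous slices,
// as evenly as possible, so each slice can be encoded by its own thread.
// Illustration only: not the UMC encoder's actual partitioning code.
std::vector<int> rows_per_slice(int frame_height, int num_slices) {
    assert(frame_height > 0 && num_slices > 0);
    const int mb_rows = (frame_height + 15) / 16;  // 16x16 macroblocks
    std::vector<int> rows(num_slices, mb_rows / num_slices);
    // Hand the leftover rows to the first slices.
    for (int i = 0; i < mb_rows % num_slices; ++i) {
        rows[i] += 1;
    }
    return rows;
}
```

With the "2 1 4" setting above, rows_per_slice(720, 4) yields slices of 12, 11, 11 and 11 macroblock rows.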

Thanks,

Chao

sheado
Beginner

Hi Chao,

Thanks for the response.

I tried 2 1 4 with no luck. I tried it with and without OpenMP. Here are some performance numbers:

With OpenMP:

Cpu0 : 62.9%us, 3.6%sy, 0.0%ni, 32.8%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu1 : 46.7%us, 2.9%sy, 0.0%ni, 50.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st

Cpu2 : 48.5%us, 3.3%sy, 0.0%ni, 46.9%id, 1.0%wa, 0.0%hi, 0.3%si, 0.0%st

Cpu3 : 48.9%us, 3.0%sy, 0.0%ni, 48.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

~70-120ms per frame

Without OpenMP

Cpu0 : 0.3%us, 1.3%sy, 0.0%ni, 98.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu1 : 82.5%us, 0.3%sy, 0.0%ni, 17.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu2 : 3.6%us, 1.0%sy, 0.0%ni, 95.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu3 : 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

~45ms per frame

Again, my goal is to get it down to 32ms per frame.


I found the following article regarding OpenMP thread block issues under certain circumstances:
http://software.intel.com/en-us/articles/high-cpu-usage-and-intel-ipp-threaded-function/
Following their advice I set KMP_BLOCKTIME to 0, but did not see any improvements.

I also found this forum post (with accepted answer) saying that OpenMP does not work within a threaded environment: http://software.intel.com/en-us/forums/showthread.php?t=73301&o=a&s=lr
The recommendation there is to disable threading.

Despite all this, I have a feeling the slow frame compression time has nothing to do with CPU capacity. Even when running on a single core, I'm getting less than 100% utilization. That tells me that something in the UMC H264 encoder is sleeping, buffering, or waiting for an unnecessarily long time.

Can anybody who knows the UMC code well point me at a part of the code where I can make the necessary changes to suit my needs?

Thanks!

Chao_Y_Intel
Employee

Hi,

Are you testing with the umc_video_enc_con application from the sample code? I agree this may not be related to OpenMP threading, since the encoder cannot even fully use a single core. Also, how are you reading the input data file? Could there be an I/O problem there?

Thanks,

Chao

sheado
Beginner

Hi Chao,

My video IO is not blocking. If I test just with camera preview I can get 30fps at less than 10% CPU.

I am testing with my own code and the main slow down is observed where I call:
UMC::H264VideoEncoder.GetFrame( in, out );

I'm using clock() around the call to GetFrame() to measure the elapsed time, and then I average the results over 60 frames. The average time GetFrame() takes to complete is ~40-60ms.
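One caveat with this measurement: on Linux/glibc, clock() returns CPU time consumed by the whole process, summed over all threads, not elapsed time (MSVC's clock() differs and reports elapsed time). With OpenMP enabled it can therefore overstate how long a call takes, and it never shows time spent waiting. A minimal sketch, assuming C++11 `<chrono>`, that records both wall-clock time (std::chrono::steady_clock) and process CPU time around a callable. `time_calls` and `TimingStats` are hypothetical names; `work` would wrap the GetFrame() call.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <functional>

struct TimingStats {
    double avg_wall_ms;  // average elapsed (wall-clock) time per call
    double avg_cpu_ms;   // average process CPU time per call (all threads)
};

// Run `work` `iterations` times and report both wall-clock and CPU time.
TimingStats time_calls(const std::function<void()>& work, int iterations) {
    assert(iterations > 0);
    using clk = std::chrono::steady_clock;
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = clk::now();
    for (int i = 0; i < iterations; ++i) work();
    const auto wall_end = clk::now();
    const std::clock_t cpu_end = std::clock();

    const double wall_ms =
        std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
    const double cpu_ms =
        1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return TimingStats{wall_ms / iterations, cpu_ms / iterations};
}
```

Called with 60 iterations, this gives the same 60-frame average as before. If avg_cpu_ms is well below avg_wall_ms, the call spends part of its time blocked (sleeping, on a lock, or waiting for worker threads), which would match the less-than-100% single-core utilization reported above; a ratio above 1 means several threads were busy.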

Is there anything I can do? Or is the UMC H264 encoder not designed for realtime HD encoding?

Thank You.
jacobh
Beginner

Hello Sheado,

Did you get any further with this issue ?

I am having the same problem and wondered whether you had gotten any further.

Best regards,

Jacob
Emmanuel_W_
New Contributor I

Hi,

I am not sure there is much you can do without modifying the mode decision/motion estimation algorithm in the encoder itself (there are several shortcuts that can be taken that lower the encode quality a bit but decrease the encoding time).
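As one example of that kind of shortcut (not necessarily one the UMC encoder already uses or exposes), motion estimation can stop evaluating a candidate block as soon as its partial sum of absolute differences (SAD) reaches the best SAD found so far. A generic sketch for a 16x16 macroblock whose current and reference frames share a row stride; the function name is hypothetical.

```cpp
#include <climits>
#include <cstdint>
#include <cstdlib>

// SAD between a 16x16 block of the current frame and a candidate block in
// the reference frame. Once the running sum reaches best_so_far, this
// candidate cannot win, so the remaining rows are skipped and the partial
// sum is returned.
int sad16x16_early_exit(const std::uint8_t* cur, const std::uint8_t* ref,
                        int stride, int best_so_far = INT_MAX) {
    int sad = 0;
    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x) {
            sad += std::abs(int(cur[y * stride + x]) - int(ref[y * stride + x]));
        }
        if (sad >= best_so_far) return sad;  // cannot beat the current best
    }
    return sad;
}
```

With the default best_so_far the full block is evaluated; the better the current best candidate, the more rows each later candidate skips, at no loss in the final result. Quality only starts to suffer with more aggressive shortcuts such as a smaller search range.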

There are two steps that I can think of that can improve the performance a bit without too much work.

Check if you can get I420 or YV12 directly out of the device to avoid having to do any color conversion in the encoder.
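I420 and YV12 are both 8-bit planar 4:2:0 layouts with identical plane sizes; they differ only in chroma plane order (I420: Y, U, V; YV12: Y, V, U). Moving between them therefore needs no per-pixel conversion, only pointer arithmetic. A sketch assuming a tightly packed buffer (no row padding) and even dimensions; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plane pointers of an 8-bit 4:2:0 planar frame.
struct Planes420 {
    const std::uint8_t* y;
    const std::uint8_t* u;  // Cb
    const std::uint8_t* v;  // Cr
};

// Total bytes of a tightly packed 4:2:0 frame: full-size Y plus two
// quarter-size chroma planes.
std::size_t frame_size_420(int width, int height) {
    assert(width % 2 == 0 && height % 2 == 0);
    const std::size_t luma = std::size_t(width) * height;
    return luma + 2 * (luma / 4);
}

// Locate the planes; for YV12 the first chroma plane is V, not U.
Planes420 planes_from_buffer(const std::uint8_t* buf, int width, int height,
                             bool is_yv12) {
    assert(width % 2 == 0 && height % 2 == 0);
    const std::size_t luma = std::size_t(width) * height;
    const std::size_t chroma = luma / 4;  // (width/2) * (height/2)
    const std::uint8_t* first = buf + luma;
    const std::uint8_t* second = first + chroma;
    return is_yv12 ? Planes420{buf, second, first} : Planes420{buf, first, second};
}
```

For 1280x720 that is 921,600 bytes of luma plus two 230,400-byte chroma planes, 1,382,400 bytes per frame.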

In H264CoreEncoder_Init, you should be able to disable the ANALYSE_FRAME_TYPE flag. This is used for scene-cut detection, which you probably don't need if the video comes from a capture device. If I remember correctly, this used to take a little chunk of CPU.
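To give an idea of the work involved, a scene-cut check typically compares each new frame with the previous one and forces an I-frame when they differ sharply. The sketch below uses a mean absolute luma difference with an arbitrary threshold; it is a generic illustration with hypothetical names, not the algorithm behind ANALYSE_FRAME_TYPE, which may work differently (for example on downscaled frames).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Mean absolute difference between two luma planes of n pixels each.
double mean_abs_diff(const std::uint8_t* prev, const std::uint8_t* cur,
                     std::size_t n) {
    assert(n > 0);
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += static_cast<std::uint64_t>(std::abs(int(cur[i]) - int(prev[i])));
    }
    return static_cast<double>(total) / n;
}

// Flag a scene cut when the average per-pixel change exceeds a threshold.
// The default threshold is an illustrative value, not a UMC default.
bool is_scene_cut(const std::uint8_t* prev, const std::uint8_t* cur,
                  std::size_t n, double threshold = 30.0) {
    return mean_abs_diff(prev, cur, n) > threshold;
}
```

Even this simple version adds a full pass over all 921,600 luma pixels of every 720p frame, which a live camera feed rarely benefits from.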

Emmanuel
sheado
Beginner

Hi Jacob & Emmanuel,

I pretty much gave up on this and moved on to other encoders. When I was testing I did use YV12, but I never tried disabling ANALYSE_FRAME_TYPE. After sifting through the forums, it seemed to me that getting the performance improvement we're looking for would require changes to the encoder itself.

Sheado
Chao_Y_Intel
Employee

Hi Jacob and Sheado,

The test results showed that the threaded version of encoder is much slower than the non-threaded version.

With OpenMP:
~70-120ms per frame

Without OpenMP
~45ms per frame

This is not expected.

Could you test with the IPP sample application 'umc_video_enc_con' and check whether it produces similar performance results? If the performance is different, the problem may be related to how the codec is being used.

If the umc_video_enc_con application also produces such results, could you provide the encoding par file and the bitstream so we can look into the problem?

Thanks,
Chao
