Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

H.264 encoder extremely slow in 7.1.1

Robert_Jongbloed
Beginner
1,067 Views

I have just upgraded my rather old code from IPP 5.3 to 7.1.1. Turned out to be a huge job due to the API changes, but that's OK, it happens. And my code ended up being much smaller and cleaner as many features I had to try and emulate are now in the sample code.

My problem now is that I am getting very high CPU usage, and very slow frame rates, the two being, of course, closely related. First, I am using the "max slice size" option as I am trying to send RFC 3984 compliant packetisation mode zero RTP packets. Second, I am encoding a y4m file to minimise any possible interactions with cameras etc. Finally, I am using constant bit rate set to 2Mbps. My test code (effectively) takes a YUV420P frame from the file, feeds it through the codec, then splits the output into separate RTP packets/NAL units by searching for the start codes, and finishes by throwing away the result. The build uses VS2012, 32-bit.
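For readers unfamiliar with the start-code splitting step described above, here is a minimal sketch of one way to do it. It is not the poster's code and not part of the IPP samples; the names `NalSpan` and `splitAnnexB` are hypothetical, and it assumes the encoder emits an Annex B byte stream with 00 00 01 or 00 00 00 01 start codes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: location of one NAL unit inside the buffer,
// with the start code excluded.
struct NalSpan {
    std::size_t offset;
    std::size_t size;
};

// Scan an Annex B byte stream for 00 00 01 start codes and return the
// NAL units found between them. A 4-byte start code (00 00 00 01) is
// handled by trimming trailing zero bytes from the preceding unit,
// which is safe because a NAL unit never ends in a zero byte.
std::vector<NalSpan> splitAnnexB(const std::uint8_t* data, std::size_t len)
{
    std::vector<NalSpan> nals;
    std::size_t start = 0;   // first payload byte of the current NAL unit
    bool inNal = false;      // true once the first start code has been seen
    std::size_t i = 0;

    while (i + 3 <= len) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            if (inNal) {
                std::size_t end = i;
                while (end > start && data[end - 1] == 0)
                    --end;   // drop zeros belonging to the next start code
                NalSpan span = { start, end - start };
                nals.push_back(span);
            }
            start = i + 3;
            inNal = true;
            i += 3;
        } else {
            ++i;
        }
    }

    if (inNal) {
        // The last NAL unit runs to the end of the buffer.
        std::size_t end = len;
        while (end > start && data[end - 1] == 0)
            --end;
        NalSpan span = { start, end - start };
        nals.push_back(span);
    }
    return nals;
}
```

In packetisation mode zero each span would then go out as the payload of one RTP packet, which is why the maximum slice size has to fit within the packet size.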

A CIF sized image maxes out the single thread and yields 10fps. HD720 is about 4fps and HD1080 is about 2fps.

That is seriously non-linear for a start. The HD stuff, I can sort of understand eating CPU for breakfast, but really, CIF should be able to do 30fps, with time to spare. Even in one thread.
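To put a number on the non-linearity, the frame rates above can be converted to pixels encoded per second. This is only a rough sketch: the helper name `pixelsPerSecond` is made up for illustration, and it assumes the usual frame sizes of 352x288 for CIF, 1280x720 for HD720 and 1920x1080 for HD1080.

```cpp
// Pixels encoded per second for a given frame size and frame rate.
// Hypothetical helper for illustration only.
double pixelsPerSecond(int width, int height, double fps)
{
    return static_cast<double>(width) * height * fps;
}

// Frame rates quoted in the post (approximate):
//   pixelsPerSecond(352, 288, 10.0)    CIF
//   pixelsPerSecond(1280, 720, 4.0)    HD720
//   pixelsPerSecond(1920, 1080, 2.0)   HD1080
```

With those inputs, CIF works out to roughly 1.01 million pixels per second, HD720 to about 3.69 million and HD1080 to about 4.15 million. So the large frames are being encoded at around four times the pixel rate of CIF, which is the opposite of what you would expect if the cost were simply proportional to frame size.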

I have played with the num_slices and m_iThreads parameters, as well as resolutions and CBR bit rates, and nothing seems to make a lot of difference.

Can anyone think of something I am doing wrong?

Oh yeah, this is on a relatively old i7, but I got 10 times this performance with my old code, and IPP 5.3, two years ago.

21 Replies
Robert_Jongbloed
Beginner
87 Views

Sergey Kostrov wrote:

>>...I tried all those compiler flags and few more besides. No significant effect...

That looks very strange because in Debug Configurations applications always work slower. I'd like to give you a very small example and take a look at these two tests:

....

As you can see in Release Configuration tests are executed ~3.2x faster and the code was the same.

I agree, the optimised version should be faster. So why isn't it?

As I think I mentioned earlier, I can see 8 threads started, each using 5% of the CPU. Which means it is spending a very large amount of time spinning its wheels and not doing anything.

I really hoped someone would know what silly thing I might have done that causes a 10-fold decrease in speed.
