I have just upgraded my rather old code from IPP 5.3 to 7.1.1. Turned out to be a huge job due to the API changes, but that's OK, it happens. And my code ended up being much smaller and cleaner as many features I had to try and emulate are now in the sample code.
My problem now is that I am getting very high CPU usage and very slow frame rates, the two being, of course, closely related. First, I am using the "max slice size" option, as I am trying to send RFC 3984 compliant packetisation-mode-zero RTP packets. Second, I am encoding a y4m file to minimise any possible interactions with cameras etc. Finally, I am using constant bit rate set to 2 Mbps. My test code (effectively) takes a YUV420P frame from the file, feeds it through the codec, splits the output into separate RTP NALUs by searching for the start codes, and finishes by throwing away the result. The build is using VS2012, 32 bit.
A CIF sized image maxes out the single thread and yields 10fps. HD720 is about 4fps and HD1080 is about 2fps.
That is seriously non-linear for a start. The HD stuff, I can sort of understand eating CPU for breakfast, but really, CIF should be able to do 30fps, with time to spare. Even in one thread.
I have played with the num_slices and m_iThreads parameters, as well as resolutions and CBR bit rates, and nothing seems to make a lot of difference.
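In case it helps anyone reproduce this, the sort of setup I am using is sketched below. This is only illustrative: num_slices and m_iThreads are the fields mentioned above from the UMC sample code, but the other field names and the exact init sequence are assumptions about the 7.1.1 UMC API, not verified:

```cpp
// Hypothetical fragment based on the IPP UMC H.264 encoder sample.
// Only num_slices and m_iThreads are known parameter names; the rest
// (info.bitrate, the Init call shape) is illustrative.
UMC::H264EncoderParams params;
params.num_slices   = 1;        // one slice per frame (plus max-slice-size cap)
params.m_iThreads   = 1;        // force single-threaded to rule out spin-wait
params.info.bitrate = 2000000;  // CBR, 2 Mbps
encoder.Init(&params);
```

Forcing m_iThreads to 1 at least tells you whether the thread pool's spin-waiting is the overhead, since a single thread cannot sit waiting on itself.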
Can anyone think of something I am doing wrong?
Oh yeah, this is on a relatively old i7, but I got ten times this performance with my old code and IPP 5.3, two years ago.
Sergey Kostrov wrote:
>>...I tried all those compiler flags and few more besides. No significant effect...
That looks very strange, because in Debug configurations applications always run slower. I'd like to give you a very small example; take a look at these two tests:
....
As you can see, in the Release configuration the tests execute ~3.2x faster, and the code was the same.
I agree, the optimised version should be faster. So why isn't it?
As I think I mentioned earlier, I can see 8 threads started, each using 5% of the CPU. Which means it is spending a very large amount of time spinning its wheels and not doing anything.
I really hoped someone would know what silly thing I might have done that causes a 10-fold decrease in speed.