This issue is reproducible in the simple_player example. Just take an H.264 video clip, play it in simple_player with -t0, and then play it again with -t1. The clip I used to test is the Serenity trailer from
http://www.h264info.com/clips.html.
I'm using IPP 7.0.6 with the 7.0.6 sample code. I'm on a quad-core i7, and with threads set to 0 (internally this would be 8) my CPU sits at roughly 20% for the entirety of the clip. That is nearly 2 of my CPUs fully maxed (counting hyperthreading). If I play the same clip with threads set to 1, it sits at 1% to 2% CPU with rare spikes up to 5%. I would expect slightly higher CPU usage with threading, but no more than an extra 1-2%.
To make matters worse, if I enable threading in our application, which decodes and displays several different cameras at a time (up to 48), then once enough cameras are being displayed it sits consistently at 99% CPU and makes the entire PC unusable until you close the application. In IPP 6.1.6 this worked fine. Granted, if I force the thread count to be limited to 2, everything is much more usable, and it actually performs better than 6.1.6 with threads set to 0. So if it weren't for things going crazy with 8 threads, this would actually be a nice performance bump.
Also, has any thought ever been given to implementing a thread pool feature so that decoders could share threads in situations like ours? That way our potential 48 H264VideoDecoders could share just 8 threads rather than creating 384 threads (48 decoders x 8 threads each).
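
To illustrate what I mean by a shared thread pool, here is a rough sketch in plain standard C++11. It is not IPP or UMC code, and all of the names in it (SharedThreadPool, submit, decode_on_shared_pool) are made up for this example. The "slice" task is only a placeholder that bumps a counter; in a real decoder that is where the per-slice work would go.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical pool: one fixed set of worker threads serves every decoder.
class SharedThreadPool {
public:
    explicit SharedThreadPool(std::size_t thread_count) {
        assert(thread_count > 0);
        for (std::size_t i = 0; i < thread_count; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers drain the queue, then joins them.
    ~SharedThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& worker : workers_) worker.join();
    }

    // Queue one unit of work; any idle worker picks it up.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

    std::size_t size() const { return workers_.size(); }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers are not blocked
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Placeholder for "decode one frame on every camera": each camera submits
// its slices to the same pool instead of owning its own threads.
std::size_t decode_on_shared_pool(std::size_t cameras,
                                  std::size_t slices_per_frame,
                                  std::size_t pool_threads) {
    std::atomic<std::size_t> slices_done{0};
    {
        SharedThreadPool pool(pool_threads);
        for (std::size_t cam = 0; cam < cameras; ++cam)
            for (std::size_t s = 0; s < slices_per_frame; ++s)
                pool.submit([&slices_done] { slices_done.fetch_add(1); });
    }  // pool destructor waits for all queued slices to finish
    return slices_done.load();
}
```

With something along these lines, calling decode_on_shared_pool(48, 4, 8) would push the work of 48 cameras through 8 threads in total, so the thread count stays tied to the number of cores rather than to the number of decoders.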