I would recommend checking the IPP functions you use in your application against the ThreadedFunctionList.txt file available in the IPP distribution. Not all IPP functions are threaded (for example, you probably would not expect threading benefits from a 3x3 matrix add operation, would you?).
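If you use many IPP functions, the comparison against ThreadedFunctionList.txt can be scripted. The following is only a rough sketch: it assumes the file contains the full function names as plain text, and the helper names (load_threaded_names, split_by_threading) are made up for illustration, not part of IPP.

```python
import re
import sys
from pathlib import Path


def load_threaded_names(list_path):
    """Collect every token in the file that looks like an IPP function name.

    Assumption: threaded functions appear in the file under their full
    names, each starting with the lowercase prefix 'ipp'.
    """
    text = Path(list_path).read_text(encoding="utf-8", errors="replace")
    return set(re.findall(r"\bipp[A-Za-z0-9_]+", text))


def split_by_threading(used_functions, threaded_names):
    """Split the functions you call into 'listed as threaded' and 'not listed'."""
    threaded = [f for f in used_functions if f in threaded_names]
    not_listed = [f for f in used_functions if f not in threaded_names]
    return threaded, not_listed


if __name__ == "__main__":
    # Usage: python check_threaded.py <path to ThreadedFunctionList.txt> <function> [<function> ...]
    names = load_threaded_names(sys.argv[1])
    threaded, not_listed = split_by_threading(sys.argv[2:], names)
    print("listed as threaded:", threaded)
    print("not listed:", not_listed)
```

Pass the path to the list file from your own IPP installation, followed by the names of the functions your application calls. Anything reported as not listed should not be expected to speed up with more threads.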
It is not clear what you mean by a time penalty for the initial call to IPP. How do you measure that? Perhaps you just