Thanks for sharing. The default number of OpenMP threads used by the threaded Intel IPP is equal to the number of hardware threads in the system. In general, you don't need to set it manually.
However, Hyper-Threading (HT) does not benefit every IPP function; for details see http://software.intel.com/en-us/articles/openmp-and-the-intel-ipp-library/
Could you try disabling HT and let us know the performance result? You can then decide whether to disable it, leave it enabled, or set the number of threads to 1, as in "IPP Crypto Sample Performance for OpenSSL too Slow on Hyper-Threading Systems".
I would recommend checking the IPP functions you use in your application against the ThreadedFunctionList.txt file available in the IPP distribution. Not all IPP functions are threaded (for example, you probably would not expect a threading benefit for a 3x3 matrix add operation, would you?).
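If you have many functions to check, a small lookup helper can save some time. The sketch below is only an illustration: it assumes ThreadedFunctionList.txt is plain text in which each function name appears as a separate token (I have not verified the file's exact layout), and `ipp_function_is_listed` is a hypothetical helper name, not part of IPP.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* True for characters that can appear inside a C identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * Hypothetical helper (not an IPP API): searches a plain-text list file
 * for `func_name` as a whole token.
 * Returns 1 if found, 0 if not found, -1 if the file cannot be opened.
 * Assumption: lines in the file are shorter than the 512-byte buffer.
 */
int ipp_function_is_listed(const char *list_path, const char *func_name)
{
    size_t name_len = strlen(func_name);
    if (name_len == 0)
        return 0;

    FILE *fp = fopen(list_path, "r");
    if (fp == NULL)
        return -1;

    char line[512];
    int found = 0;
    while (!found && fgets(line, sizeof line, fp) != NULL) {
        const char *hit = line;
        while ((hit = strstr(hit, func_name)) != NULL) {
            /* Accept only whole-token matches, so that "ippsAdd"
               does not match inside "ippsAdd_32f". */
            int start_ok = (hit == line) || !is_ident_char(hit[-1]);
            int end_ok = !is_ident_char(hit[name_len]);
            if (start_ok && end_ok) {
                found = 1;
                break;
            }
            ++hit;
        }
    }
    fclose(fp);
    return found;
}
```

Call it with the path to the list file and the exact function name including the data-type suffix, for example `ipp_function_is_listed("ThreadedFunctionList.txt", "ippsAdd_32f")`. A result of 0 would suggest that the function is not threaded, provided the assumption about the file layout holds.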
It is not clear what you mean by a time penalty for the initial call to IPP. How do you measure that? Might be you just
Don't worry too much about time penalties due to dispatching. This penalty is overstated by the manual and can be safely ignored for most applications. The default for both the dynamic and the static libraries is a dispatched library. Building a special, processor-specific configuration of the library just to eliminate the dispatch overhead is usually not worth the effort. If you need to save space in your application, and you can guarantee that it will only run on one processor architecture, then a processor-specific build may be worthwhile; otherwise, stay with the dispatched library.
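If you want to check this on your own system, one option is to time the first call separately from the following calls. Here is a minimal sketch of such a harness. Nothing in it is IPP API: `time_first_vs_steady`, `call_timing` and `sample_work` are hypothetical names, and `sample_work` is only a stand-in for a wrapper around the IPP function you actually call. It uses the C11 `timespec_get` wall clock, so the numbers are only meaningful on an otherwise idle machine.

```c
#include <stddef.h>
#include <time.h>

/* Signature for the code under test; wrap your IPP call in a function like this. */
typedef void (*work_fn)(void *ctx);

typedef struct {
    double first_call_sec;  /* duration of the very first call */
    double steady_avg_sec;  /* average duration of the following calls */
} call_timing;

/* Wall-clock time in seconds (C11 timespec_get; not a monotonic clock). */
static double now_sec(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Calls fn once and times it, then calls it `repeats` more times and
   reports the average duration of those later calls. */
call_timing time_first_vs_steady(work_fn fn, void *ctx, int repeats)
{
    call_timing t = {0.0, 0.0};

    double t0 = now_sec();
    fn(ctx);
    t.first_call_sec = now_sec() - t0;

    if (repeats > 0) {
        t0 = now_sec();
        for (int i = 0; i < repeats; ++i)
            fn(ctx);
        t.steady_avg_sec = (now_sec() - t0) / repeats;
    }
    return t;
}

/* Hypothetical stand-in workload: does a little arithmetic and counts
   how often it was invoked through the int that ctx points to. */
void sample_work(void *ctx)
{
    int *calls = (int *)ctx;
    volatile float acc = 0.0f;
    for (int i = 0; i < 1000; ++i)
        acc += (float)i;
    ++*calls;
}
```

If `first_call_sec` is much larger than `steady_avg_sec`, you are looking at a one-time cost (library load, dispatch setup, cold caches and so on) rather than a per-call penalty, and it only matters if your application makes very few calls.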