I'm compiling a custom IPP DLL using ippmerged.c, with just the subset of functions I need and only a subset of the CPU-specific library variants.
I recently found that I had to call ippSetNumThreads() explicitly to get more than one thread, although the documentation states that IPP uses all available cores by default. That led me to look at the initialization code in my DLL.
The documentation states that I must call ippInit() or ippInitCpu(); however, with ippmerged.c, those functions are not called.
Specifically, is there a function t7_ippInit() ?
With ippmerged.c, all IPP functions are routed to the CPU-specific library code by calling its function InitStatic() or InitStaticUseCpu(). Those functions only select the proper "addressbook"; they do not call any ippInit routines.
In ippcore.h, ippInit() is described as "Automatic switching to best for current cpu library code using."
So it is somewhat confusing whether ippInit() must be called or not.
It looks like you are using the IPP sample code \advanced-usage\linkage\mergedlib, and that you only want to use part of the CPU-optimized code (for example, including only the P8 and V8 code, with no w7 or t7 code).
If so, you do NOT need to call ippInit()/ippInitCpu(). In the full library, ippInit()/ippInitCpu() detect the CPU features and dispatch to the matching CPU-specific code automatically.
If you select only part of the CPU-optimized code, you need to write your own CPU dispatching code, similar to the InitStatic() function in ippmerged.c. You only need to include the CPU-optimized code used by your application.
Note that InitStatic() in ippmerged.c is similar to ippInit(): it also detects the CPU features and finds the correct code. That is why the sample does not call ippInit() and uses InitStatic() as the dispatching function instead.
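To make the idea concrete, here is a minimal sketch of a hand-written dispatcher in C. It is not the code from ippmerged.c: the enum, the table, and the generic_/v8_/p8_ stand-in functions are all hypothetical, and the detected CPU class is passed in as a parameter where a real build would query the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU classes; only the variants linked into the DLL are listed. */
typedef enum { CPU_GENERIC = 0, CPU_V8, CPU_P8, CPU_KIND_COUNT } cpu_kind;

typedef int (*add_fn)(int a, int b);

/* Stand-ins for CPU-specific entry points. In a real build each of these
   would be the prefixed function from the matching optimized library. */
static int generic_add(int a, int b) { return a + b; }
static int v8_add(int a, int b)      { return a + b; }
static int p8_add(int a, int b)      { return a + b; }

typedef struct {
    const char *name;  /* label for the variant, for diagnostics */
    add_fn      add;   /* entry point used when this variant is active */
} variant;

static const variant variants[CPU_KIND_COUNT] = {
    { "generic", generic_add },
    { "v8",      v8_add      },
    { "p8",      p8_add      },
};

/* Default to the generic code until init_dispatch() is called. */
static const variant *active = &variants[CPU_GENERIC];

/* Select the table entry for the detected CPU; unknown values fall back
   to the generic variant so the DLL still works on unlisted CPUs. */
void init_dispatch(cpu_kind detected)
{
    if ((int)detected < 0 || (int)detected >= CPU_KIND_COUNT)
        detected = CPU_GENERIC;
    active = &variants[detected];
}

const char *active_variant(void) { return active->name; }

/* Public entry point: forwards to whichever variant was selected. */
int dispatched_add(int a, int b)
{
    assert(active != NULL);
    return active->add(a, b);
}
```

The DLL would call init_dispatch() once at load time. Falling back to a generic variant is a design choice in this sketch; if no generic code is linked in, failing loudly is probably safer.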
As for the default number of threads: it is set to the number of available cores if ippInit() is called. If ippInit() is not called, you need to call ippSetNumThreads() explicitly.
This article has some more discussion on ippInit() function:
The \linkage\mergedlib sample uses the non-threaded static libraries. Threading is not supported for these libraries. The libraries it uses can be found in the Makefile:
It is true that you need to add a call to ippSetNumThreads() if you link with the threaded static libraries.
Some of the *merged.lib entries need to be changed to *merged_t.lib.
I have already modified the ippmerged sample to link with the threaded libraries.
I can also see the effect when the IPP code executes: it uses all four CPUs on a quad-core.
So, to summarize: when using the threaded IPP libraries together with sample code that uses OpenMP, one must take care to call both ippSetNumThreads(MaxThreads) and omp_set_num_threads(MaxThreads) (for OpenMP).
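As a rough sketch of that summary, assuming the calls meant are ippSetNumThreads() from ippcore.h and omp_set_num_threads() from omp.h: the helper name set_thread_counts and the HAVE_IPP build macro are made up for illustration, so the snippet still compiles on a machine without IPP or OpenMP.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>       /* omp_set_num_threads() */
#endif

#ifdef HAVE_IPP        /* hypothetical macro: define it when building against IPP */
#include <ippcore.h>   /* ippSetNumThreads() */
#endif

/* Apply one thread count to both runtimes.
   Returns the count that was requested, or -1 if IPP reported an error. */
int set_thread_counts(int max_threads)
{
    if (max_threads < 1)
        max_threads = 1;   /* never ask for fewer than one thread */

#ifdef HAVE_IPP
    {
        /* Negative IppStatus values are errors; positive ones are warnings. */
        IppStatus st = ippSetNumThreads(max_threads);
        if (st < 0)
            return -1;
    }
#endif

#ifdef _OPENMP
    omp_set_num_threads(max_threads);
#endif

    assert(max_threads >= 1);
    return max_threads;
}
```

Calling set_thread_counts(MaxThreads) once during DLL initialization, before any IPP or OpenMP work starts, keeps the two settings from drifting apart.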
I have just one more question about this. Does IPP have a function to determine MaxThreads, assuming I want to use all cores (or up to a certain maximum)?
I did try ippGetNumCoresOnDie(), but it returns bad values on an AMD quad-core (always 1) and good values on a Xeon quad-core (4). Interestingly, OpenMP's omp_get_num_threads() works properly on both AMD and Xeon.
Be careful when using ippGetNumCoresOnDie(). It does not return the maximum number of threads on a multi-processor and/or hyperthreaded system. As its name implies, it returns the number of cores on one die, which can be quite different.
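One way to sidestep the question, sketched here rather than taken from IPP, is to ask the operating system for the number of logical processors. The function names logical_processor_count and capped_thread_count are my own. GetSystemInfo() is the Win32 call; sysconf(_SC_NPROCESSORS_ONLN) is a widely available extension on Linux and macOS rather than strict POSIX.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Number of logical processors the OS reports (includes HT siblings
   and all sockets). Falls back to 1 if the query fails. */
int logical_processor_count(void)
{
#ifdef _WIN32
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return info.dwNumberOfProcessors > 0 ? (int)info.dwNumberOfProcessors : 1;
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
#endif
}

/* Use all logical processors, but never more than `cap`
   (a cap below 1 is treated as 1). */
int capped_thread_count(int cap)
{
    int n = logical_processor_count();
    assert(n >= 1);
    if (cap < 1)
        cap = 1;
    return n < cap ? n : cap;
}
```

Note that this counts hyperthreaded siblings as processors, so it answers "how many threads can run at once", not "how many full-speed cores are there".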
If you have an Intel 4-core HT-enabled CPU, it can run 8 threads, but not all at full speed.
Four run at full speed; the other four run at somewhere between 0 and 20%, I think, depending on the code.
So, leaving approximately 1.0 full-speed core for the UI, I would say that "4" is a good number of threads to use in IPP.
For non-HT CPUs, "4" - 1 = 3 would be better.
This would apply to full-load IPP code over several seconds.
If the IPP-load is very intermittent (0.2 seconds max), then 4 would be good in both cases.
But to create best-case code, we'd need a new function "ippGetNumThreadsOnDie".
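Until something like that exists, the rule of thumb above can be written down as a small helper. This is only a sketch of the numbers given in this post, not a measured result; the function name and its three parameters are hypothetical, and the caller has to supply the core count and hyperthreading flag from some other source.

```c
#include <assert.h>

/* Suggest an IPP thread count from the rule of thumb in this thread:
   - short, intermittent load: one thread per physical core;
   - sustained load with HT: one thread per physical core
     (the HT siblings leave headroom for the UI);
   - sustained load without HT: keep one core free for the UI. */
int suggest_ipp_threads(int physical_cores, int hyperthreading, int sustained_load)
{
    assert(physical_cores >= 1);

    if (!sustained_load)
        return physical_cores;
    if (hyperthreading)
        return physical_cores;
    return physical_cores > 1 ? physical_cores - 1 : 1;
}
```

For a quad-core this gives 4 with HT, 3 without HT under sustained load, and 4 in both cases when the load is intermittent.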