Confusion with ippInit() and ippmerged.c

Thomas_Jensen1 · ‎03-08-2010

I'm compiling a custom IPP DLL using ippmerged.c, containing only the subset of functions I need and only a subset of the CPU-specific library types.

However, I just found a problem: I had to call ippSetNumThreads() explicitly to get more than one thread, although the documentation states that IPP uses all available cores by default. That led me to look at the initialization code in my DLL.

The documentation states that I must call ippInit() or ippInitCpu(). With ippmerged.c, however, those functions are never called.

Specifically, is there a function t7_ippInit()?

With ippmerged.c, all IPP functions are routed to the CPU-specific library code by calling its function InitStatic() or InitStaticUseCpu(). Those functions only select the proper "addressbook"; they do not call any ippInit routines.

According to ippcore.h, ippInit() will do "Automatic switching to best for current cpu library code using.".

So it is somewhat unclear whether ippInit() must be called or not.

Chao_Y_Intel · ‎03-09-2010

Hello,

It looks like you are working from the IPP sample code \advanced-usage\linkage\mergedlib, and that you only want to use part of the CPU-optimized code (for example, including only P8 and V8, with no w7 or t7 code).

If so, you do NOT need to call ippInit()/ippInitCpu(). What ippInit()/ippInitCpu() does is automatically detect the CPU features and dispatch the matching CPU-specific code for that CPU.

If you only select part of the CPU-optimized code, you need to write your own CPU dispatching code, along the lines of the InitStatic() function in ippmerged.c. You only need to include the CPU-optimized code used in your application.

Note that InitStatic() in ippmerged.c is similar to ippInit(). It also detects the CPU features and finds the correct code. That is why the sample does not call ippInit() and uses InitStatic() as the dispatching function.

As for the default number of threads: it is set to all available cores if ippInit() is called. If that function is not called, you need to call ippSetNumThreads() explicitly.
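To illustrate the idea of a hand-written dispatcher for a reduced set of CPU variants, here is a minimal sketch in plain C. It is not the code from ippmerged.c: the names init_static(), merged_add(), cpu_table and the FEAT_* bits are hypothetical, the feature requirements attached to each label are only illustrative, and the feature mask is passed in by the caller, whereas a real dispatcher would obtain it from CPUID.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits; a real dispatcher would fill these from CPUID. */
enum { FEAT_SSE2 = 1 << 0, FEAT_SSSE3 = 1 << 1, FEAT_SSE41 = 1 << 2 };

typedef int (*add_fn)(const int *a, const int *b, int *dst, int len);

/* Shared body standing in for three separately optimized builds. */
static int add_impl(const int *a, const int *b, int *dst, int len)
{
    for (int i = 0; i < len; ++i)
        dst[i] = a[i] + b[i];
    return 0; /* 0 plays the role of a "no error" status */
}
static int px_add(const int *a, const int *b, int *d, int n) { return add_impl(a, b, d, n); }
static int v8_add(const int *a, const int *b, int *d, int n) { return add_impl(a, b, d, n); }
static int p8_add(const int *a, const int *b, int *d, int n) { return add_impl(a, b, d, n); }

/* One "address book": a label, the features it needs, and its entry points. */
typedef struct {
    const char *name;
    unsigned required;
    add_fn add;
} cpu_table;

/* Only the variants built into the DLL are listed, best first.
   The last entry has no requirements, so it always matches. */
static const cpu_table tables[] = {
    { "p8", FEAT_SSE2 | FEAT_SSSE3 | FEAT_SSE41, p8_add },
    { "v8", FEAT_SSE2 | FEAT_SSSE3,              v8_add },
    { "px", 0,                                   px_add },
};

static const cpu_table *active = NULL;

/* Select the first table whose required features are all present. */
const cpu_table *init_static(unsigned cpu_features)
{
    for (size_t i = 0; i < sizeof tables / sizeof tables[0]; ++i) {
        if ((cpu_features & tables[i].required) == tables[i].required) {
            active = &tables[i];
            break;
        }
    }
    return active;
}

/* Public entry point: forwards to whichever variant was selected. */
int merged_add(const int *a, const int *b, int *dst, int len)
{
    assert(active != NULL); /* init_static() must run first */
    return active->add(a, b, dst, len);
}
```

A caller would run init_static() once at DLL load time and then use merged_add() like any other function. Note that nothing in this sketch touches the thread count, which matches the point above that dispatching and thread setup are separate steps.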

This article has some more discussion on ippInit() function:

http://software.intel.com/en-us/articles/ipp-dispatcher-control-functions-ippinit-functions/

Thanks,

Chao

Thomas_Jensen1 · ‎03-10-2010

To conclude, it then seems that the sample ippmerged.c is missing a call to ippiSetNumThreads(GetMaxThreads).

Chao_Y_Intel · ‎03-10-2010

The \linkage\mergedlib sample uses the non-threaded static libraries. Threading is not supported for these libraries. The libraries it uses can be found in the Makefile:

ippMergedLibs="$(IPPROOT)\lib\ipp*merged.lib" "$(IPPROOT)\lib\ippcorel.lib"

It is true that the sample needs an added call to ippiSetNumThreads() if it is linked with the threaded static libraries.

In that case, some of the *merged.lib entries need to be changed to *merged_t.lib.

Thanks,

Chao

Thomas_Jensen1 · ‎03-11-2010

I had already modified the ippmerged sample to link with the threaded libraries.

I can also see this when the IPP code executes, since it uses all four CPUs on a quad-core.

So, to summarize: when using threaded IPP library code together with sample code that uses OpenMP, one must take care to call ippiSetNumThreads(MaxThreads) and omp_set_num_threads(MaxThreads) (for OpenMP).

I then have just one more question regarding this. Does IPP have a function to determine MaxThreads, assuming I want to utilize all cores (or up to a certain maximum)?

I did try ippGetNumCoresOnDie(), but it returns bad values on an AMD quad-core (always 1) and good values on a Xeon quad-core (4). Interestingly, the OMP get_num_threads() does work properly on both AMD and Xeon.
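As a vendor-independent way to obtain a MaxThreads value, one option is to ask the operating system rather than IPP. The sketch below is an assumption of mine and not something IPP provides: logical_cpu_count() and clamp_threads() are hypothetical helper names. It uses GetSystemInfo() on Windows and sysconf(_SC_NPROCESSORS_ONLN) elsewhere (a widely available extension, not strictly required by POSIX). The count is of logical processors, so Hyper-Threading siblings are included.

```c
#include <assert.h>
#if defined(_WIN32)
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Number of logical processors reported by the OS (at least 1). */
int logical_cpu_count(void)
{
#if defined(_WIN32)
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return si.dwNumberOfProcessors > 0 ? (int)si.dwNumberOfProcessors : 1;
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
#endif
}

/* Limit a detected count to [1, cap]; cap == 0 means "no upper limit". */
int clamp_threads(int detected, int cap)
{
    assert(cap >= 0);
    if (detected < 1)
        return 1;
    if (cap > 0 && detected > cap)
        return cap;
    return detected;
}
```

The result of clamp_threads(logical_cpu_count(), cap) could then be passed to both the IPP and the OpenMP thread-count setters mentioned above, so that the two runtimes agree on one value.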

Ying_S_Intel · ‎04-12-2010

Regarding the values returned by ippGetNumCoresOnDie(): Intel IPP 6.1 update 5 fixed this issue for multi-core processors.

Thanks,
Ying

matthieu_darbois · ‎04-13-2010

Hi,
Be careful when using ippGetNumCoresOnDie(). It doesn't return the maximum number of threads on a multi-processor system and/or a hyperthreaded system. As its name implies, it returns the number of cores on one die, which can be quite different.

Regards,
Matthieu

Thomas_Jensen1 · ‎04-13-2010

I would say it will return the number of full-speed threads.
If you have an Intel 4-core HT-enabled CPU, it can run 8 threads, but not all at full speed.
4 at full speed, 4 at somewhere between 0 and 20% I think, depending on the code.

So, leaving approximately one full-speed core's worth of capacity for the UI, I would say that "4" is a good number of threads to use in IPP.
For non-HT CPUs, "4" - 1 = 3 would be better.

This would apply to full-load IPP code over several seconds.

If the IPP-load is very intermittent (0.2 seconds max), then 4 would be good in both cases.

But to create best-case code, we'd need a new function "ippGetNumThreadsOnDie".
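The rule of thumb described in this post can be written down as a small helper. This is only a sketch of that heuristic, not an IPP function: suggest_ipp_threads() is a hypothetical name, and the caller is assumed to already know the number of physical cores and whether Hyper-Threading is enabled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Heuristic from the post above:
 *  - with Hyper-Threading, use one thread per physical core; the spare
 *    logical processors are left for the UI;
 *  - without Hyper-Threading, keep one core free for the UI when the
 *    IPP load is sustained (several seconds at full load);
 *  - for short, intermittent loads, use every physical core.
 */
int suggest_ipp_threads(int physical_cores, bool hyperthreading, bool sustained_load)
{
    assert(physical_cores >= 1);
    if (sustained_load && !hyperthreading && physical_cores > 1)
        return physical_cores - 1;
    return physical_cores;
}
```

For a quad-core this gives 4 with Hyper-Threading, 3 without it under sustained load, and 4 in both cases for intermittent load, matching the numbers given above.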