I do miss documentation of these implementation modes for each function that has them.
So, if IPP switches from single-threading to multithreading when you expand 1600 to 1601, and you then see a 100% increase in CPU usage, I would say your code is not fully correct for that.
Of course, I don't know how you multithread, if you multithread at all.
You can have your app do its own multithreading, but then you should let IPP run single-threaded.
If your app has no multithreading of its own, you should let IPP do the multithreading by calling SetNumThreads(NumCPUs_That_Are_Not_used_in_Other_Threaded_Code) and by using the properly multithreaded IPP libraries (the ones using OpenMP, for instance).
If you have a dual-core HT CPU (hyper-threaded), then I guess you should use 2 threads, since the two extra HT threads do not run at full CPU speed.
My point is: if you over-thread your app, performance will suffer.
Over-threading is when you tell your code to use more threads than your CPU can process at 100%. So a 2-core HT machine should run 2 threads, a 2-core non-HT 2 threads, a 4-core non-HT 4 threads, and a 4-core HT 4 threads. Let the slower HT threads be used by the OS or the UI.
Yes, we don't document all the internal criteria - they are specific to each architecture. For example, this particular one is the following:
#define STRT_OMP_DIR_R 1600
#define STRT_OMP_FFT_R 1600
#define STRT_OMP_DIR_C 800
#define STRT_OMP_FFT_C 800
So you see - there is one more implementation (FIR via FFT) and a different criterion for HT - we can't overload the documentation with all this stuff...
100% CPU load is an issue of the OMP version used - try to set the blocktime at the beginning of the application, via either an environment variable or an API call; this should decrease CPU usage. There is no oversubscription - nested threading is disabled by default.
static extern void kmp_set_blocktime(int value);
causes an error:
A call to PInvoke function 'IppsFIR64f_32f_Test!IppsFIR64f_32f_Test.MainForm::kmp_set_blocktime' has unbalanced the stack. This is likely because the managed PInvoke signature does not match the unmanaged target signature. Check that the calling convention and parameters of the PInvoke signature match the target unmanaged signature.
Please could you help us to correct this code?
First of all, we need to confirm that the issue is really connected with blocktime - so could you try to set the environment variable?
If that solves your issue, then we can think about how to call OMP runtime functions from C#.
# if defined(_WIN32)
#   define __KAI_KMPC_CONVENTION __cdecl
# else
#   define __KAI_KMPC_CONVENTION
# endif
extern void __KAI_KMPC_CONVENTION kmp_set_blocktime (int);
extern void __KAI_KMPC_CONVENTION kmp_set_defaults (char const *);
I cringe whenever I look at IPP documentation which looks machine generated and which always presumes that those who use IPP must know everything on the particular subject.
Agreed, the documentation should be improved (it's not "machine generated") - it's one of the main goals for the upcoming releases. Anyway, almost all functionality/algorithms used in IPP are compatible with Matlab, so our documentation provides enough info on function parameters and return statuses, and you can always pick up additional information on DSP or image processing from the web, Wikipedia, Matlab help, etc. IPP manuals are not primer textbooks - they are technical manuals.
- You see that at least 3 algorithms are used for the single-threaded case, and they have complex criteria based on tapsLen, vector length, data type used, and Intel architecture (SSE2, SSE3, SSSE3, SSE4.1, AVX, etc.). For the multi-threaded case, these criteria are extended with one more. These criteria are IPP internals and can change from release to release based on current performance data - they are not something that should, or can, be documented.

We state in the documentation that the dynamic libraries are threaded and provide a list of threaded functions. I guess it's evident that each threaded function has internal, parameter-based criteria for when to use single-threaded code and when to use multi-threaded code: threading always introduces some overhead, so it provides a visible benefit only above a certain amount of work - below such a threshold you would see a significant slowdown, which is not permissible for performance libraries. So every threaded IPP function has such an internal criterion, and it differs for each supported architecture.

If you don't want to see any "unpredictable" algorithm switching, please use the single-threaded static libraries and do the threading externally. Currently we are considering full removal of the OMP code from IPP functions - threading at the primitive level is not as efficient as at the application level, and the DMIP sample proves this statement beyond any doubt.