We found an issue with the function ippsFIR64f_32f() (IPP 6.0.1.070), which we use for real-time data filtering. When the data length changes from 1600 samples to 1601 samples, CPU usage jumps from 2% to nearly 100%. The complete test project for Visual Studio 2010 is attached.
Regards,
Igor
I do miss documentation of these implementation modes for each function that has them.
So, if IPP switches from single-threading to multithreading when you go from 1600 to 1601 samples, and you then see CPU usage rise to 100%, I would say your code is not set up correctly for that.
Of course, I don't know how you multithread, or whether you multithread at all.
Your app can have its own multithreading, but then you should let IPP run single-threaded.
If your app has no multithreading, you should let IPP do the multithreading by calling ippSetNumThreads(NumCPUs_That_Are_Not_used_in_Other_Threaded_Code) and by using the properly multithreaded IPP libraries (the ones built with OpenMP, for instance).
If you have a dual-core HT (hyper-threaded) CPU, then I guess you should use 2 threads, since the two extra HT threads do not deliver full CPU performance.
My point is, if you over-thread your app, performance will suffer.
Over-threading is when you tell your code to use more threads than your CPU can run at 100%. So a 2-core HT CPU should run 2 threads, a 2-core non-HT CPU should use 2 threads, a 4-core non-HT CPU should use 4 threads, and a 4-core HT CPU should use 4 threads. Let the slower HT threads be used by the OS or the UI.
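The rule of thumb above can be sketched as a small C helper. This is only an illustration: `pick_thread_count` and its parameters are hypothetical names, and it assumes that an HT-enabled CPU exposes exactly two logical processors per physical core.

```c
#include <assert.h>

/* Hypothetical helper: number of worker threads to request, following the
 * rule of thumb "one thread per physical core, leave the HT siblings alone".
 *   logical_cpus: logical processors reported by the OS
 *   ht_enabled:   nonzero if each core exposes two hardware threads
 *                 (an assumption; other topologies are not handled) */
int pick_thread_count(int logical_cpus, int ht_enabled)
{
    assert(logical_cpus > 0);
    int physical = ht_enabled ? logical_cpus / 2 : logical_cpus;
    return physical > 0 ? physical : 1;   /* never request fewer than 1 */
}
```

The result is the kind of value one would pass to ippSetNumThreads(), after subtracting any cores already kept busy by the application's own threads.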
Yes, we don't document all internal criteria, because they are specific to each architecture. For example, this particular one is as follows:
#ifdef _OPENMP
#include
#define STRT_OMP_DIR_R 1600
#define STRT_OMP_FFT_R 1600
#define STRT_OMP_DIR_C 800
#ifdef FIR_OPT_HT
#define STRT_OMP_FFT_C 800
#else
#define STRT_OMP_FFT_C 800
#endif
#endif
So you see, there is one more implementation (FIR via FFT) and a different criterion for HT. We can't overload the documentation with all of this.
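To make the effect of such a threshold concrete, here is a minimal C sketch of a length-based dispatch check. It is not the IPP source: `fir_uses_threaded_path` is a hypothetical name, and the strict `>` comparison is an assumption inferred from the report that 1600 samples stay cheap while 1601 do not.

```c
#include <assert.h>

#define STRT_OMP_DIR_R 1600  /* threshold quoted above for the real, direct-form case */

/* Hypothetical dispatch check: returns 1 when a call with `len` samples
 * would take the OpenMP path, 0 when it would stay single-threaded.
 * The strict '>' is an assumption based on the behaviour reported in
 * this thread, not on the actual library code. */
int fir_uses_threaded_path(int len)
{
    assert(len >= 0);
    return len > STRT_OMP_DIR_R;
}
```

Under that assumption, a buffer of 1600 samples runs on the calling thread only, while 1601 samples wake the OpenMP worker threads.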
The 100% CPU load is an issue with the OMP version used. Try setting the blocktime at the beginning of the application, via either an environment variable or an API call, e.g.
set KMP_BLOCKTIME=200
or
kmp_set_defaults("KMP_BLOCKTIME=200");
or
kmp_set_blocktime(200);
This should decrease CPU usage. There is no oversubscription, since nested threading is disabled by default.
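For applications that cannot rely on the launching shell, the environment-variable route can also be taken from inside the process. The sketch below is one possible way to do it in C; `request_blocktime_ms` is a hypothetical helper name, and it assumes the call happens before the OpenMP runtime is initialized, since a runtime that has already read its settings would presumably ignore the change.

```c
#define _POSIX_C_SOURCE 200112L  /* exposes setenv() on POSIX systems */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: sets KMP_BLOCKTIME for the current process.
 * Returns 0 on success, -1 on failure. It only has a chance of taking
 * effect if it runs before the OpenMP runtime starts (an assumption). */
int request_blocktime_ms(int ms)
{
    char value[16];
    assert(ms >= 0);
    snprintf(value, sizeof value, "%d", ms);
#ifdef _WIN32
    return _putenv_s("KMP_BLOCKTIME", value) == 0 ? 0 : -1;
#else
    return setenv("KMP_BLOCKTIME", value, 1) == 0 ? 0 : -1;
#endif
}
```

Calling `request_blocktime_ms(200)` as the first statement of `main` corresponds to `set KMP_BLOCKTIME=200` in the launching shell.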
Regards,
Igor
This code:
[DllImport("libiomp5md.dll")]
static extern void kmp_set_blocktime(int value);
kmp_set_blocktime(200);
causes an error:
A call to PInvoke function 'IppsFIR64f_32f_Test!IppsFIR64f_32f_Test.MainForm::kmp_set_blocktime' has unbalanced the stack. This is likely because the managed PInvoke signature does not match the unmanaged target signature. Check that the calling convention and parameters of the PInvoke signature match the target unmanaged signature.
Could you please help us correct this code?
First of all, we need to confirm that the issue is really connected with the blocktime. Could you try setting the environment variable
set KMP_BLOCKTIME=200
If that solves your issue, then we can think about how to call OMP runtime functions from C#.
Regards,
Igor
# if defined(_WIN32)
# define __KAI_KMPC_CONVENTION __cdecl
# else
# define __KAI_KMPC_CONVENTION
# endif
extern void __KAI_KMPC_CONVENTION kmp_set_blocktime (int);
extern void __KAI_KMPC_CONVENTION kmp_set_defaults (char const *);
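The declarations above show that on Windows these entry points use `__cdecl`, so any caller that reaches them indirectly has to declare the same convention. An "unbalanced stack" error is the typical symptom of a mismatch, and a P/Invoke declaration defaults to a different convention on Windows unless `CallingConvention.Cdecl` is specified. The C sketch below illustrates the idea with a function pointer type. `KMP_CONVENTION`, `kmp_set_blocktime_fn`, `call_set_blocktime`, and the stub are hypothetical names introduced for illustration; the stub merely stands in for the real entry point.

```c
#include <assert.h>
#include <stddef.h>

/* Same pattern as the header excerpt: __cdecl on Windows, nothing elsewhere. */
#if defined(_WIN32)
#  define KMP_CONVENTION __cdecl
#else
#  define KMP_CONVENTION
#endif

/* Pointer type whose calling convention matches kmp_set_blocktime(int). */
typedef void (KMP_CONVENTION *kmp_set_blocktime_fn)(int);

/* Hypothetical wrapper: calls the entry point if it was resolved
 * (for example via a dynamic lookup) and reports whether it did. */
int call_set_blocktime(kmp_set_blocktime_fn fn, int ms)
{
    assert(ms >= 0);
    if (fn == NULL)
        return 0;          /* entry point not available */
    fn(ms);
    return 1;
}

/* Stand-in for the real runtime function, used only to exercise the wrapper. */
static int last_blocktime = -1;
static void KMP_CONVENTION stub_set_blocktime(int ms) { last_blocktime = ms; }
```

In a real application, the pointer would come from the OpenMP runtime library rather than from the stub.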
Regards,
Igor
I cringe whenever I look at the IPP documentation, which looks machine generated and which always presumes that those who use IPP already know everything about the particular subject.
Agreed, the documentation should be improved (although it's not "machine generated"); this is one of the main goals for the upcoming releases. Anyway, almost all functionality and algorithms used in IPP are compatible with Matlab, so our documentation provides enough information on function parameters and return statuses, and you can always pick up additional information on DSP or image processing from the web, Wikipedia, Matlab help, etc. IPP manuals are not primer textbooks; they are technical manuals.
Regarding FIR:
- You can see that at least 3 algorithms are used in the single-threaded case, and they are selected by complex criteria based on tapsLen, vector length, data type, and Intel architecture (SSE2, SSE3, SSSE3, SSE4.1, AVX, etc.). For the multithreaded case these criteria are extended with one more. These criteria are IPP internals and can change from release to release based on current performance data, so they are not something that should be or can be documented. We state in the documentation that the dynamic libraries are threaded, and we provide a list of threaded functions. I think it is evident that each threaded function has internal, parameter-based criteria for when to use single-threaded code and when to use multithreaded code. Threading always introduces some overhead, so it provides a visible benefit only above a certain amount of work; below that threshold you would see a significant slowdown, which is not permissible for performance libraries. So every threaded IPP function has such an internal criterion, and it is different for each supported architecture. If you don't want to see any "unpredictable" algorithm switching, please use the single-threaded static libraries and external threading. We are currently considering removing the OMP code from IPP functions entirely: threading at the primitive level is not as efficient as threading at the application level, and the DMIP sample proves this statement 200%.
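As a rough illustration of the "external threading" suggested above, the application can split each buffer into contiguous ranges and hand one range to each of its own threads, with every thread calling the single-threaded library. The C sketch below covers only the partitioning step; `chunk_range` is a hypothetical helper, not an IPP function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: splits `total` samples across `nthreads` workers and
 * returns the half-open range [*start, *start + *count) for worker `idx`.
 * The first (total % nthreads) workers each receive one extra sample. */
void chunk_range(size_t total, size_t nthreads, size_t idx,
                 size_t *start, size_t *count)
{
    assert(nthreads > 0 && idx < nthreads);
    size_t base  = total / nthreads;
    size_t extra = total % nthreads;
    *count = base + (idx < extra ? 1 : 0);
    *start = idx * base + (idx < extra ? idx : extra);
}
```

For a FIR filter, each worker would additionally need the tapsLen - 1 input samples that precede its range as filter history, because every output sample depends on that many earlier inputs. How that history is supplied depends on the FIR API in use and is left out of this sketch.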
Regards,
Igor