<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic ippsFIR64f_32f unusual processor usage in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831372#M5572</link>
    <description>Thomas,&lt;BR /&gt;&lt;BR /&gt;Yes, we don't document all internal criteria - they are specific to each architecture. For example, this particular one is as follows:&lt;BR /&gt;&lt;BR /&gt;&lt;P&gt;#ifdef _OPENMP&lt;/P&gt;&lt;P&gt;#include &amp;lt;omp.h&amp;gt;&lt;/P&gt;&lt;P&gt;#define STRT_OMP_DIR_R 1600&lt;/P&gt;&lt;P&gt;#define STRT_OMP_FFT_R 1600&lt;/P&gt;&lt;P&gt;#define STRT_OMP_DIR_C 800&lt;/P&gt;&lt;P&gt;#ifdef FIR_OPT_HT&lt;/P&gt;&lt;P&gt;#define STRT_OMP_FFT_C 800&lt;/P&gt;&lt;P&gt;#else&lt;/P&gt;&lt;P&gt;#define STRT_OMP_FFT_C 800&lt;/P&gt;&lt;P&gt;#endif&lt;/P&gt;&lt;P&gt;#endif&lt;BR /&gt;&lt;BR /&gt;So you see, there is one more implementation (FIR via FFT) and a separate criterion for HT - we can't overload the documentation with all this stuff...&lt;BR /&gt;&lt;BR /&gt;The 100% CPU load is an issue with the OMP version used - try setting the blocktime at the beginning of the application via either an environment variable or an API call, e.g.&lt;/P&gt;&lt;P&gt;set KMP_BLOCKTIME=200&lt;/P&gt;&lt;P&gt;or&lt;/P&gt;&lt;P&gt;kmp_set_defaults("KMP_BLOCKTIME=200");&lt;/P&gt;&lt;P&gt;or&lt;/P&gt;&lt;P&gt;kmp_set_blocktime(200);&lt;BR /&gt;&lt;BR /&gt;This should decrease CPU usage. There is no oversubscription - nested threading is disabled by default.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Igor&lt;/P&gt;</description>
    <pubDate>Sun, 15 Jan 2012 14:46:14 GMT</pubDate>
    <dc:creator>igorastakhov</dc:creator>
    <dc:date>2012-01-15T14:46:14Z</dc:date>
    <item>
      <title>ippsFIR64f_32f unusual processor usage</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831366#M5566</link>
      <description>&lt;P&gt;We found an issue with ippsFIR64f_32f()
(IPP 6.0.1.070), which we use for real-time data filtering. When the
data length changes from 1600 samples to 1601 samples, CPU
usage jumps from 2% to nearly 100%. The complete Visual Studio 2010 test project is attached.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Jan 2012 07:38:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831366#M5566</guid>
      <dc:creator>s_smirnov</dc:creator>
      <dc:date>2012-01-13T07:38:43Z</dc:date>
    </item>
    <item>
      <title>ippsFIR64f_32f unusual processor usage</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831367#M5567</link>
      <description>You allocate taps in C# code, but I think you should use the IPP allocation functions for all or most of the array inputs to IPP functions, to ensure proper memory alignment and thus the highest processing speed.&lt;BR /&gt;</description>
      <pubDate>Fri, 13 Jan 2012 09:06:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831367#M5567</guid>
      <dc:creator>Thomas_Jensen1</dc:creator>
      <dc:date>2012-01-13T09:06:33Z</dc:date>
    </item>
    <item>
      <title>ippsFIR64f_32f unusual processor usage</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831368#M5568</link>
      <description>Following your suggestion, we now do all memory allocation via IPP methods, 
but without success. You can check the attached Visual Studio 
project. Please note that in these two cases we use the same memory block; only one parameter (the block length) is changed.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 13 Jan 2012 11:07:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831368#M5568</guid>
      <dc:creator>s_smirnov</dc:creator>
      <dc:date>2012-01-13T11:07:02Z</dc:date>
    </item>
    <item>
      <title>ippsFIR64f_32f unusual processor usage</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831369#M5569</link>
      <description>1600 is the threshold for switching from single-threaded (ST) to multithreaded (MT) code. If you don't see any benefit from multithreading, you should link with the non-threaded static library or set the number of IPP threads to 1.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Igor</description>
      <pubDate>Fri, 13 Jan 2012 13:33:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831369#M5569</guid>
      <dc:creator>igorastakhov</dc:creator>
      <dc:date>2012-01-13T13:33:36Z</dc:date>
    </item>
    <item>
      <title>ippsFIR64f_32f unusual processor usage</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831370#M5570</link>
      <description>I have also seen that certain IPP functions switch implementation modes when parameters exceed certain limits; this is smart coding in my opinion.&lt;BR /&gt;&lt;BR /&gt;I do miss documentation of these implementation modes for each function that has them.&lt;BR /&gt;</description>
      <pubDate>Sat, 14 Jan 2012 19:46:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831370#M5570</guid>
      <dc:creator>Thomas_Jensen1</dc:creator>
      <dc:date>2012-01-14T19:46:56Z</dc:date>
    </item>
    <item>
      <title>ippsFIR64f_32f unusual processor usage</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831371#M5571</link>
      <description>I guess the memory alignment was not the reason it was slow.&lt;BR /&gt;&lt;BR /&gt;So, if IPP switches from single threading to multithreading when you go from 1600 to 1601 samples, and you then see CPU usage jump to 100%, I would say your code is not fully correct for that.&lt;BR /&gt;&lt;BR /&gt;Of course, I don't know how you multithread, if you multithread.&lt;BR /&gt;&lt;BR /&gt;You can have your app do its own multithreading, but then you should let IPP do single threading.&lt;BR /&gt;If your app has no multithreading, you should let IPP do the multithreading by calling SetNumThreads(NumCPUs_That_Are_Not_used_in_Other_Threaded_Code), and by using the properly multithreaded IPP libraries, using OpenMP for instance.&lt;BR /&gt;&lt;BR /&gt;If you have a dual-core HT (hyper-threaded) CPU, then I guess you should use 2 threads, since the two extra HT threads do not deliver full CPU performance.&lt;BR /&gt;&lt;BR /&gt;My point is, if you over-thread your app, performance will suffer.&lt;BR /&gt;Over-threading is when you tell your code to use more threads than your CPU can process at 100%; so a 2-core HT CPU should run 2 threads. A 2-core non-HT CPU should use 2 threads. A 4-core non-HT CPU should use 4 threads. A 4-core HT CPU should use 4 threads. Let the slower HT threads be used by the OS or the UI.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Sat, 14 Jan 2012 20:00:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831371#M5571</guid>
      <dc:creator>Thomas_Jensen1</dc:creator>
      <dc:date>2012-01-14T20:00:09Z</dc:date>
    </item>
    <item>
      <title>ippsFIR64f_32f unusual processor usage</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831372#M5572</link>
      <description>Thomas,&lt;BR /&gt;&lt;BR /&gt;Yes, we don't document all internal criteria - they are specific to each architecture. For example, this particular one is as follows:&lt;BR /&gt;&lt;BR /&gt;&lt;P&gt;#ifdef _OPENMP&lt;/P&gt;&lt;P&gt;#include &amp;lt;omp.h&amp;gt;&lt;/P&gt;&lt;P&gt;#define STRT_OMP_DIR_R 1600&lt;/P&gt;&lt;P&gt;#define STRT_OMP_FFT_R 1600&lt;/P&gt;&lt;P&gt;#define STRT_OMP_DIR_C 800&lt;/P&gt;&lt;P&gt;#ifdef FIR_OPT_HT&lt;/P&gt;&lt;P&gt;#define STRT_OMP_FFT_C 800&lt;/P&gt;&lt;P&gt;#else&lt;/P&gt;&lt;P&gt;#define STRT_OMP_FFT_C 800&lt;/P&gt;&lt;P&gt;#endif&lt;/P&gt;&lt;P&gt;#endif&lt;BR /&gt;&lt;BR /&gt;So you see, there is one more implementation (FIR via FFT) and a separate criterion for HT - we can't overload the documentation with all this stuff...&lt;BR /&gt;&lt;BR /&gt;The 100% CPU load is an issue with the OMP version used - try setting the blocktime at the beginning of the application via either an environment variable or an API call, e.g.&lt;/P&gt;&lt;P&gt;set KMP_BLOCKTIME=200&lt;/P&gt;&lt;P&gt;or&lt;/P&gt;&lt;P&gt;kmp_set_defaults("KMP_BLOCKTIME=200");&lt;/P&gt;&lt;P&gt;or&lt;/P&gt;&lt;P&gt;kmp_set_blocktime(200);&lt;BR /&gt;&lt;BR /&gt;This should decrease CPU usage. There is no oversubscription - nested threading is disabled by default.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Igor&lt;/P&gt;</description>
      <pubDate>Sun, 15 Jan 2012 14:46:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831372#M5572</guid>
      <dc:creator>igorastakhov</dc:creator>
      <dc:date>2012-01-15T14:46:14Z</dc:date>
    </item>
    <item>
      <title>ippsFIR64f_32f unusual processor usage</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831373#M5573</link>
      <description>Unfortunately, we could not call kmp_set_blocktime() from C#.
&lt;BR /&gt;
&lt;BR /&gt;This code:
&lt;BR /&gt;
&lt;BR /&gt;[DllImport("libiomp5md.dll")]
&lt;BR /&gt;static extern void kmp_set_blocktime(int value);
&lt;BR /&gt;
&lt;BR /&gt;kmp_set_blocktime(200);
&lt;BR /&gt;
&lt;BR /&gt;causes an error:
&lt;BR /&gt;
&lt;BR /&gt;A call to PInvoke function 
'IppsFIR64f_32f_Test!IppsFIR64f_32f_Test.MainForm::kmp_set_blocktime' has 
unbalanced the stack. This is likely because the managed PInvoke signature 
does not match the unmanaged target signature. Check that the calling 
convention and parameters of the PInvoke signature match the target 
unmanaged signature.
&lt;BR /&gt;
&lt;BR /&gt;Could you please help us correct this code?</description>
      <pubDate>Mon, 16 Jan 2012 09:12:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831373#M5573</guid>
      <dc:creator>s_smirnov</dc:creator>
      <dc:date>2012-01-16T09:12:05Z</dc:date>
    </item>
    <item>
      <title>ippsFIR64f_32f unusual processor usage</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831374#M5574</link>
      <description>&lt;P&gt;First of all, we need to confirm that the issue is really connected with the blocktime, so could you try setting the environment variable&lt;BR /&gt;&lt;BR /&gt;set KMP_BLOCKTIME=200&lt;BR /&gt;&lt;BR /&gt;If it solves your issue, then we can think about how to call OMP runtime functions from C#.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Igor&lt;/P&gt;</description>
      <pubDate>Mon, 16 Jan 2012 09:21:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831374#M5574</guid>
      <dc:creator>igorastakhov</dc:creator>
      <dc:date>2012-01-16T09:21:10Z</dc:date>
    </item>
    <item>
      <title>ippsFIR64f_32f unusual processor usage</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831375#M5575</link>
      <description>Thank you for the information. We ran a test with the environment variable, and 
setting "KMP_BLOCKTIME=0" solved the CPU usage problem. Now we are trying to find 
a way to set this variable from C# code.</description>
      <pubDate>Mon, 16 Jan 2012 09:55:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831375#M5575</guid>
      <dc:creator>s_smirnov</dc:creator>
      <dc:date>2012-01-16T09:55:50Z</dc:date>
    </item>
    <item>
      <title>ippsFIR64f_32f unusual processor usage</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831376#M5576</link>
      <description>"kmp" functions are OMP functions, so I guess you need their prototypes or the "omp.h" file for the Intel OMP implementation (libguideXXX.dll). Note that on Windows these functions use the __cdecl calling convention, which the P/Invoke declaration has to match:&lt;BR /&gt;&lt;BR /&gt;&lt;P&gt;# if defined(_WIN32)&lt;/P&gt;&lt;P&gt;# define __KAI_KMPC_CONVENTION __cdecl&lt;/P&gt;&lt;P&gt;# else&lt;/P&gt;&lt;P&gt;# define __KAI_KMPC_CONVENTION&lt;/P&gt;&lt;P&gt;# endif&lt;/P&gt;extern void __KAI_KMPC_CONVENTION kmp_set_blocktime (int);&lt;BR /&gt;extern void __KAI_KMPC_CONVENTION kmp_set_defaults (char const *);&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Igor&lt;BR /&gt;</description>
      <pubDate>Mon, 16 Jan 2012 12:33:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831376#M5576</guid>
      <dc:creator>igorastakhov</dc:creator>
      <dc:date>2012-01-16T12:33:08Z</dc:date>
    </item>
    <item>
      <title>ippsFIR64f_32f unusual processor usage</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831377#M5577</link>
      <description>Igor, I disagree about not having such dynamic behavior documented. If it is unpredictable and if it can cause issues and headaches for developers (as it turned out here), then &lt;B&gt;it has to be documented&lt;/B&gt;.&lt;BR /&gt;&lt;BR /&gt;I cringe whenever I look at the IPP documentation, which looks machine-generated and which always presumes that those who use IPP must know everything about the particular subject.&lt;BR /&gt;</description>
      <pubDate>Tue, 17 Jan 2012 23:50:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831377#M5577</guid>
      <dc:creator>levicki</dc:creator>
      <dc:date>2012-01-17T23:50:27Z</dc:date>
    </item>
    <item>
      <title>ippsFIR64f_32f unusual processor usage</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831378#M5578</link>
      <description>Hi Igor,&lt;BR /&gt;&lt;BR /&gt;Agreed, the documentation should be improved (it's not "machine generated"); that is one of the main goals for upcoming releases. Anyway, almost all functionality/algorithms used in IPP are compatible with Matlab, so our documentation provides enough information on function parameters and return statuses, and you can always pick up additional information on DSP or image processing from the web, Wikipedia, Matlab help, etc. IPP manuals are not primer textbooks; they are technical manuals.&lt;BR /&gt;&lt;BR /&gt;Regarding FIR:&lt;IMG height="601" width="894" src="http://software.intel.com/file/41085" alt="FIR" /&gt;&lt;BR /&gt;- you can see that at least 3 algorithms are used for a single thread, and they have complex criteria based on tapsLen, vector length, the data type used and the Intel architecture (SSE2, SSE3, SSSE3, SSE4.1, AVX, etc.). For multithreading these criteria are extended with one more. These criteria are IPP internals and can change from release to release based on current performance data; they are not something that should be or can be documented. We state in the documentation that the dynamic libraries are threaded and provide a list of threaded functions. I guess it's evident that each threaded function has internal criteria, based on its parameters, for when to use single-threaded code and when to use multithreaded code (threading always introduces some overhead, so it provides a visible benefit only above a certain amount of work; below that threshold you'll see a significant slowdown that is not permissible for performance libraries). So every threaded IPP function has such an internal criterion, and it is different for each supported architecture. If you don't want to see any "unpredictable" algorithm switching, please use the single-threaded static libraries and external threading. 
Currently we are considering full removal of OMP code from IPP functions - threading at the primitive level is not as efficient as at the application level - the DMIP sample proves this statement 200%.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Igor</description>
      <pubDate>Wed, 18 Jan 2012 07:22:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippsFIR64f-32f-unusual-processor-usage/m-p/831378#M5578</guid>
      <dc:creator>igorastakhov</dc:creator>
      <dc:date>2012-01-18T07:22:01Z</dc:date>
    </item>
  </channel>
</rss>

