Hi staff,
I'm using the old Intel Parallel Studio XE (PXE) 2011 and have a question about dynamic linking and dispatching.
I know that dynamic linking has dispatching enabled by default. Is there any option or function call to disable it?
I want this because I want to compare the output of static linking without dispatching against dynamic linking with dispatching disabled.
I hope someone can help me.
Best regards,
Tam.
Hi Tam,
In the existing IPP environment, I think it is impossible. CPU dispatching is an inherent part of the dynamic library.
However, from a technical point of view, there is almost no difference between the static and dynamic libs. Calling a dispatched function differs from calling a statically linked function by only a few CPU clocks. Only the first call of an IPP function starts the CPU dispatcher, which initializes function pointers according to the CPU characteristics, or according to selected CPU features if you want to run a particular CPU optimization on a higher CPU (for example, SSE4.2 code on an AVX CPU). This first step may take (and actually does take) longer, also because with dynamic linking it loads a bunch of DLLs.
After IPP initialization (CPU dispatching initialization) is done, dynamic function calls are as fast as static function calls. Or is this not about performance at all?
Please tell us which particular experiments you want to run, and maybe we can find another way to implement this.
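The dispatching described above (a first call that fills in function pointers, after which calls go straight to the selected variant) can be pictured with a small sketch. Everything here is hypothetical and for illustration only: the names `cpu_level`, `init_dispatch`, `init_dispatch_for`, and `scale` are not IPP APIs, the "optimized" variants are plain C stand-ins, and `detect_cpu` simply returns a fixed value instead of querying the processor. It is not how IPP is actually implemented.

```c
#include <stddef.h>

/* Hypothetical CPU levels; NOT the real IppCpuType enumeration. */
typedef enum { CPU_GENERIC = 0, CPU_SSE3 = 1, CPU_AVX = 2 } cpu_level;

typedef void (*scale_fn)(float *data, size_t n, float k);

/* Stand-ins for per-CPU variants. A real library would use different
   instruction sets here; these all run the same plain C loop. */
static void scale_generic(float *d, size_t n, float k) {
    for (size_t i = 0; i < n; ++i) d[i] *= k;
}
static void scale_sse3(float *d, size_t n, float k) { scale_generic(d, n, k); }
static void scale_avx(float *d, size_t n, float k)  { scale_generic(d, n, k); }

static void scale_first_call(float *d, size_t n, float k);

static scale_fn  scale_ptr    = scale_first_call; /* stub until initialized */
static cpu_level active_level = CPU_GENERIC;

/* Assumption: a real dispatcher would inspect CPUID here. */
static cpu_level detect_cpu(void) { return CPU_SSE3; }

/* Force a particular variant (the role an explicit "init for CPU" call plays). */
void init_dispatch_for(cpu_level level) {
    active_level = level;
    switch (level) {
    case CPU_AVX:  scale_ptr = scale_avx;     break;
    case CPU_SSE3: scale_ptr = scale_sse3;    break;
    default:       scale_ptr = scale_generic; break;
    }
}

/* Automatic selection based on the detected CPU. */
void init_dispatch(void) { init_dispatch_for(detect_cpu()); }

/* The first call pays for initialization, then forwards to the chosen variant. */
static void scale_first_call(float *d, size_t n, float k) {
    init_dispatch();
    scale_ptr(d, n, k);
}

/* Public entry point: one indirect call after initialization. */
void scale(float *d, size_t n, float k) { scale_ptr(d, n, k); }

cpu_level current_level(void) { return active_level; }
```

In this sketch, calling `scale` without any setup triggers the automatic choice on the first call, while calling `init_dispatch_for(CPU_GENERIC)` beforehand pins every later call to the generic variant.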
Hi Sergey,
I'm wondering whether it is possible to link dynamically without using dispatching.
My situation is this: the old project linked the static libraries without calling ippStaticInit/ippInit (static linking without dispatching).
Now I want to use dynamic linking. The problem is that dynamic linking produces different output compared to the old project because it uses dispatching.
I really don't care about performance. My thinking is that the program produces the same output on any CPU type when dispatching is disabled. Is this true?
Can I reproduce the old project's results with dynamic linking by turning off dispatching? Is that possible?
Regards,
Tam.
Different output? This is a problem. It should be the same.
Could you please give more info: the IPP function you suspect of giving the wrong result, the IPP library version with correct output (static linking), and the IPP library version with incorrect output (dynamic linking)? Or are you talking about the same library? Please specify the version, and your current CPU model.
Here we should not be talking about the difference between static and dynamic linking, but about a possible discrepancy between the results of different CPU-optimized implementations. It is possible to try different optimizations in both the static and dynamic cases.
Hi Sergey,
Here is the information:
IPP function: ippsSqrt_32fc_I.
IPP library version: IPP PXE 2011.
My CPU model is Intel Core i3-3240.
1/ Static libraries ippm_l.lib ipps_l.lib ippi_l.lib ippvm_l.lib ippcore_l.lib, with no call to ippInit() in the source code
2/ Dynamic libraries ippm.lib ipps.lib ippi.lib ippvm.lib ippcore.lib
(1) and (2) do not give identical results (epsilon 10^-6).
When I use (1) + ippInit(), (1) and (2) give the same result.
What I want is for (2) to give the same result as (1). How can I do that?
Regards,
Tam
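A comparison like the one reported above (identical, or differing at about 10^-6) can be made explicit with a small helper. This is only a sketch: the name `compare_outputs` and the `cmp_result` enum are hypothetical, and the tolerance is treated as an absolute difference on plain `float` arrays, which is an assumption, since the post does not say whether the epsilon is absolute or relative. A complex buffer could be passed as interleaved real and imaginary values.

```c
#include <math.h>
#include <stddef.h>
#include <string.h>

typedef enum { CMP_IDENTICAL, CMP_WITHIN_EPS, CMP_DIFFERENT } cmp_result;

/* Compare two output buffers of n floats.
   - CMP_IDENTICAL:  the buffers are bit-for-bit equal.
   - CMP_WITHIN_EPS: not bit-equal, but every |a[i] - b[i]| <= eps.
   - CMP_DIFFERENT:  some element differs by more than eps (or is NaN).
   The largest absolute difference found is stored in *max_abs_diff. */
cmp_result compare_outputs(const float *a, const float *b, size_t n,
                           float eps, float *max_abs_diff) {
    float worst = 0.0f;
    int has_nan = 0;

    if (memcmp(a, b, n * sizeof(float)) == 0) {
        *max_abs_diff = 0.0f;
        return CMP_IDENTICAL;
    }
    for (size_t i = 0; i < n; ++i) {
        float d = fabsf(a[i] - b[i]);
        if (isnan(d)) { has_nan = 1; continue; }
        if (d > worst) worst = d;
    }
    *max_abs_diff = worst;
    if (has_nan || worst > eps) return CMP_DIFFERENT;
    return CMP_WITHIN_EPS;
}
```

Running the same input through both builds and feeding the two result buffers to such a helper separates "bit-identical" from "equal within tolerance", which is the distinction that matters in this thread.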
OK. Now it's clear. The fact is that different CPU-specific code paths run in your cases (1) and (2).
In case (1), when there is no initialization with ippInit(), the computations are done on the scalar FPU (floating-point unit). It is an 80-bit precision device: relatively slow, but precise. This is the so-called "px/mx" code. Look: https://software.intel.com/en-us/articles/ipp-dispatcher-control-functions-ippinit-functions
When you call ippInit() without naming a particular CPU (unlike ippInitCpu(cpucode), which takes one), the most appropriate optimized function variant is chosen. This is done automatically in the IPP dynamic libraries. With your CPU this code is some SSEx (SIMD) variant. If I'm not wrong, SSE floating-point precision is at most 64 bits. SSE is fast, but less precise.
By the way, you can get the same results as in (1) in your case (2) if, at the beginning, you call ippInitCpu with a CPU-type argument for Intel(R) Pentium/II/III processors (ippCpuUnknown, ippCpuPP, ippCpuPII, or ippCpuPIII).
The question is whether 10^-6 is accurate enough for you, given that you may want faster functionality. An additional problem is that the chosen CPU optimization applies throughout the whole library. If you limit your optimization to px/mx, the limitation will affect all other IPP functions, which may not suffer from the FPU/SSE difference, and you may lose performance where it is really necessary.
As far as I know, the Sqrt functionality was redeveloped for better precision in IPP 9.0.
Hi Sergey,
Thank you a lot. The information was almost all there, but I had not read it carefully. :(
Best regards,
Tam
Hi Sergey,
"By the way, you can get the same results as in (1) in your (2) case, if, at the beginning, you call ippInitCpu with CPU-type argument for Intel(R) Pentium/II/III processors (ippCpuUnknown, or ippCpuPP, or ippCpuPII, or ippCpuPIII)."
I tried (2) with ippCpuUnknown, and it does not seem to affect the output at all. I also tried ippCpuPP, but it didn't give the same output as (1).
I don't know why. What do you think about it? Please note that I'm using the IPP PXE 2011 (7.0) version, while the link you gave me is for 6.1.
I hope you can help me solve this.
Regards,
Tam.
Hi,
I have tried all CPU types. I used (2) plus a call to ippInitCpu(). Here are the results:
| CPU type | Result |
| --- | --- |
| ippCpuUnknown = 0x00 | Dynamic |
| ippCpuPP = 0x01, /* Intel(R) Pentium(R) processor */ | X |
| ippCpuPMX = 0x02, /* Pentium(R) processor with MMX(TM) technology */ | X |
| ippCpuPPR = 0x03, /* Pentium(R) Pro processor */ | X |
| ippCpuPII = 0x04, /* Pentium(R) II processor */ | X |
| ippCpuPIII = 0x05, /* Pentium(R) III processor and Pentium(R) III Xeon(R) processor */ | X |
| ippCpuP4 = 0x06, /* Pentium(R) 4 processor and Intel(R) Xeon(R) processor */ | X |
| ippCpuP4HT = 0x07, /* Pentium(R) 4 Processor with HT Technology */ | X |
| ippCpuP4HT2 = 0x08, /* Pentium(R) 4 processor with Streaming SIMD Extensions 3 */ | X |
| ippCpuCentrino = 0x09, /* Intel(R) Centrino(TM) mobile technology */ | X |
| ippCpuCoreSolo = 0x0a, /* Intel(R) Core(TM) Solo processor */ | X |
| ippCpuCoreDuo = 0x0b, /* Intel(R) Core(TM) Duo processor */ | X |
| ippCpuITP = 0x10, /* Intel(R) Itanium(R) processor */ | X |
| ippCpuITP2 = 0x11, /* Intel(R) Itanium(R) 2 processor */ | X |
| ippCpuEM64T = 0x20, /* Intel(R) 64 Instruction Set Architecture (ISA) */ | X |
| ippCpuC2D = 0x21, /* Intel(R) Core(TM) 2 Duo processor */ | O |
| ippCpuC2Q = 0x22, /* Intel(R) Core(TM) 2 Quad processor */ | O |
| ippCpuPenryn = 0x23, /* Intel(R) Core(TM) 2 processor with Intel(R) SSE4.1 */ | O |
| ippCpuBonnell = 0x24, /* Intel(R) Atom(TM) processor */ | O |
| ippCpuNehalem = 0x25, /* Intel(R) Core(TM) i7 processor */ | O |
| ippCpuNext = 0x26, | O |
| ippCpuSSE = 0x40, /* Processor supports Streaming SIMD Extensions instruction set */ | X |
| ippCpuSSE2 = 0x41, /* Processor supports Streaming SIMD Extensions 2 instruction set */ | X |
| ippCpuSSE3 = 0x42, /* Processor supports Streaming SIMD Extensions 3 instruction set */ | Static |
| ippCpuSSSE3 = 0x43, /* Processor supports Supplemental Streaming SIMD Extension 3 instruction set */ | O |
| ippCpuSSE41 = 0x44, /* Processor supports Streaming SIMD Extensions 4.1 instruction set */ | O |
| ippCpuSSE42 = 0x45, /* Processor supports Streaming SIMD Extensions 4.2 instruction set */ | O |
| ippCpuAVX = 0x46, /* Processor supports Advanced Vector Extensions instruction set */ | Dynamic |
| ippCpuAES = 0x47, /* Processor supports AES New Instructions */ | O |
| ippCpuX8664 = 0x60 /* Processor supports 64 bit extension */ | X |
X: my program can't run to completion (please ignore this).
O: my program runs to completion, but the output is not the same as either (1) or (2).
Static: the output is the same as (1).
Dynamic: the output is the same as (2).
As you can see, only CPU type ippCpuSSE3 gives the same output as the static library.
I wonder whether the static library always uses ippCpuSSE3 by default in IPP PXE 2011?
Let us know what you think about this.
Best regards,
Tam.
Hi Tam,
Thanks for sharing the test results.
I checked the version information at https://software.intel.com/en-us/articles/which-version-of-the-intel-ipp-intel-mkl-and-intel-tbb-libraries-are-included-in-the-intel. It seems PXE 2011 includes IPP version 7.0.x.
No, I don't recall the static library always using ippCpuSSE3 by default in IPP PXE 2011.
Anyway, could you please add the printf below to the test case for 1) static and 2) dynamic + a call to ippInitCpu()?
ippCpuSSE3 = 0x42, /* Processor supports Streaming SIMD Extensions 3 instruction set */
Static
/* Print the name and version of the ipps library actually in use. */
const IppLibraryVersion* lib = ippsGetLibVersion();
printf("%s %s %d.%d.%d.%d\n",
       lib->Name, lib->Version,
       lib->major, lib->minor, lib->majorBuild, lib->build);
Output:
Thanks
Ying
Hi Ying,
Here is the printed information for (1) and (2):
(1) : ippsm7_l.lib 7.0 build 205.23 7.0.205.1024
(2) : ippsm7-7.0.dll 7.0 build 205.23 7.0.205.1024
Regards,
Tam.
Hi Tam,
Thank you very much for the test. It is clear now.
IPP dispatches the optimized code according to the CPU type.
For example, see the table in https://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-understanding-cpu-optimized-code-used-in-intel-ipp
m7 means: optimized for 64-bit CPUs with SSE3.
The static link was using the m7 code, so when dynamic linking + ippInitCpu(ippCpuSSE3) also uses m7, both run the same CPU-optimized code and give the same result.
IPP 9.0 was updated this week. You can try it at https://software.intel.com/en-us/intel-ipp/ => try.
Best Regards,
Ying
Hi Ying,
Thank you so much. It's clear to me, too.
I'm going to migrate my work from PXE 2011 to PXE 2016. I may have more questions in the future.
Best Regards,
Tam.