Solved: detailed explanation about DPD200259470

teng_w_ · ‎11-29-2014

Hi,

I'm using IPP 7.0 on Windows 7 x64 and have found a problem with ippiConvValid_32f_C1R.

The function gives different results on two computers.

So I would like to see the detailed explanation of DPD200259470 from https://software.intel.com/en-us/articles/intel-ipp-70-library-bug-fixes/ to check whether that bug is the same as mine.

Thank you.

Igor_A_Intel · ‎12-01-2014

Hi Teng,

Here it is:

"The code (attached main.c, wrapper.asm) shows the problem with the ippiConvValid_32f_C1R routine when linked into 64-bit code.

According to the specification, registers XMM6:XMM15 are nonvolatile and must be preserved as needed by the callee,

but they are not preserved and are modified during the ippiConvValid call.

The expected and actual outputs are provided in the main.c file.

=== The register value is broken ==="

regards, Igor

teng_w_ · ‎12-01-2014

Hi,

Thanks for your quick reply.

I'm not sure about the relationship between my problem and DPD200259470.

The attached file contains my code and test data.

ConvValid_0.raw is the input file, and the two ConvValid_1_*.raw files are the results from my computer and from another server computer.

There are differences between the two result files.

The test environment is as follows:

My computer:

Intel(R) Core(TM) i5-3470 CPU @3.2GHz

Memory: 8 GB

Windows 7 Professional x64

Server computer:

Intel(R) Core(TM) i7-3820 CPU @3.6GHz

Memory: 16 GB

Windows 7 Professional x64

Please check it. Thank you.

Igor_A_Intel · ‎12-01-2014

Hi Teng,

Why do you think that a difference in the 1-2 least significant bits is a bug? You have different CPUs: one supports only SSE4 while the other supports AVX, so different versions of the optimized code are running. 32f results can't be compared bit for bit; they should be compared with some reasonable epsilon that depends on the number of arithmetic operations per output point (which is huge for cross-correlation). A different order of calculations will generally lead to different results in floating-point computations.

regards, Igor
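
A minimal sketch of the epsilon comparison suggested above, in plain C with no IPP dependency. The function name `count_mismatches` and its parameters are hypothetical and chosen only for illustration; the right tolerance values depend on the kernel size and data range and are not given in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without pulling in <math.h>. */
static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Count elements of a and b that differ by more than
 * abs_tol + rel_tol * max(|a[i]|, |b[i]|).
 * NaN differences are counted as mismatches. */
size_t count_mismatches(const float *a, const float *b, size_t n,
                        float abs_tol, float rel_tol)
{
    size_t mismatches = 0;
    assert(abs_tol >= 0.0f && rel_tol >= 0.0f);
    for (size_t i = 0; i < n; ++i) {
        float diff  = abs_f(a[i] - b[i]);
        float mag_a = abs_f(a[i]);
        float mag_b = abs_f(b[i]);
        float scale = mag_a > mag_b ? mag_a : mag_b;
        /* Written as !(diff <= tol) so that a NaN diff fails the check. */
        if (!(diff <= abs_tol + rel_tol * scale))
            ++mismatches;
    }
    return mismatches;
}
```

Loading the two ConvValid_1_*.raw files into float arrays and calling this with a small relative tolerance would show whether the results differ only in the last bits or by something larger.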

teng_w_ · ‎12-01-2014

Hi Astakhov,

I understand your point of view. Thank you.

The attached file is a screenshot of the two CPUs.

You can see that both CPUs support AVX and SSE4.

I was confused about the different versions of code in the function.

Igor_A_Intel · ‎12-01-2014

OK, my mistake: both support AVX, so the same code version runs on both. IPP contains differently optimized code for different CPUs. The optimal version is dispatched automatically for dynamic libraries; for static libraries you should call ippInit() before any other IPP function. It is not clear from your code example which kind of linking you use, but I don't see an ippInit() call.

Almost all IPP functions internally have different code branches for different combinations of source and destination buffer alignment. So my next guess is that, since you have made no special effort to align the source/destination buffers, the buffers end up aligned differently on the two machines and therefore take different code branches inside the IPP functions. Use ippMalloc() to get the same, and the best, alignment (32-byte in your particular case) for memory buffers.

Also, there are single-threaded and multi-threaded versions of the IPP libraries. For larger sizes you can see the same kind of output difference because of a different number of threads (and therefore a different internal data decomposition): one of your CPUs supports 4 hardware threads while the other supports 8.

regards, Igor
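
To illustrate the data decomposition point, here is a small sketch in plain C that sums the same array split into a different number of chunks, roughly the way work might be divided among a different number of threads. It is not IPP code and says nothing about how IPP actually partitions its work; `chunked_sum` is a hypothetical helper written for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n floats by splitting them into `chunks` contiguous pieces,
 * summing each piece on its own, then adding up the partial sums.
 * The accumulators are volatile so every intermediate result is
 * rounded to float, even on targets that keep extra precision. */
float chunked_sum(const float *x, size_t n, size_t chunks)
{
    volatile float total = 0.0f;
    assert(chunks > 0);
    for (size_t c = 0; c < chunks; ++c) {
        size_t begin = n * c / chunks;
        size_t end   = n * (c + 1) / chunks;
        volatile float partial = 0.0f;
        for (size_t i = begin; i < end; ++i)
            partial = partial + x[i];
        total = total + partial;
    }
    return total;
}
```

With the input {1e8f, 1.0f, -1e8f, 1.0f}, one chunk gives 1.0f and two chunks give 0.0f, while the exact sum is 2. The data and the arithmetic are identical; only the grouping changes, which is why results should be compared with a tolerance rather than bit for bit.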

teng_w_ · ‎12-02-2014

I've got it, thank you.

teng_w_ · ‎12-07-2014

Hi Astakhov,

I have another question for you.

Are IPP functions such as ippiConvValid_32f_C1R or ippiFilter_ optimized with multithreading?

Or are they optimized only through C code and instruction sets, for example SSE and AVX?

Thank you.