ipps/ippi mulpack

b_k_ · ‎03-15-2016

Hello,

I was checking some code using FFT and mulpack, and noticed a case where results are a bit off.

both ippi and ipps mulpack versions gave the same results.

Spec: i7-4790, Ipp 8.1 32bit. win 7 pro SP1 64bit.

Using VS2013 and stepping into the disassembly, the call stack shows that ippsh99-8.1dll is loaded.

Here is an example with ipps, based on mulpack.c from the IPP 9.0 update 2 documentation.

The example transforms the vector [0,8,0,0,0,0,0,0], and multiplies the results by itself.

        IppStatus status = ippStsNoErr;
        IppsFFTSpec_R_32f* pSpec = NULL;  
        Ipp8u *pMemInit = NULL, *pBuffer = NULL, *pSpecMem = NULL; /* Pointer to the work buffers */
        int sizeSpec = 0, sizeInit = 0, sizeBuf = 0;               /* size of FFT pSpec structure, Init and work buffers */
        status = ippsFFTGetSize_R_32f(3, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone, &sizeSpec, &sizeInit, &sizeBuf);
        /* memory allocation */
        pSpecMem = (Ipp8u*)ippMalloc(sizeSpec);
        pBuffer = (Ipp8u*)ippMalloc(sizeBuf);
        pMemInit = (Ipp8u*)ippMalloc(sizeInit);
        status = ippsFFTInit_R_32f(&pSpec, 3, IPP_FFT_DIV_INV_BY_N, ippAlgHintAccurate, pSpecMem, pMemInit);
        
        Ipp32f src[8] = { 0.0f, 8.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
        Ipp32f fft_dst[8] = { 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };        

        status = ippsFFTFwd_RToPack_32f(src, fft_dst, pSpec, pBuffer);
        ippFree(pMemInit);
        ippFree(pSpec);
        ippFree(pBuffer);
        
        Ipp32f dst[8] = { 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
        
        status = ippsMulPack_32f(fft_dst, fft_dst, dst, 8);

The results in the fft_dst and dst buffers are in packed format. the first two indices are R0 and R1, the real components of the first two elements.

fft_dst = [8.00000000 5.65685415 -5.65685415 0.000000000 -8.00000000 -5.65685415 -5.65685415]

dst = [64.0000000 8.12035296e-007 -63.9999962 -64.0000000 -0.000000000 8.12035296e-007 63.9999962 64.0000000]

Here R1 (the second index) should be zero, but instead it is a very small number.

So fft_dst[1] = 5.65685415 + j ( -5.65685415 )

This gives fft_dst[1]*fft_dst[1] = 5.65685415 * 5.65685415 + 2 j* ( -5.65685415) + j*j*( -5.65685415 )*( -5.65685415 )

Rearranging real and imaginary parts:

fft_dst[1]*fft_dst[1] = {5.65685415 * 5.65685415 - ( -5.65685415 )*( -5.65685415 ) } + { 2 j* ( -5.65685415)}

The real part should be zero, but it is R1 = 8.12035296e-007

Old IPL (Intels image processibng library) gave the correct result.

Ying_H_Intel · ‎03-17-2016

Hi B.k,

Thanks for the report. We will investigate it and get back to you if any news.

Best Regards,

Ying

Ying_H_Intel · ‎03-20-2016

Hi B K.

We investigated the results and thought, it should be normal situation for floating point – it’s just rounding error = 8.12e-7/8.0 = ~0.1e-7 – less than the weight of the least significant bit in Ipp32f mantissa related to input signal amplitude.

I guess that IPL doesn’t provide such rounding error because uses internally FPU that performs all intermediate calculations in higher precision (53 or 64 bit mantissa).

Best Regards,

Ying

b_k_ · ‎03-21-2016

Hi Ying, Thank you fo the answer. I came up with the question since we were moving from Ipp 7 to 8.1. This move included tests to the functions we use, and the one for mulpack failed. We will implement a tolerance for the error. the results for the code I posted above with Ipp7 were: dst = [64.0000000 0.000000000 -63.9999962 -64.0000000 -0.000000000 0.000000000 63.9999962 64.0000000] That is better. I guess the 63.999 results are also because of roundoff errors. When testing with Ipp 7 the program loaded ippsg9-7.0.dll. With Ipp 8.1 it was ippsh9-8.1 ( I had a typo above). The different dlls use different instructions for the calculations, so differing results are to be expected. I know floating point erorors are a known and old problem ("What Every Computer Scientist Should Know About Floating-Point Arithmetic"). Is there any article or discussion on intels website about what kind of roundoff errors one can expect with IPP? Maybe algorithms to minimize the errors?