ippsAddProduct low precision

janw80 · 10-16-2011

Hi! With Intel IPP 6.1.2.051 on a Xeon E5430 (EM64T, SSE4.1), the result of ippsAddProduct_32fc differs by a relative error on the order of ±1e-7 from Matlab, from ippsMul_32fc followed by ippsAdd_32fc_I, and from direct C arithmetic done either in 32-bit floats or in 64-bit doubles cast to 32-bit at the end. All the comparison results are identical; only the ippsAddProduct result is different.

Is that a bug in IPP? Or is this to be expected (is the SSE4.1 DPPS instruction known to do something odd on Xeon)?

Reference (cast): 39359.343750000000000 + i*0.0
Reference (float): 39359.343750000000000 + i*0.0
ippsAddProduct: 39359.347656250000000 + i*0.000000000000000
ippsMul, ippsAdd_I: 39359.343750000000000 + i*0.000000000000000

The code is:

[cpp]#include <stdio.h>
#include <ipps.h>
#define ACCU_RE  3.653725000000000e+04
#define A_B_RE   3.360162734985352e+01
#define A_B_IM   4.114639663696289e+01
int main(int argc, char** argv) {
   const int N = 16;
   Ipp32fc v;

   Ipp32fc* a = ippsMalloc_32fc(N);
   Ipp32fc* b = ippsMalloc_32fc(N);
   Ipp32fc* accu_fma = ippsMalloc_32fc(N);
   Ipp32fc* accu_muladd = ippsMalloc_32fc(N);
   Ipp32fc* tmp = ippsMalloc_32fc(N);
   v.re = ACCU_RE;
   v.im = 0;
   ippsSet_32fc(v, accu_fma, N);
   ippsSet_32fc(v, accu_muladd, N);
   v.re = A_B_RE;
   v.im = A_B_IM;
   ippsSet_32fc(v, a, N);

   v.re = A_B_RE;
   v.im = -A_B_IM;
   ippsSet_32fc(v, b, N);

   double ref = ACCU_RE + (A_B_RE*A_B_RE + A_B_IM*A_B_IM);
   float  refF = float(ACCU_RE) + (float(A_B_RE)*float(A_B_RE) + float(A_B_IM)*float(A_B_IM));
   ippsAddProduct_32fc(a, b, accu_fma, N);
   ippsMul_32fc(a, b, tmp, N);
   ippsAdd_32fc_I(tmp, accu_muladd, N);

   printf("Reference (cast):   %6.15f + i*0.0\n", float(ref));
   printf("Reference (float):  %6.15f + i*0.0\n", refF);
   printf("ippsAddProduct:     %6.15f + i*%6.15f\n", accu_fma[0].re, accu_fma[0].im);
   printf("ippsMul, ippsAdd_I: %6.15f + i*%6.15f\n", accu_muladd[0].re, accu_muladd[0].im);

   return 0;
}
[/cpp]

Chao_Y_Intel · 10-19-2011

Hello,

Single-precision floating-point data has only a 23-bit mantissa, which gives a relative precision of about 1e-7.

http://en.wikipedia.org/wiki/IEEE_754-1985

So a discrepancy of around 1e-7 is fine. If you need higher precision, you can do the computation in double precision.
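To put a number on that spacing, here is a minimal C++ sketch that measures the gap between a float and the next representable float. The helper name `ulp_above` is made up for illustration and is not an IPP function. Near 39359 the gap is 2^-8 = 0.00390625, which is exactly the difference between the two results reported in the question.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical helper: distance from x to the next larger float (one ulp).
float ulp_above(float x) {
    return std::nextafter(x, INFINITY) - x;
}

// Prints the float spacing around the reference value from the question.
void print_spacing() {
    const float reference = 39359.34375f;
    std::printf("ulp near %.8f = %.8f\n", reference, ulp_above(reference));
    std::printf("next float    = %.8f\n", reference + ulp_above(reference));
}
```

Calling `print_spacing()` from `main` shows that the ippsAddProduct result is the very next float above the reference value.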

Thanks,
Chao

janw80 · 10-19-2011

Thanks for the numerical info!

Still, with identical inputs having zero imaginary parts, the full-double c=a*b+c with the result converted to single, and full-single c=a*b+c both give the same result.

So I'm not sure how full-single ippsAddProduct_32fc can give a result different from full-single c=a*b+c.
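One way two all-float computations can disagree is evaluation order, because every float operation rounds its result. The sketch below is only an illustration: this thread does not establish how ippsAddProduct_32fc orders its operations internally, and the function names are hypothetical. It evaluates the real part, acc + (re*re + im*im), with the values from the question in two different orders. Assuming IEEE 754 single precision with each operation rounded to float (as with SSE arithmetic), the first order should give 39359.34375 and the second 39359.34765625, one ulp apart.

```cpp
#include <cstdio>

// Order used by the reference code: form the whole product, then add it.
float product_then_add(float acc, float re, float im) {
    float product = re * re + im * im;
    return acc + product;
}

// Alternative order (an assumption, not IPP's documented behaviour):
// fold each partial product into the accumulator as it is computed.
float accumulate_terms(float acc, float re, float im) {
    float partial = acc + re * re;
    return partial + im * im;
}

// Prints both results for the inputs used in the question.
void compare_orders() {
    const float acc = 36537.25f;
    const float re = 33.60162734985352f;
    const float im = 41.14639663696289f;
    std::printf("product then add: %.8f\n", product_then_add(acc, re, im));
    std::printf("accumulate terms: %.8f\n", accumulate_terms(acc, re, im));
}
```

Both orders are valid single-precision evaluations of the same expression. The second one rounds acc + re*re before im*im is added, so its last bit can differ.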

SergeyKostrov · 11-03-2011

That is possible when the binary representations of the results in IEEE 754 for single- and double-precision floats are identical.

Here is an opposite case:

16968003(Base10) = 0x4B8174A2(Base16) => 0 10010111 00000010111010010100010(Base2\IEEE754)
16968004(Base10) = 0x4B8174A2(Base16) => 0 10010111 00000010111010010100010(Base2\IEEE754)
16968005(Base10) = 0x4B8174A2(Base16) => 0 10010111 00000010111010010100010(Base2\IEEE754)

Precision was lost because the source numbers are greater than 2^24, so not every integer in that range is exactly representable in single precision.
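The collapse shown above can be reproduced with a short C++ sketch. The helper `float_bits` is a made-up name. The sketch assumes IEEE 754 single precision and the default round-to-nearest-even mode; under those assumptions all three integers convert to the same bit pattern, 0x4B8174A2.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical helper: returns the raw IEEE 754 bit pattern of a float.
std::uint32_t float_bits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to reinterpret the bytes
    return bits;
}

// Converts three neighbouring integers above 2^24 to float and prints the bits.
void print_collapsed_integers() {
    for (int n = 16968003; n <= 16968005; ++n) {
        float f = static_cast<float>(n);
        std::printf("%d -> 0x%08X\n", n, static_cast<unsigned>(float_bits(f)));
    }
}
```

Above 2^24 the spacing between adjacent floats is 2, so the odd values 16968003 and 16968005 each sit halfway between two floats and round to the neighbour with an even mantissa, 16968004.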