Re: unexpected results with intel primitive (when dividing by 1)

Josephnyu · ‎01-29-2024

I'm using Intel Integrated Performance Primitives (windows, v2019 (2019.0.5.1052), 64bit) and I came across unexpected results.

When I run the following code

    std::cout << std::setprecision(20);
    const int N = 8;
    const std::vector<double> one(N, 1.0);
    std::vector<double> d(N,0.014058305571221497);
    
    std::cout << "before division by 1 : " << d[0] << std::endl;
    ippsDiv_64f_I(one.data(), d.data(), N);
    std::cout << "after division by 1 : " << d[0] << std::endl;

I get the following output :

before division by 1 : 0.014058305571221497293 
after division by 1 : 0.014058305571221500763

1/ How dividing by 1 can change the value ?

2/ For any value of N between 1 and 7, dividing by 1 has no effect. Only for N>7 I get this. How can it be that the result of a division of doubles depend on the size of my array ?

Could it be a bug ?

Note : I tested with v2022 and I got the same results

Ruqiu_C_Intel · ‎01-30-2024

Hello Josephnyu,

Thank you for posting your concern here.

We have reproduced it, and will investigate further internally for the accuracy.

Regards,

Ruqiu

Chao_Y_Intel · ‎02-07-2024

Hi Josephnyu,

In the double-precision floating-point format, the fractional part is 52 bits: https://en.wikipedia.org/wiki/Double-precision_floating-point_format

The precision of this format is around 15 decimal digits: 2^52 ~ 10^15.

IPP functions are optimized using various instructions, which may result in minor rounding differences at the least significant bits. In the test code provided, the difference is around the 10^-15 level. This looks an acceptable numerical discrepancy.

For larger values of N, such as N >= 8, vectorized instructions (AVX512/AVX2) can be utilized. For smaller values of N, scalar instructions are used. so the final result may have differences within its precision range.

Thanks,

Chao

Josephnyu · ‎02-07-2024

Thanks @Chao_Y_Intel

The difference between the result that I get and the expected results is 2ULP. So the "rounding" impacts more than 1 bit.
Deciding if the discrepancy is acceptable or not depends highly on the usage. If I divide 1milion times by 1 can I get very far from my starting point ?

Does intel provide any guarantee about the error of the results ? If yes can you please share tem.
I'd like also confirmation that there is no bug and that's those results are compliant with the specs of IPP.

Gennady_F_Intel · ‎02-08-2024

you could try to check the behavior with ippsDiv_64f_A53 function which guarantees 53 correctly rounded bits of significand, including the implied bit, with the maximum guaranteed error within 1 ulp.

The API of this function is slightly different compare with ippsDiv_64f_I.

Josephnyu · ‎02-08-2024

Thanks for the suggestion @Gennady_F_Intel.

I tested ippsDiv_64f_A50 and ippsDiv_64f_A53 on this test case and they returned the expected results.
Perf of ippsDiv_64f_A50 is similar ippsDiv_64f_I.

However perf of ippsDiv_64f_A53 is worst by ~8%.

Can you tell how many ulps are gauranteed by ippsDiv_64f_I ?

Chao_Y_Intel · ‎02-07-2024

thanks, Josephyu. We are checking it with the function owner, to make sure the result is expected.

Ruqiu_C_Intel · ‎05-28-2024

Hi Josephnyu,

Thank you for your patience.

As IPP uses the following 'optimized' algorithm to calculate (~in parallel) a0/b0, a1/b1, …, a7/b7. Looking only at a0/b0, the computation is:

1

a0/b0 ~= ((( ------------------------ * (b2*b3))*b1)*a0)

((b0*b1)*(b2*b3))

Each pair of parentheses indicates a rounding-to-nearest operation; we have 7 floating-point operations - 6 multiplications and one division; but the error in (b2*b3) is the same at the numerator and denominator, so it cancels out. The maximum ulp error is 6 ulps.

Regards,

Ruqiu