I'm using Intel Integrated Performance Primitives (windows, v2019 (2019.0.5.1052), 64bit) and I came across unexpected results.
When I run the following code
std::cout << std::setprecision(20); const int N = 8; const std::vector<double> one(N, 1.0); std::vector<double> d(N,0.014058305571221497); std::cout << "before division by 1 : " << d[0] << std::endl; ippsDiv_64f_I(one.data(), d.data(), N); std::cout << "after division by 1 : " << d[0] << std::endl;
I get the following output :
before division by 1 : 0.014058305571221497293 after division by 1 : 0.014058305571221500763
1/ How dividing by 1 can change the value ?
2/ For any value of N between 1 and 7, dividing by 1 has no effect. Only for N>7 I get this. How can it be that the result of a division of doubles depend on the size of my array ?
Could it be a bug ?
Note : I tested with v2022 and I got the same results
链接已复制
Hi Josephnyu,
In the double-precision floating-point format, the fractional part is 52 bits: https://en.wikipedia.org/wiki/Double-precision_floating-point_format
The precision of this format is around 15 decimal digits: 2^52 ~ 10^15.
IPP functions are optimized using various instructions, which may result in minor rounding differences at the least significant bits. In the test code provided, the difference is around the 10^-15 level. This looks an acceptable numerical discrepancy.
For larger values of N, such as N >= 8, vectorized instructions (AVX512/AVX2) can be utilized. For smaller values of N, scalar instructions are used. so the final result may have differences within its precision range.
Thanks,
Chao
Thanks @Chao_Y_Intel
The difference between the result that I get and the expected results is 2ULP. So the "rounding" impacts more than 1 bit.
Deciding if the discrepancy is acceptable or not depends highly on the usage. If I divide 1milion times by 1 can I get very far from my starting point ?
Does intel provide any guarantee about the error of the results ? If yes can you please share tem.
I'd like also confirmation that there is no bug and that's those results are compliant with the specs of IPP.
you could try to check the behavior with ippsDiv_64f_A53 function which guarantees 53 correctly rounded bits of significand, including the implied bit, with the maximum guaranteed error within 1 ulp.
The API of this function is slightly different compare with ippsDiv_64f_I.
Thanks for the suggestion @Gennady_F_Intel.
I tested ippsDiv_64f_A50 and ippsDiv_64f_A53 on this test case and they returned the expected results.
Perf of ippsDiv_64f_A50 is similar ippsDiv_64f_I.
However perf of ippsDiv_64f_A53 is worst by ~8%.
Can you tell how many ulps are gauranteed by ippsDiv_64f_I ?
Hi Josephnyu,
Thank you for your patience.
As IPP uses the following 'optimized' algorithm to calculate (~in parallel) a0/b0, a1/b1, …, a7/b7. Looking only at a0/b0, the computation is:
1
a0/b0 ~= ((( ------------------------ * (b2*b3))*b1)*a0)
((b0*b1)*(b2*b3))
Each pair of parentheses indicates a rounding-to-nearest operation; we have 7 floating-point operations - 6 multiplications and one division; but the error in (b2*b3) is the same at the numerator and denominator, so it cancels out. The maximum ulp error is 6 ulps.
Regards,
Ruqiu
