Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and GDB*

ICX 2022.2.0 and 2022.2.1 double precision bug

vineetsoni
Beginner

Hello,

We have observed a slight change in double-precision results with ICX 2022.2.0 and 2022.2.1 that does not occur in prior releases.

 

Here's the reproducible example:

#include <iostream>
#include <iomanip>
#include <vector>
#include <cmath>

bool alleq(int size, double ref, double* __restrict ret)
{
    bool same = true;
    for (int i = 0; i < size; ++i)
    {
        same &= ref == ret[i];  // exact (bitwise) equality
    }
    std::cout << std::setprecision(17) << ref << " " << ret[0] << std::endl;
    return same;
}

inline double m(double a, double b, double c)
{
    return std::exp(a * (1. / b - 1. / c));
}

void mv(int size, double a, double* __restrict b, double c, double* __restrict ret)
{
    for (int i = 0; i < size; ++i)
    {
        ret[i] = m(a, b[i], c);
    }
}

int main(int num, char** argv)
{
    double one(num);     // num is argc, so one == 1.0 in a normal run
    int size = num * 8;  // size depends on argc at run time
    std::vector<double> temp(size, 2.);
    std::vector<double> ret(size);
    mv(size, one, temp.data(), one, ret.data());
    return alleq(size, std::exp(-1 / 2.), ret.data()) ? 1 : 0;
}

It is compiled with these flags:

-O3 -fp-model=fast -fhonor-nans -fhonor-infinities -fsigned-zeros -fno-math-errno -ffp-contract=off -march=core-avx2 -mtune=core-avx2


With 2022.2.0 or 2022.2.1, `ref` and `ret` differ and the program returns 0. With any previous ICX version (and with GCC), they are identical and it returns 1.

You can also quickly check this on Godbolt here: https://godbolt.org/z/57YEcjscK 

Could you please confirm whether this is indeed a bug? And if it is, is there a fix that does not sacrifice performance?

PS: The results are correct with `-fp-model=precise` or `-fp-model=strict`, which suggests that even more aggressive floating-point optimizations under `fast` have made their way into these two releases.

Best regards,

Vineet

vineetsoni
Beginner

So, the way to get the ICX 2022.1.0 (and older) behavior from 2022.2.0 and 2022.2.1 is to compile with the additional flag `-fno-approx-func`. However, this may have a performance impact.

 

Is there any better suggestion?

SantoshY_Intel
Moderator

Hi,


Thanks for posting in the Intel forums.


We were able to reproduce your issue from our end. We are working on your issue and will get back to you soon.


Thanks & Regards,

Santosh


vineetsoni
Beginner

Thanks, Santosh!


We also observe another issue with ICX 2022.2.X: one of our simulations gives completely wrong results, which was not the case in previous releases.


With ICX 2022.1.0, it gives good results with: `-O3 -fp-model=fast -fhonor-nans -fhonor-infinities -fsigned-zeros -fno-math-errno -ffp-contract=off -march=core-avx2 -mtune=core-avx2`


With ICX 2022.2.X, however, we get wrong results even with `-O3 -fp-model=strict -fno-math-errno -ffp-contract=off -march=core-avx2 -mtune=core-avx2`, so it does not look like a numerical precision/accuracy issue. It fails even with `-O2 -fp-model=strict` and works only with `-O1 -fp-model=strict`.


Unfortunately, I cannot provide a reproducer for this case, but maybe the above combination of compile options gives you some hints as to what could have changed in the default behavior of ICX 2022.2.X.

vineetsoni
Beginner

To give you some more updates: we tried the latest ICX 2023.0.0 and still have the same problems, i.e. the above reproducer fails, and one of our simulations gives completely wrong results compared to 2022.1.0.

 

PS: There is no issue with GCC 8, 10, and 11.

SantoshY_Intel
Moderator

Hi,

 

  1. The intent of `-fp-model=fast` is specifically to allow aggressive optimizations that may trade accuracy for performance. If guaranteed build-to-build numerical consistency is required, then `-fp-model=precise` must be used.
  2. If you need precise results but want to keep the optimizations enabled, then `-fno-approx-func` is the best choice.
  3. Using `-fp-model=fast -fno-approx-func` affects only loops with calls to math functions. Other optimizations allowed by `-fp-model=fast` still apply without restrictions. Of course, that may reduce precision in other places, but that is the usual performance/precision tradeoff.
  4. With `-fp-model=fast -fno-approx-func`, the loop is vectorized with vector length 4 and a scalar remainder, calling the most precise SVML function.
  5. But with just `-fp-model=fast`, the loop is vectorized with vector length 8 and a vectorized remainder, calling a less precise SVML function.
  6. Looking at the test and how it is run, the remainder vectorization does not matter; the whole difference is in the SVML function.
  7. The vectorizer is not at fault here; the variation in this example comes from changes in the SVML function. This does not violate the intent of `-fp-model=fast`.

 

This particular reproducer is not, in fact, evidence of a bug, so we are closing this issue. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.

 

On the other hand, we are very interested in your observation of wrong results with `-fp-model=strict`, but without a reproducer we are unable to assist.

If you can construct a reproducer, please open a new issue and we will investigate.

 

Thanks & Regards,

Santosh

 

 

vineetsoni
Beginner

Hi Santosh,

 

Thanks for the explanation. I see the point. In fact, I can also reproduce the same behavior with `libmvec` from (newer) `glibc` with GCC compilers.

 

Also, since `-fno-approx-func` is not in ICX's list of compiler options (https://www.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-an...), could you confirm whether it is equivalent to `-fimf-max-error=1.0`?

 

Thanks,

Vineet
