More trouble with NANs

janez-makovsek · 05-11-2023

Dear All,

It came to my attention that:

Comparisons involving a NaN will return False.

According to:
https://datatest.readthedocs.io/en/latest/how-to/nan-values.html

And that

The behavior was standardized in IEEE 754

In this sense, it should never happen that:

if (a[i] == 0.0)

returns true when a[i] is NAN, as it does with the current DPC++ compiler.

I saw that it was recommended to use

/Qhonor-nan-compares

but this should not be a requirement either. When comparing two 32-bit or two 64-bit registers, the result should never be true if one side contains a NAN and the other a valid number.

The motivation behind all this is performance: how to get back what was lost in the migration from icc to icx. This topic continues the discussion here:

https://community.intel.com/t5/Intel-oneAPI-Data-Parallel-C/NaN-problems-continue-to-persist/m-p/1476706/emcs_t/S2h8ZW1haWx8dG9waWNfc3Vic2NyaXB0aW9ufExHR0ZLSjlQRFdTOEUzfDE0NzY3MDZ8U1VCU0NSSVBUSU9OU3xoSw#M2971

Thanks!
Atmapuri

SeshaP_Intel · 05-12-2023

Hi,

Thank you for posting in Intel Communities.

We are working on this internally. We will get back to you soon.

Thanks and Regards,

Pendyala Sesha Srinivas

SeshaP_Intel · 05-25-2023

Hi,

Thanks for your patience.

"According to IEEE 754, the comparison between NaN and any floating-point value x should return False."

This is true, but we use fp-model fast by default, and we make no guarantee that this model is IEEE 754 compliant. This model is not IEEE 754 compliant in icc either, even though icc does respect NaN values. As mentioned before, gcc behaves the same way if -ffast-math is used.

For example https://godbolt.org/z/6MT3s48hd

We understand your view here, but this is intentional behavior.

Thanks and Regards,

Pendyala Sesha Srinivas

janez-makovsek · 05-25-2023

Dear Pendyala,

>We understand your view here, but this is intentional behavior.

1.) It does not really matter how you want to present this, but this comparison:

if (a[i] == 0)

must never be TRUE when a[i] contains a NAN, regardless of compiler switches or optimization logic.

2.) I don't think that icx is intentionally slower than icc.

What will be next?

if (1 == 0)

returning TRUE?

Kind Regards!

Atmapuri

SeshaP_Intel · 06-05-2023

Hi,

We are considering the following reproducer.

#include <cmath>
#include <iostream>
#include <limits>

void f(double *d)
{
  *d = std::numeric_limits<double>::quiet_NaN();
}

int main() {
  int i;
  double buf[4];
  for (i = 0; i < 4; ++i)
    f(&buf[i]);
  for (i=0; i<4; i++) {
    if (buf[i] == 0.0)
      std::cout << "buf[i] is 0.0\n";
    else if (std::isnan(buf[i]))
      std::cout << "buf[i] is NaN\n";
    else
      std::cout << "buf[i] is not NaN or 0.0\n";
    std::cout << "buf[" << i << "] = " << buf[i] << "\n";
  }
  return 0;
}

The same behavior occurs with gcc or open-source Clang if I use the -ffast-math option. In fact, with GCC I can also replace the call to f() with a local assignment and remove the else clauses, but icx and clang optimize differently with those changes and don't report "buf[i] is 0.0".

This is a fascinating case, and an unfortunate quirk of the x86 instruction set. Without fast-math enabled, clang, gcc, icx, and icc all implement "if (buf[i] == 0.0)" with a sequence of instructions like this:

  ucomisd xmm1, xmm2   ; where xmm1 is buf[i] and xmm2 is 0.0
  jp      .LABEL_ELSE
  je      .LABEL_THEN

The ucomisd instruction sets the ZF, PF, and CF flags according to the result of the comparison. If the comparison is unordered (either operand is NaN), it sets all three flags. If the first operand is greater than the second, it clears all three flags. If the first operand is less than the second, it sets only CF. If the operands are equal, it sets only ZF. An unfortunate implication of this is that "equal" and "unordered" look the same if you only test ZF.

If buf[i] is NaN, the ucomisd instruction sets ZF, PF, and CF. The jp instruction checks for the unordered condition; if the comparison wasn't unordered, the je instruction then tests for equality (ucomisd sets ZF for equality). So far, so good. However, when fast-math is enabled, the compiler still generates ucomisd (because it has better latency and throughput than cmpsd), but because it assumes there are no NaNs in the program, it omits the jp test. For a NaN, ucomisd still sets ZF, PF, and CF, but since only ZF is tested, the generated machine code behaves as if the operands were equal. This is unfortunate, but it is necessary to get the best performance.

>>> 2.) I don't think that icx is intentionally slower than icc.

That is true. icx requires the -fhonor-nan-compares flag to match icc's behavior because icc always honors NaN compares. Not honoring NaN compares by default makes icx marginally faster than icc (as in the example above, where the jp test is omitted), and using the -fhonor-nan-compares option just takes that advantage away. It doesn't make icx slower than icc.

In short: if you use options that tell the compiler your program has no NaN inputs or outputs (and this is the default for icx), and your program does have NaN inputs or outputs, the behavior is undefined.

Thanks and Regards,

Pendyala Sesha Srinivas

janez-makovsek · 06-05-2023

Dear Pendyala,

The desired behaviour here is that both the compiler and the CPU are "completely" oblivious to the existence of NAN or INF values.

Only when the programmer explicitly calls IsNan or IsInf functions should the extra logic be added. The compiler and CPU should simply "back off" from trying to catch or treat NANs.

Both NAN and INF are special numbers, which can never be equal to any other floating-point number. It is true that they are not a single number (a constant), but that does not matter in this case. The fundamental and primary assumption when writing code is that they are NOT equal to any other number, and indeed they are not if you compare the two bit by bit.

Therefore:

1.) If this is a hardware bug, can you submit a ticket?

2.) Why insist on the GCC behaviour instead of using the icc behaviour, which is correct? You already have codegen that works. Why change it, and why not reuse the existing, working pattern known from icc?

>icc always honors NaN compares

Hmm... I would have to check the icc codegen, but handling (NAN == NAN) and handling (NAN == 0) require two different code paths. "Honors NaN compares" implies handling of (NAN == NAN) and should have no influence on (NAN == 0), which must always work in any case.

Honoring NAN compares means that NAN detection logic is added before each compare, and that is not what we want, because it would slow down all compares in general.

Thanks!
Atmapuri

SeshaP_Intel · 06-08-2023

Hi,

We have provided your feedback to the development team.

They are looking into the issue.

Any fixes or improvements will be listed in the release notes, linked below.

https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-dpc-c-compiler-release-notes.html

Please let me know if we can go ahead and close this case.

Thanks and Regards,

Pendyala Sesha Srinivas

janez-makovsek · 06-08-2023

Yes, you can close this issue.

SeshaP_Intel · 06-09-2023

Hi,

Thanks for the confirmation. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.

Thanks and Regards,

Pendyala Sesha Srinivas