ippsThreshold with NaN works differently for various CPU types

steven_s_3 · ‎04-14-2016

Using the following example code:

Ipp32f a[] = {-1,-2,-3,-4};
Ipp32f b[4];
ippsLog10_32f_A24(a, b, 4); // Generate some NaNs (actually -NaN)
ippsThreshold_32f_I(b, 4, 0, ippCmpLess);
ippsThreshold_32f_I(b, 4, 255, ippCmpGreater);

I would expect every result to lie between 0 and 255. That is what happens (all results are 0) with IPP 7.0 64-bit, compiled and run on a system with an Intel Xeon X5670 (Westmere) CPU. But when I take the exact same compiled object code and all the necessary shared object libraries and move them to a machine with an Intel Xeon E5-2690 v3 (Haswell) CPU, the results are different: the -NaNs propagate through the threshold functions and the result is all -NaNs.

I tried the same thing using IPP 9.2 64-bit (currently the latest version), compiled and run on the Haswell system, and the result is -NaNs. But when I take that exact same object code and the shared object libraries and move them to the Westmere machine, the results are all zeros.
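One plausible explanation, which is only an assumption on my part, is that the two CPU-specific code paths write the same comparison in two different ways. The sketch below is not IPP code; clamp_low_a and clamp_low_b are hypothetical names. It shows two lower-threshold formulations that agree for every ordinary number and disagree only for NaN, because every ordered comparison with NaN is false:

```c
#include <math.h>

/* Formulation A: "if x is below the level, replace it".
 * NaN < level is false, so a NaN input is returned unchanged. */
static float clamp_low_a(float x, float level)
{
    return (x < level) ? level : x;
}

/* Formulation B: "keep x only if it is above the level".
 * NaN > level is false, so a NaN input is replaced by the level.
 * This is the same selection rule as the SSE MAXPS instruction,
 * which returns its second operand when either operand is NaN. */
static float clamp_low_b(float x, float level)
{
    return (x > level) ? x : level;
}
```

With a level of 0, formulation A reproduces what I see on the Haswell (NaN in, NaN out) and formulation B reproduces what I see on the Westmere (NaN in, 0 out).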

This behavior has caused a big problem: I develop and test on the Westmere, where everything works, but when deployed to the Haswell the application crashes because the result of the operations above is cast to int and used to index into an array. I can fix it by casting to unsigned int instead, in which case the -NaN becomes 0, or, better, by calling isnan() and replacing the value with 0 when it returns true.
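The isnan() workaround could look roughly like the following. This is a minimal sketch in plain C with no IPP calls; replace_nan_32f and to_index are hypothetical helper names, and it relies on Ipp32f being a typedef for float:

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: overwrite every NaN in buf with `fallback`, in place. */
static void replace_nan_32f(float *buf, size_t len, float fallback)
{
    for (size_t i = 0; i < len; ++i) {
        if (isnan(buf[i])) {
            buf[i] = fallback;
        }
    }
}

/* Hypothetical helper: turn a sample into an array index in [0, max_index].
 * NaN and negative values map to 0, values above max_index map to max_index,
 * so the result is always safe to use as an index. */
static size_t to_index(float v, size_t max_index)
{
    if (isnan(v) || v < 0.0f) {
        return 0;
    }
    if (v > (float)max_index) {
        return max_index;
    }
    return (size_t)v; /* truncates toward zero */
}
```

In the example above, calling replace_nan_32f(b, 4, 0.0f) right after ippsLog10_32f_A24 and before the two ippsThreshold_32f_I calls would make the result independent of how the threshold functions treat NaN. Using to_index(b[i], 255) at the point of the array lookup is a second line of defense.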

I can understand Intel claiming that IPP doesn't check for NaN because it would hurt performance, in which case I would have found and corrected the problem in my testing on the Westmere. Alternatively, I can understand IPP checking for NaN and giving me what I want, a result between 0 and 255, in which case it would not crash on the Haswell. What I don't like is that the same source code does not behave the same way on Westmere and Haswell CPU types.

Where can I find information on other "gotchas", specifically a list of functions that give different results on different CPU architectures? I'm not talking about changes in the low-order precision bits, but about drastic functional differences such as the one above: propagating NaNs vs. doing the thresholding.

Thanks.