Vishnu · ‎08-30-2016

I wanted to fill parts of a matrix with quiet NaNs because:

i) If some result is a NaN, I know those parts are being used.

ii) It simplifies the code, since I can do vector/elemental operations on the whole matrix. I assumed that arithmetic ops with NaNs would take very little CPU time, so the cost of simplifying the code would be small. But it seems I assumed wrong.

This is the sample code I wrote to test it:

program check9

    use, intrinsic :: ieee_arithmetic

    implicit none

    integer(kind=8) :: i
    real :: NaN, a, start, finish, time_cpu

    call cpu_time(start)

    a = 2.0

    NaN = ieee_value(0.0, ieee_quiet_nan)

    do i = 1, 1000000000
        a = a / NaN
        a = a * NaN
    end do


    print*, a, NaN

    call cpu_time(finish)
    time_cpu = finish - start
    print*, "CPU time =", time_cpu

end program check9

This takes about 3 seconds with ifort. The same ops with an ordinary real number are so fast that the CPU time doesn't register any significant digits.

TimP · ‎08-30-2016

NaN operations on Intel CPUs are likely to generate exceptions that invoke microcode, so the relative slowdown probably varies greatly with CPU model. It is also possible that the compiler shortcuts your test loop.
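
One way to rule out a compiler shortcut, as a minimal sketch: read the operand through a volatile variable so that it has to be reloaded on every iteration, and time the same loop once with an ordinary value and once with a quiet NaN. The program name, the helper timed_loop and the iteration count n are hypothetical choices for illustration, and the timings will depend on compiler options and CPU model.

```fortran
program nan_timing_sketch

    use, intrinsic :: ieee_arithmetic

    implicit none

    integer, parameter :: n = 100000000   ! hypothetical iteration count
    real, volatile :: operand             ! volatile: reloaded on every use
    real :: t_plain, t_nan

    ! First pass: an ordinary finite operand.
    operand = 1.0000001
    t_plain = timed_loop()

    ! Second pass: the same loop with a quiet NaN operand.
    operand = ieee_value(operand, ieee_quiet_nan)
    t_nan = timed_loop()

    print *, "ordinary operand, CPU time =", t_plain
    print *, "quiet NaN operand, CPU time =", t_nan

contains

    function timed_loop() result(elapsed)
        real :: elapsed, a, start, finish
        integer :: i

        a = 2.0
        call cpu_time(start)
        do i = 1, n
            a = a / operand
            a = a * operand
        end do
        call cpu_time(finish)
        elapsed = finish - start

        ! Printing the result keeps the loop from being discarded as dead code.
        print *, "loop result =", a
    end function timed_loop

end program nan_timing_sketch
```

If the two timings come out close, the large gap in the original test is more likely due to the compiler removing the ordinary-number loop than to NaN arithmetic itself.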

Vishnu · ‎08-30-2016

Do you think my overhead will be the lowest if I use REAL(0)s instead? The operations involved will be additions, subtractions and multiplications.

jimdempseyatthecove · ‎08-30-2016

>>Do you think my overhead will be the lowest if I use REAL(0)s instead?

I suggest inserting an unusual number.

You could insert a negative 0.0 or a subnormal number. You can precondition the SSE control register (MXCSR) to set FTZ or DAZ, to ensure that if the subnormal is written, it will be overwritten with +0.0.

The only downside is that this will not indicate whether the contained value was used in a computation. If you need that, you would want to insert an SNaN.
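
A minimal sketch of the SNaN idea, assuming the compiler and its options preserve signaling NaNs (some optimization settings quiet them or fold the arithmetic at compile time): fill the matrix with SNaNs, overwrite only the part that holds real data, then check the IEEE invalid flag after a computation. The program name, the 4x4 matrix and the variable names are hypothetical.

```fortran
program snan_fill_sketch

    use, intrinsic :: ieee_arithmetic
    use, intrinsic :: ieee_exceptions

    implicit none

    real :: m(4, 4)            ! hypothetical small matrix
    real :: total
    logical :: invalid_raised

    ! Do not halt on invalid; the flag is only inspected afterwards.
    if (ieee_support_halting(ieee_invalid)) then
        call ieee_set_halting_mode(ieee_invalid, .false.)
    end if

    ! Mark every element as "not yet set" with a signaling NaN.
    m = ieee_value(0.0, ieee_signaling_nan)

    ! Only part of the matrix receives real data.
    m(1:2, 1:2) = 1.0

    ! Clear the flag so that only the following computation is observed.
    call ieee_set_flag(ieee_invalid, .false.)

    ! This sum deliberately touches the unset elements as well.
    total = sum(m)

    call ieee_get_flag(ieee_invalid, invalid_raised)

    print *, "sum =", total
    print *, "result is NaN:", ieee_is_nan(total)
    print *, "invalid flag raised:", invalid_raised

end program snan_fill_sketch
```

Enabling halting on ieee_invalid instead of polling the flag would stop the program at the first use of an unset element, at the cost of running with traps enabled.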

https://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz

Jim Dempsey

Why are arithmetic operations with NaN slow?