I wanted to fill parts of a matrix with quiet NaNs because:
i) If some result is a NaN, I know those parts are being used.
ii) It simplifies the code, since I can do vector/elemental operations on the whole matrix. I assumed that arithmetic operations involving NaNs would take very little CPU time, so the cost of simplifying the code would be small. It seems I assumed wrong.
This is a sample code I wrote to test:
program check9
    use, intrinsic :: ieee_arithmetic
    implicit none
    integer(kind=8) :: i
    real :: NaN, a, start, finish, time_cpu
    call cpu_time(start)
    a = 2.0
    NaN = ieee_value(0.0, ieee_quiet_nan)
    do i = 1, 1000000000
        a = a / NaN
        a = a * NaN
    end do
    print*, a, NaN
    call cpu_time(finish)
    time_cpu = finish - start
    print*, "CPU time =", time_cpu
end program check9
This takes about 3 seconds with ifort. The same operations with an ordinary real number run so fast that the timing doesn't even register any significant digits.
NaN operations on Intel CPUs are likely to generate exceptions that invoke microcode, so the relative slowdown probably varies greatly with CPU model. There is also the possibility that the compiler shortcuts your test loop.
>>Do you think my overhead will be the lowest if I use REAL(0)s instead?
I suggest inserting an unusual number.
You could insert a negative 0.0 or a subnormal number. You can precondition the SSE control register to set FTZ or DAZ to ensure that, if such a slot is written, the sentinel is replaced with +0.0.
The only downside is that this will not indicate whether the contained value was used in a computation. In that case you would want to insert an SNaN.