- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
A recent discussion in the gfortran mailing list inspried me to look into the behaviour of Intel Fortran's NORM2. (The article by Hanson and Hopkins provides a lot of information on various implementations of the algorithm - https://www.researchgate.net/publication/298896236). The problem is that naïve implementations can easily lead to overflows or underflows corrupting the outcome. I used Intel Fortran 18 for the program below (checking for overflow only):
program chk_norm2 implicit none real :: expected real, dimension(2) :: vector integer :: i !vector = [3.0/10.0 * sqrt(huge(1.0)), 4.0/10.0 * sqrt(huge(1.0))] vector = [3.0, 4.0] expected = 5.0 write(*,*) sqrt(huge(1.0)), huge(1.0) do i = 1,20 vector = vector * 10.0 expected = expected * 10.0 write(*,*) i, norm2(vector), sqrt(sum(vector**2)), expected enddo end program chk_norm2
And got the following result
1.8446743E+19 3.4028235E+38 1 50.00000 50.00000 50.00000 2 500.0000 500.0000 500.0000 3 5000.000 5000.000 5000.000 4 50000.00 50000.00 50000.00 5 500000.0 500000.0 500000.0 6 5000000. 5000000. 5000000. 7 5.0000000E+07 5.0000000E+07 5.0000000E+07 8 5.0000000E+08 5.0000000E+08 5.0000000E+08 9 5.0000000E+09 5.0000000E+09 5.0000000E+09 10 4.9999999E+10 4.9999999E+10 4.9999999E+10 11 5.0000000E+11 5.0000000E+11 5.0000000E+11 12 4.9999999E+12 4.9999999E+12 4.9999999E+12 13 5.0000000E+13 5.0000000E+13 5.0000000E+13 14 4.9999999E+14 4.9999999E+14 4.9999999E+14 15 4.9999996E+15 4.9999996E+15 5.0000001E+15 16 4.9999999E+16 4.9999999E+16 4.9999999E+16 17 4.9999999E+17 4.9999999E+17 4.9999999E+17 18 5.0000000E+18 5.0000000E+18 5.0000000E+18 19 Infinity Infinity 5.0000001E+19 20 Infinity Infinity 5.0000001E+20
As you can see, with only moderately large values of the elements in the vector you get overflow resulting in Infinity.
IIUIC, the standard only recommends but does not require, that the implementation take care of such situations where intermediate results may cause overflow/underflow. Still, one would expect a bit more care ;).
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Very interesting.
ifort 19.0.2.190 delivers the same results, unfortunately. Switching to double makes no difference (slighly modified...):
program chk_norm2 use iso_fortran_env, only : rk => real64 !~ use iso_fortran_env, only : rk => real32 implicit none real(rk) :: expected, expected_ini real(rk), dimension(2) :: vector, vector_ini integer :: i !vector = [3.0/10.0 * sqrt(huge(1.0)), 4.0/10.0 * sqrt(huge(1.0))] vector_ini = [3.0_rk, 4.0_rk] expected_ini = 5.0_rk write(*,*) sqrt(huge(1.0_rk)), huge(1.0_rk) do i = 150,160 vector = vector_ini * 10.0_rk**i expected = expected_ini * 10.0_rk**i write(*,'(i3,3ES20.2)') i, norm2(vector), sqrt(sum(vector**2)), expected enddo end program chk_norm2
D:\...>ifort norm2_test_double.f90 Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.0.2.190 Build 20190117 Copyright (C) 1985-2019 Intel Corporation. All rights reserved. Microsoft (R) Incremental Linker Version 14.16.27026.1 Copyright (C) Microsoft Corporation. All rights reserved. -out:norm2_test_double.exe -subsystem:console norm2_test_double.obj D:\....>norm2_test_double.exe 1.340780792994260E+154 1.797693134862316E+308 150 5.00+150 5.00+150 5.00+150 151 5.00+151 5.00+151 5.00+151 152 5.00+152 5.00+152 5.00+152 153 5.00+153 5.00+153 5.00+153 154 Infinity Infinity 5.00+154 155 Infinity Infinity 5.00+155 156 Infinity Infinity 5.00+156 157 Infinity Infinity 5.00+157 158 Infinity Infinity 5.00+158 159 Infinity Infinity 5.00+159 160 Infinity Infinity 5.00+160
Further I've seen in past situations where the use of norm2(vector) instead of sqrt(sum(vector**2)) in loops has a negative impact on runtime (-O2 and -O3). If someone of Intel looks on the implementation of norm2, maybe the performance can be anhanced along with a better overflow/underflow handling.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Algorithms which avoid squaring may be slower even where vectorization isn't possible. One such would be promotion to x87 "real(10)."
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The algorithms I have seen use scaling - they do not avoid squaring. And indeed vectorisation is an issue.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This has nothing to do with vectorization. IEEE 754 for 32-bit floating point has an exponent decimal max 38.23 (8-bit exponent range of -126 to +127).. Posting #1 "failure" occurs at step 19, where the exponent of vector(1) is 19, and the mantissa is other than 1. Same for vector(2). The product of the square of each cell will have a decimal exponent of 38 (plus some fractional decimal due to the mantissa being other than 1). IOW the sum of the squares (xE19.nn**2 + xE19.nn**2) is exceeding decimal value of 38.23 (exceeding 8-bit exponent of +127).
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If this is a problem for your application, I suggest you write your own NORM2 (e.g. my_NORM2) up-sizes real to real(8), and real(8) to real(16), I will let you work out real(16) to ....
Steve, if you read this, does the standards require for NORM2 that the squares be made using an up-sized temporary?
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It is not a problem for my applications, but I simply came across the general problem (and the soliutions) and though I'd check the behaviour in Intel Fortran. Note that BLAS and LAPACK use more sophisticated implementations for thes reasons.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
FWIW
! test_NORM2.f90 ! ! FUNCTIONS: ! test_NORM2 - Entry point of console application. ! !**************************************************************************** ! ! PROGRAM: test_NORM2 ! ! PURPOSE: Entry point for the console application. ! !**************************************************************************** module my_module interface my_NORM2 module procedure real_my_NORM2, real8_my_NORM2 end interface my_NORM2 contains function real_my_NORM2(x) result(ret) real :: ret real :: x(:) real(8) :: temp integer :: i temp = 0.0_8 do i=1, size(x) temp = temp + real(x(i),kind(temp))**2 end do ret = real(sqrt(temp),kind(ret)) end function real_my_NORM2 function real8_my_NORM2(x) result(ret) real(8) :: ret real(8) :: x(:) real(16) :: temp integer :: i temp = 0.0_8 do i=1, size(x) temp = temp + real(x(i),kind(temp))**2 end do ret = real(sqrt(temp),kind(ret)) end function real8_my_NORM2 end module my_module program test_NORM2 use my_module implicit none real :: expected real, dimension(2) :: vector real(8) :: expected8 real(8), dimension(2) :: vector8 integer :: i !vector = [3.0/10.0 * sqrt(huge(1.0)), 4.0/10.0 * sqrt(huge(1.0))] vector = [3.0, 4.0] expected = 5.0 vector8 = [3.0_8, 4.0_8] expected8 = 5.0_8 write(*,*) sqrt(huge(1.0)), huge(1.0) write(*,*) sqrt(huge(1.0_8)), huge(1.0_8) do i = 1,40 vector = vector * 10.0 expected = expected * 10.0 write(*,'("real ",i3, X, 3E20.6)') i, my_norm2(vector), sqrt(sum(vector**2)), expected vector8 = vector8 * 10.0_8 expected8 = expected8 * 10.0_8 write(*,'("real8 ",i3, X, 3E20.6)') i, my_norm2(vector8), sqrt(sum(vector8**2)), expected8 enddo end program test_NORM2
Output:
1.8446743E+19 3.4028235E+38 1.340780792994260E+154 1.797693134862316E+308 real 1 0.500000E+02 0.500000E+02 0.500000E+02 real8 1 0.500000E+02 0.500000E+02 0.500000E+02 real 2 0.500000E+03 0.500000E+03 0.500000E+03 real8 2 0.500000E+03 0.500000E+03 0.500000E+03 real 3 0.500000E+04 0.500000E+04 0.500000E+04 real8 3 0.500000E+04 0.500000E+04 0.500000E+04 real 4 0.500000E+05 0.500000E+05 0.500000E+05 real8 4 0.500000E+05 0.500000E+05 0.500000E+05 real 5 0.500000E+06 0.500000E+06 0.500000E+06 real8 5 0.500000E+06 0.500000E+06 0.500000E+06 real 6 0.500000E+07 0.500000E+07 0.500000E+07 real8 6 0.500000E+07 0.500000E+07 0.500000E+07 real 7 0.500000E+08 0.500000E+08 0.500000E+08 real8 7 0.500000E+08 0.500000E+08 0.500000E+08 real 8 0.500000E+09 0.500000E+09 0.500000E+09 real8 8 0.500000E+09 0.500000E+09 0.500000E+09 real 9 0.500000E+10 0.500000E+10 0.500000E+10 real8 9 0.500000E+10 0.500000E+10 0.500000E+10 real 10 0.500000E+11 0.500000E+11 0.500000E+11 real8 10 0.500000E+11 0.500000E+11 0.500000E+11 real 11 0.500000E+12 0.500000E+12 0.500000E+12 real8 11 0.500000E+12 0.500000E+12 0.500000E+12 real 12 0.500000E+13 0.500000E+13 0.500000E+13 real8 12 0.500000E+13 0.500000E+13 0.500000E+13 real 13 0.500000E+14 0.500000E+14 0.500000E+14 real8 13 0.500000E+14 0.500000E+14 0.500000E+14 real 14 0.500000E+15 0.500000E+15 0.500000E+15 real8 14 0.500000E+15 0.500000E+15 0.500000E+15 real 15 0.500000E+16 0.500000E+16 0.500000E+16 real8 15 0.500000E+16 0.500000E+16 0.500000E+16 real 16 0.500000E+17 0.500000E+17 0.500000E+17 real8 16 0.500000E+17 0.500000E+17 0.500000E+17 real 17 0.500000E+18 0.500000E+18 0.500000E+18 real8 17 0.500000E+18 0.500000E+18 0.500000E+18 real 18 0.500000E+19 0.500000E+19 0.500000E+19 real8 18 0.500000E+19 0.500000E+19 0.500000E+19 real 19 0.500000E+20 Infinity 0.500000E+20 real8 19 0.500000E+20 0.500000E+20 0.500000E+20 real 20 0.500000E+21 Infinity 0.500000E+21 real8 20 0.500000E+21 0.500000E+21 0.500000E+21 real 21 0.500000E+22 Infinity 0.500000E+22 real8 21 0.500000E+22 0.500000E+22 0.500000E+22 real 22 0.500000E+23 Infinity 0.500000E+23 real8 22 0.500000E+23 0.500000E+23 0.500000E+23 real 23 0.500000E+24 Infinity 0.500000E+24 real8 23 0.500000E+24 0.500000E+24 0.500000E+24 real 24 0.500000E+25 Infinity 0.500000E+25 real8 24 0.500000E+25 0.500000E+25 0.500000E+25 real 25 0.500000E+26 Infinity 0.500000E+26 real8 25 0.500000E+26 0.500000E+26 0.500000E+26 real 26 0.500000E+27 Infinity 0.500000E+27 real8 26 0.500000E+27 0.500000E+27 0.500000E+27 real 27 0.500000E+28 Infinity 0.500000E+28 real8 27 0.500000E+28 0.500000E+28 0.500000E+28 real 28 0.500000E+29 Infinity 0.500000E+29 real8 28 0.500000E+29 0.500000E+29 0.500000E+29 real 29 0.500000E+30 Infinity 0.500000E+30 real8 29 0.500000E+30 0.500000E+30 0.500000E+30 real 30 0.500000E+31 Infinity 0.500000E+31 real8 30 0.500000E+31 0.500000E+31 0.500000E+31 real 31 0.500000E+32 Infinity 0.500000E+32 real8 31 0.500000E+32 0.500000E+32 0.500000E+32 real 32 0.500000E+33 Infinity 0.500000E+33 real8 32 0.500000E+33 0.500000E+33 0.500000E+33 real 33 0.500000E+34 Infinity 0.500000E+34 real8 33 0.500000E+34 0.500000E+34 0.500000E+34 real 34 0.500000E+35 Infinity 0.500000E+35 real8 34 0.500000E+35 0.500000E+35 0.500000E+35 real 35 0.500000E+36 Infinity 0.500000E+36 real8 35 0.500000E+36 0.500000E+36 0.500000E+36 real 36 0.500000E+37 Infinity 0.500000E+37 real8 36 0.500000E+37 0.500000E+37 0.500000E+37 real 37 0.500000E+38 Infinity 0.500000E+38 real8 37 0.500000E+38 0.500000E+38 0.500000E+38 real 38 Infinity Infinity Infinity real8 38 0.500000E+39 0.500000E+39 0.500000E+39 real 39 Infinity Infinity Infinity real8 39 0.500000E+40 0.500000E+40 0.500000E+40 real 40 Infinity Infinity Infinity real8 40 0.500000E+41 0.500000E+41 0.500000E+41 Press any key to continue . . .
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I had raised the same issue regarding the HYPOT function. A very simple implementation of the Euclidian norm function (MYNORM2) as shown in the output below does not overflow until HUGE(1.0d0) = 1.797693134862316E+308 is exceeded in double precision (KIND = 8). However, the intrinsic NORM2 function overflows at approximately SQRT(HUGE(1.0d0)) as shown in the posts above.
program main implicit none integer(kind = 4) :: n, i real(kind = 8), dimension(:), allocatable :: x write(*,*) huge(1.0d0) read * n = 5 allocate(x(n)) x = 1.0d0 i = 0 do i = i + 1 if (i == 330) exit write(*,'(1x,a,i5,3(2x,a,es13.6))') 'i = ', i, 'x = ', maxval(x), 'IVF NORM2 = ', norm2(x), 'MYNORM2 = ', mynorm2(x) x = x*10.0d0 end do deallocate(x) read * contains function mynorm2(x) result(r) real(kind = 8) :: r real(kind = 8) :: x(:) real(kind = 8) :: temp temp = maxval(abs(x)) r = temp*sqrt(sum((x/temp)**2)) end function mynorm2 end program main
Selected output i = 1 x = 1.000000E+00 IVF NORM2 = 2.236068E+00 MYNORM2 = 2.236068E+00 ... i = 153 x = 1.000000+152 IVF NORM2 = 2.236068+152 MYNORM2 = 2.236068+152 i = 154 x = 1.000000+153 IVF NORM2 = 2.236068+153 MYNORM2 = 2.236068+153 i = 155 x = 1.000000+154 IVF NORM2 = Infinity MYNORM2 = 2.236068+154 i = 156 x = 1.000000+155 IVF NORM2 = Infinity MYNORM2 = 2.236068+155 ... i = 201 x = 1.000000+200 IVF NORM2 = Infinity MYNORM2 = 2.236068+200 ... i = 251 x = 1.000000+250 IVF NORM2 = Infinity MYNORM2 = 2.236068+250 ... i = 301 x = 1.000000+300 IVF NORM2 = Infinity MYNORM2 = 2.236068+300 i = 302 x = 1.000000+301 IVF NORM2 = Infinity MYNORM2 = 2.236068+301 i = 303 x = 1.000000+302 IVF NORM2 = Infinity MYNORM2 = 2.236068+302 i = 304 x = 1.000000+303 IVF NORM2 = Infinity MYNORM2 = 2.236068+303 i = 305 x = 1.000000+304 IVF NORM2 = Infinity MYNORM2 = 2.236068+304 i = 306 x = 1.000000+305 IVF NORM2 = Infinity MYNORM2 = 2.236068+305 i = 307 x = 1.000000+306 IVF NORM2 = Infinity MYNORM2 = 2.236068+306 i = 308 x = 1.000000+307 IVF NORM2 = Infinity MYNORM2 = 2.236068+307 i = 309 x = 1.000000+308 IVF NORM2 = Infinity MYNORM2 = Infinity i = 310 x = Infinity IVF NORM2 = Infinity MYNORM2 = NaN i = 311 x = Infinity IVF NORM2 = Infinity MYNORM2 = NaN
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
While the above protects the exponent from overflowing, a problem you have with the above solution is the lesser magnitude contributors (values of x) may get excluded during the additions. One has to match the algorithm with the expected input data values.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I thought NORM2 was provided to overcome the problem being reported, without the need of writing our own NORM2.
The documentation does specify:
"It is recommended that the processor compute the result without undue overflow or underflow."
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
That would be my understanding (but not observation) too, with the additional requirement of using the highest precision sqrt method.
Jim Dempsey
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page