Solved: Re: ifx 2023.2.0: integer lowering leads to strange bugs

foxtran · 12-25-2023

Merry Christmas everyone!

In one codebase, there are a lot of conversions between the integer kinds integer(2), integer(4) and integer(8), so code like:

integer(8) :: ARR(3,N)
integer(2) :: ii, jj, kk
do i = 1,N
  ii = ARR(1,i)
  jj = ARR(2,i)
  kk = ARR(3,i)
  ! ...
end do

can be found. Here, integer(8) values are narrowed to integer(2).
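A minimal sketch of making such a narrowing explicit and checked. It assumes that int16 and int64 from the intrinsic module iso_fortran_env correspond to integer(2) and integer(8), which holds for ifort, ifx and gfortran; the module and function names are hypothetical.

```fortran
module narrowing
  use, intrinsic :: iso_fortran_env, only: int16, int64
  implicit none
contains
  ! Convert an int64 value to int16 and report through ok whether it fits.
  ! Out-of-range input returns 0 with ok = .false. instead of a
  ! processor-dependent wrapped value.
  function to_int16_checked(x, ok) result(r)
    integer(int64), intent(in)  :: x
    logical,        intent(out) :: ok
    integer(int16)              :: r
    integer(int64), parameter   :: lo = -int(huge(0_int16), int64) - 1_int64
    integer(int64), parameter   :: hi = int(huge(0_int16), int64)
    ok = (x >= lo) .and. (x <= hi)
    if (ok) then
      r = int(x, int16)
    else
      r = 0_int16
    end if
  end function to_int16_checked
end module narrowing
```

In the loop above this could be used as ii = to_int16_checked(ARR(1,i), ok), with ok tested afterwards; whether to stop, clamp or skip on failure is left to the caller.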

In other places there is widening (integer(4) -> integer(8); the code is compiled with the `-i8` flag and the ILP64 interface of MKL is used):

integer*4 err
integer n,erri
real*8 mat(n,n),scr(*)
call dsyev('V','L',n,mat,n,scr,scr(n+1),3*n**2,err)
erri=err

Here, an integer(4) variable is passed where an integer(8) argument is expected.

With ifort and GCC, both examples work perfectly.

Unfortunately, with ifx, the second example (with some modifications) generates a runtime error at line 5 (erri=err), since the address of err cannot be read properly.

In the real application, I used the following diff to fix this issue:

@@ -2041,7 +2041,7 @@ C
 ************************************************************************
       implicit none
       integer n,iout,i,j,erri,iflag,ii,info8
-      integer*4 err,info4
+      integer info4
       real*8 mat(n,n),scr(*),ss,tol,mr
       equivalence(info8,info4)
       integer imem,imem1,maxcor,memfree,memreq
@@ -2146,11 +2144,10 @@ c        memreq = idnint(mr)
 c       write(6,*) (mat(i,i),i=1,n)
 c       write(6,"(9f10.6)") mat
 C Diagonalize
-        call dsyev('V','L',n,mat,n,scr,scr(n+1),3*n**2,err)
+        call dsyev('V','L',n,mat,n,scr,scr(n+1),3*n**2,erri)
 c       write(6,*) 'eig'
 c       write(6,"(10000f20.12)") (scr(j),j=1,n)
-        erri=err
-         if(err.ne.0) then
+         if(erri.ne.0) then
           write(iout,*) 'Inverse square root: Fatal error at the ' //
      $'diagonalization of the matrix!'

Here, I simply removed err (integer*4) and replaced it with erri. After that, the code started to work (again, it is compiled with `-i8`, so erri is an 8-byte default integer).

So, could someone check how integer types are lowered between the Intel Fortran front end and the LLVM middle end, so that the behavior is consistent with the old compiler?

Unfortunately, I could not produce a small reproducible example.

I used ifx 2023.2.0.

mecej4 · 12-25-2023

Here is what is going wrong.

The -i8 compiler option changes the default integer kind to 8 bytes. This option does not change (promote or demote) integer variables declared with an explicit size (such as integer*2). Thus, when you call a LAPACK routine such as dsyev through an implicit interface, you may be passing some integer arguments as 8-byte integers, some as 4-byte integers, etc. The MKL ILP64 routines, however, expect all integer arguments to be 8-byte integers.

To solve the problem, you have these options for using MKL ILP64:

- declare all integer arguments passed to MKL routines as default integers, if you wish to use -i8;
- declare all integer arguments to MKL routines explicitly as 8-byte integers, whether or not you use -i8.
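A sketch of the second option. ilp64_stub is a hypothetical stand-in for an ILP64 routine such as dsyev, not MKL's actual interface: every integer argument is 8 bytes, and it follows LAPACK's convention that info = -i flags an illegal value in the i-th argument. Because the routines live in a module, every call has an explicit interface and the compiler checks the integer kinds. The wrapper shows one way to keep a legacy integer*4 status variable: pass an 8-byte temporary and narrow it afterwards.

```fortran
module ilp64_demo
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
contains
  ! Hypothetical stand-in for an ILP64 routine: all integer arguments are
  ! 8-byte. info = 0 on success, info = -1 if the first argument n is
  ! negative (LAPACK-style "-i means argument i is illegal").
  subroutine ilp64_stub(n, info)
    integer(int64), intent(in)  :: n
    integer(int64), intent(out) :: info
    if (n < 0_int64) then
      info = -1_int64
    else
      info = 0_int64
    end if
  end subroutine ilp64_stub

  ! Wrapper for legacy callers that keep a 4-byte status variable:
  ! the routine writes into an 8-byte temporary, which is then narrowed.
  subroutine call_with_int32_status(n, err)
    integer(int64), intent(in)  :: n
    integer(int32), intent(out) :: err
    integer(int64) :: info8
    call ilp64_stub(n, info8)
    err = int(info8, int32)
  end subroutine call_with_int32_status
end module ilp64_demo
```

Passing an integer(int32) variable directly as the info argument of ilp64_stub is then rejected at compile time instead of silently corrupting memory at run time.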

When your code has errors of this nature, the resulting behavior is compiler dependent. It may occasionally work despite the errors and give you the expected results. You should not conclude that one compiler is right and the other is wrong. A request to change one compiler so that it behaves like another when given incorrect code will probably be ignored.

Here is a short program that illustrates the points that I made.

program buggy
   integer i
   integer*2 i2
   integer*4 i4
   integer*8 i8
   i = 32767
   i2 = i
   i4 = i2
   i8 = i4
   call sub(i,i2,i4,i8)
   print *,i,i2,i4,i8
end program

subroutine sub(i1, i2, i3, i4)
  i1 = 2*i1
  i2 = 2*i2
  i3 = 2*i3
  i4 = 2*i4
  return
end subroutine

Compare these results:

Compiler     i       i2      i4      i8
ifort        65534   -2      65534   65534
ifort -i8    65534   -2      65534   65534
ifx          65534   32767   65534   65534
ifx -i8      65534   32767   65534   65534
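The -2 in the ifort rows can be reproduced by hand. 2*32767 = 65534, which does not fit in integer*2; keeping only the low 16 bits (0xFFFE) and reading them as a two's-complement value gives 65534 - 65536 = -2. A sketch of that reduction, assuming 16-bit two's-complement storage for integer*2 (the Fortran standard does not specify what happens when the value does not fit, so this is one common outcome, not a guaranteed one); wrap16 is a hypothetical name.

```fortran
module wrap_demo
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
contains
  ! Reduce x modulo 2**16 and reinterpret the low 16 bits as a
  ! two's-complement signed value in [-32768, 32767].
  pure function wrap16(x) result(r)
    integer(int32), intent(in) :: x
    integer(int32) :: r
    r = modulo(x, 65536_int32)
    if (r >= 32768_int32) r = r - 65536_int32
  end function wrap16
end module wrap_demo
```

wrap16(2*32767) evaluates to -2, and wrap16(32767) to 32767.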

JohnNichols · 12-26-2023

program buggy
  implicit none
  integer*2 i
  integer*2 i2
  integer*4 i4
  integer*8 i8
  i = 32767
  i2 = i
  i4 = i2
  i8 = i4
  call sub(i,i2,i4,i8)
  print *,3,i,i2,i4,i8
end program

subroutine sub(i1, i2, i3, i4)
  implicit none

  integer*2 i1
  integer*2 i2
  integer*4 i3
  integer*8 i4

  print *,1,i1,i2,i3,i4
  i1 = 2*i1
  i2 = 2*i2
  print *,2,i1,i2,i3,i4
  i3 = 2*i3
  i4 = 2*i4
  return
end subroutine

This is a slightly different version of buggy, but in the end the lesson is to always consider the set of integers that i2, i4 and i8 can actually hold when doing the math. The compiler does not care if you make a mistake; it does not make a mistake itself, it simply follows the rules built into it. Do not assume, as @mecej4 shows, that the rules are the same from Excel to R to any compiler. They are not. Nor should you assume that the people who wrote the software cared about all of the real math rules. Of course, if we were still in the age of 640K, one could consider using i2 instead of i8, but not now.
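One way to see the set of integers each kind can actually hold is to ask the compiler. A short sketch using the intrinsics huge and range with the kind constants from iso_fortran_env (assumed to match integer*2, integer*4 and integer*8); the module name is hypothetical.

```fortran
module int_ranges
  use, intrinsic :: iso_fortran_env, only: int16, int32, int64
  implicit none
contains
  ! Print the largest value of each integer kind and its decimal
  ! exponent range (the number of decimal digits it can always hold).
  subroutine print_ranges()
    print '(a, i0, a, i0)', 'int16: huge = ', huge(0_int16), ', range = ', range(0_int16)
    print '(a, i0, a, i0)', 'int32: huge = ', huge(0_int32), ', range = ', range(0_int32)
    print '(a, i0, a, i0)', 'int64: huge = ', huge(0_int64), ', range = ', range(0_int64)
  end subroutine print_ranges
end module int_ranges
```

On ifort, ifx and gfortran this reports 32767, 2147483647 and 9223372036854775807 as the largest values, with decimal ranges 4, 9 and 18.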

JohnNichols · 12-26-2023

@mecej4

Setting aside all the Fortran issues, is the -2 an artifact of the strange symmetry of the integer number line, where there are 1 and -1, etc., but only one zero? The binary numbers are not split evenly, so you are mashing a permanently odd-numbered set into an even-sized space.

I could be wrong, but I know you will know.

John

mecej4 · 12-26-2023

In binary integer arithmetic, the convention is to fuse +0 and -0 to '0'. In floating point arithmetic, however, IEEE-754 mandates a distinction between +0.0 and -0.0. See this Wikipedia article.

When you set y = 1.0/x, with x and y real, if you don't distinguish between x = +0.0 and x = -0.0, you are confronted with having to accept that +∞ and -∞ should be the same.
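A small sketch of that point, assuming IEEE arithmetic with signed zeros and no trapping on division by zero (the default for gfortran; other compilers and flags may differ): +0.0 and -0.0 compare equal, yet their reciprocals are +Inf and -Inf. The module and function names are hypothetical.

```fortran
module signed_zero_demo
  implicit none
contains
  ! Return 1/x. Under IEEE 754, 1/(+0.0) = +Inf and 1/(-0.0) = -Inf.
  function recip(x) result(y)
    real, intent(in) :: x
    real :: y
    y = 1.0 / x
  end function recip
end module signed_zero_demo
```

With nz = sign(0.0, -1.0), which yields -0.0, recip(0.0) is +Inf and recip(nz) is -Inf, while 0.0 == nz is still .true..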

JohnNichols · 12-27-2023

Using a short 4-bit example:

Sign   Bits   Value   Row
0      000     0      1
0      001     1      2
0      010     2      3
0      011     3      4
0      100     4      5
0      101     5      6
0      110     6      7
0      111     7      8
1      000     0      9
1      001    -1      10
1      010    -2      11
1      011    -3      12
1      100    -4      13
1      101    -5      14
1      110    -6      15
1      111    -7      16

The ninth number row is your wonky zero.
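For comparison, here is a sketch that decodes the same sixteen 4-bit patterns as two's complement, the representation the x86-64 hardware targeted by ifort and ifx uses for integers; the table above reads the top bit as a separate sign. In two's complement the top bit has weight -8, so there is exactly one zero (0000) and the pattern 1000 encodes -8 rather than a second zero. decode4 is a hypothetical name.

```fortran
module twos_complement_demo
  implicit none
contains
  ! Decode a 4-bit pattern (0..15) as two's complement:
  ! value = -8*b3 + 4*b2 + 2*b1 + 1*b0
  pure function decode4(bits) result(v)
    integer, intent(in) :: bits
    integer :: v, k
    v = 0
    do k = 0, 2
      if (btest(bits, k)) v = v + 2**k   ! low bits carry positive weights
    end do
    if (btest(bits, 3)) v = v - 8        ! top bit carries weight -8
  end function decode4
end module twos_complement_demo
```

Evaluating decode4(p) for p = 0, 1, ..., 15 gives 0 through 7 followed by -8, -7, ..., -1.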

From the Wikipedia article on the IEEE 754 standard: bit 31 (the sign bit) is the problem bit.

There is nothing wrong with buggy 1 or buggy 1.0001. As far as I can see, according to the IEEE rules, the -2 should signal an overflow. Imagine if this were a rocketry program and someone made this mistake: we do not want -2. It tells us nothing except that the number in row 9 exists in binary but not in reality. We want to tell the programmer that there is a mistake, and -2 does not signal one.

I have not read the full standard, but the Wikipedia article on IEEE 754 would seem to indicate that overflow is the correct and required answer.

Your thoughts are appreciated; show me the error in my thinking.
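Standard Fortran does not require the processor to detect integer overflow, so flagging the mistake is up to the program. A minimal sketch of a checked multiply for 2-byte integers, assuming int16 corresponds to integer*2: the product of two int16 values always fits in int32, so it is formed there and range-checked before narrowing. checked_mul16 is a hypothetical name.

```fortran
module checked_arith
  use, intrinsic :: iso_fortran_env, only: int16, int32
  implicit none
contains
  ! Multiply two int16 values. The exact product fits in int32, so it is
  ! computed there and checked against the int16 range before narrowing.
  subroutine checked_mul16(a, b, r, ok)
    integer(int16), intent(in)  :: a, b
    integer(int16), intent(out) :: r
    logical,        intent(out) :: ok
    integer(int32) :: p
    p = int(a, int32) * int(b, int32)
    ok = p >= -int(huge(0_int16), int32) - 1_int32 .and. &
         p <=  int(huge(0_int16), int32)
    if (ok) then
      r = int(p, int16)
    else
      r = 0_int16   ! no wrapped value is returned; the caller must check ok
    end if
  end subroutine checked_mul16
end module checked_arith
```

checked_mul16(2_int16, 32767_int16, r, ok) returns ok = .false., while multiplying 2 by 16383 gives ok = .true. and r = 32766.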

JohnNichols · 01-02-2024

Thinking about this little matter: the original buggy gives two warnings at compile time, as follows:

If we consider the sets I2, I4 and I8, none of them is closed under addition or multiplication. In particular, an implicit conversion from I8 to I4 should trigger an error. The chance of leaving the set of numbers representable in I2, I4 or I8 when adding or multiplying is, in the multiplicative sense, only about 1/n for each element of the set. That is a reasonable risk for I8, perhaps for I4, but not for I2.

Example results from buggy:

Compilers should be getting past the stage of assuming that the programmer even notices these things on a hectic day.

I realize we should use implicit none, but there is a lot of old code still in use, and not everyone has the skills of this august group, myself excluded of course.

foxtran · 01-03-2024

Happy New Year everyone!

After some experiments, I noticed that the old ifort and GCC allocate some additional space on the stack before the call, so this problem does not arise. Modern LLVM, by contrast, assumes that all routines are called properly and therefore does not allocate extra stack memory. As a result, the improper last argument in the dsyev call leads to stack corruption when the code is compiled with ifx, but not with ifort.

@mecej4 and @JohnNichols, you are right that the code is wrong.

@mecej4, note that in your example all arguments are passed via registers, while my problem arises when arguments are passed via the stack.

@Barbara_P_Intel, is it possible to adjust stack allocation in ifx for such cases to avoid stack corruption?

Barbara_P_Intel · 01-03-2024

>> is it possible to adjust stack allocation in ifx for such cases to avoid stack corruption?

There are a couple of solutions to try. One is to put the arrays on the heap. The other is to increase the stack size. See this reference in the Fortran DGR (Developer Guide and Reference); both solutions are described there.
