strange correctness bug. Compiler issues?

Izaak_Beekman · ‎09-21-2011

I was checking the code I had posted the other day for memory leaks and ran into a very troubling correctness bug. Either something very subtle is happening or there is a compiler bug. Compiling the code with a different compiler also produces unexpected results, BUT they are different than the results given under ifort. In one configuration (where I do not test for memory leaks) the code gives the correct results.

[fortran]MODULE gol
  IMPLICIT NONE
  INTEGER, PARAMETER :: fl_rd_err = 50, no_handles = 49, bad_cmd_arg = 48, auto_obj_err = 47
  INTEGER, PARAMETER :: mx_arg_l = 1024 ! Arbitrary but no pre-F2003 mechanism 
                                        ! to dynamically ALLOCATE mem for str

CONTAINS
  ! Read two integers from a text file, line by line
  ! is_eof is true once we reach the end, false otherwise
  SUBROUTINE rdln(handle,i,j,is_eof)
    INTEGER, INTENT(in)  :: handle
    INTEGER, INTENT(out) :: i, j
    LOGICAL, INTENT(out) :: is_eof
    INTEGER :: ios

    is_eof = .FALSE.
    READ(handle,*,iostat=ios) i, j
    IF (ios /= 0) is_eof = .TRUE.  ! assume no other errors encountered
  END SUBROUTINE rdln

  ! Find an unused Fortran file unit
  SUBROUTINE getfreehandle(handle)
    INTEGER, INTENT(out) :: handle
    INTEGER, PARAMETER :: start = 1, last = 1024 ! max open file handles on x86_64 RHEL5
    INTEGER :: i
    LOGICAL :: h_exists, h_is_used
    
    handle = -500
    DO i = start, last
       INQUIRE(unit=i, exist=h_exists, opened=h_is_used)
       IF ( h_exists .AND. .NOT. h_is_used) THEN
          handle = i
          EXIT
       END IF
    END DO
    IF (handle == -500) STOP no_handles
  END SUBROUTINE getfreehandle

  ! Read cmd args, print usage statement if error
  SUBROUTINE parse_cmd_args(fl_name,ngen,xlim,ylim)
    CHARACTER(len=mx_arg_l), INTENT(inout) :: fl_name
    INTEGER,                 INTENT(out)   :: ngen, xlim, ylim
    CHARACTER(len=mx_arg_l) :: argn
    INTEGER :: argc

    argc = IARGC() ! IARGC() and GETARG are not part of the standard, but no way
                   ! to handle command line arguments in Fortran before F2003
                   ! These compiler extensions are common and are in most versions of 
                   ! gfortran and Intel's ifort
    CALL GETARG(0,argn)
    IF (argc /= 4) THEN
       WRITE(0,'(a)') 'Usage: '//TRIM(ADJUSTL(argn))//'  &
            &<# of generations>  ' ! 0 *should* be stderr
       STOP bad_cmd_arg
    END IF
    CALL GETARG(1,fl_name)
    CALL GETARG(2,argn)
    READ(argn,*) ngen
    CALL GETARG(3,argn)
    READ(argn,*) xlim
    CALL GETARG(4,argn)
    READ(argn,*) ylim
  END SUBROUTINE parse_cmd_args

  ! Write out current population on the board to a file
  SUBROUTINE wrt_living(pop)
    LOGICAL, DIMENSION(0:,0:) :: pop
    INTEGER :: i, j, imx, jmx

    imx = UBOUND(pop,dim=1)
    jmx = UBOUND(pop,dim=2)
    DO j = 1,jmx-1
       DO i = 1,imx-1
          IF (pop(i,j)) WRITE(*,'(2(i0,1x))') i-1, j-1 ! Fortran starts at 1
       END DO
    END DO
  END SUBROUTINE wrt_living

  SUBROUTINE gol_kernel(ngen,pop)
    INTEGER,              INTENT(in) :: ngen
    LOGICAL, POINTER, DIMENSION(:,:) :: pop
    LOGICAL, POINTER, DIMENSION(:,:) :: old, tmp
    INTEGER(SELECTED_INT_KIND(1)) :: cnt ! small integer (always >= 0 & < 9)
    INTEGER :: imx, jmx, i, j, n
    
    tmp => NULL()
    old => NULL()
    imx = UBOUND(pop,dim=1)
    jmx = UBOUND(pop,dim=2)
    
    ALLOCATE(old(0:imx,0:jmx))    
    imx = imx - 1
    jmx = jmx - 1
    DO n=1,ngen
       tmp => pop
       pop => old
       old => tmp ! can now rewrite pop
       DO j=1,jmx
          DO i=1,imx
             cnt = COUNT(old(i-1:i+1,j-1)) + &
                   COUNT(old(i-1:i+1:2,j)) + &
                   COUNT(old(i-1:i+1,j+1))
             pop(i,j) = (old(i,j) .AND. (cnt == 2)) .OR. (cnt == 3)
          END DO
       END DO
    END DO
    DEALLOCATE(old)
    tmp => NULL()
  END SUBROUTINE gol_kernel
    
END MODULE gol

PROGRAM tst
  USE gol
  IMPLICIT NONE
  CHARACTER(len=mx_arg_l) :: fl_name
  INTEGER :: ngen, imx, jmx, n, i, j, h
  LOGICAL, POINTER :: pop(:,:) => NULL()
  LOGICAL :: is_eof
  
  CALL parse_cmd_args(fl_name,ngen,imx,jmx)
  ALLOCATE(pop(0:imx+1,0:jmx+1))
  pop = .FALSE.
  CALL getfreehandle(h)
  OPEN(unit=h,file=fl_name,status='old',action='read')
  DO ! Read in IC
     CALL rdln(h,i,j,is_eof)
     IF (is_eof) EXIT
     pop(i+1,j+1) = .TRUE. ! Fortran indices start at 1
  END DO
  CLOSE(h)

!!$  CALL gol_kernel(ngen,pop)
  DO n = 1,ngen
     CALL gol_kernel(1,pop)
  END DO

  CALL wrt_living(pop)
  DEALLOCATE(pop)
END PROGRAM tst[/fortran]

Lines 133 - 136 show the two variants. When line 133 is uncommented and 134 - 136 are commented out this is the "production" version of the code. The code as shown above is the memory leak test case of the code. I have switched the state advancement DO loop to be outside of the main kernel. The kernel only performs 1 iteration each time it is called, but it is called ngen times. (vs being called once in the production code wherein it performs ngen interations)

The results between the production code and the test code differ, however. The results given by the production code are correct and the results given by the test code are erroneous. It also appears that the divergence of the results happens after the third iteration, where suddenly the following points are assigned a value of .TRUE. (i.e. living) for both initial conditions. IC 1 has the additional point 249 0.

x y
192 0
193 0
194 0
203 0
204 0
205 0
206 0
207 0
247 0
248 0

System and compiler info are as follows:

[bash]12:02 PM (0) MPI-proj $ uname -a
Linux xxxxxxxxxxx.xxxxxx.xxx.edu 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
12:09 PM (0) MPI-proj $ ifort -V
Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1    Build 20090630 Package ID: l_cprof_p_11.1.046
Copyright (C) 1985-2009 Intel Corporation.  All rights reserved.[/bash]

The tests can be run as follows:

[bash]12:10 PM (0) MPI-proj $ ./a.out life.data.2 100 250 250 | sort -n | diff -s - life.data.2.100.250.250.out.sorted  
Files - and life.data.2.100.250.250.out.sorted are identical
12:10 PM (0) MPI-proj $ ./a.out life.data.1 100 250 250 | sort -n | diff -s - life.data.1.100.250.250.out.sorted  
Files - and life.data.1.100.250.250.out.sorted are identical[/bash]

The above example shows correct output. To generate output for comparison with fewer iterations switch to the "production" source form and replace the 2nd argument to a.out with the desired number of iterations and then filter stdout through sort -n and write stdout to a file. The you can run the tests on the "test" version of the source code with fewer iterations. The two input files are attached, as well as the desired output after 100 iterations. life.data.x are the ICs and life.data.x.n.i.j.out.sorted are the correct results after n iterations on a board iXj cells large.

Please let me know if:
a) I am making a subtle error
b) This is a compiler bug
c) If so, is it fixed in a more recent release.

Thank you,
-Z

Steven_L_Intel1 · ‎09-21-2011

Sorry, I am having trouble following this. How do I build and run the program to show "bad" results, and how do I build and run it to show "good" results? Can you express it without the sort and diff commands, as this makes it difficult to identify the problem area.

Izaak_Beekman · ‎09-21-2011

The program simulates Conway's game of life. (cellular automata) The input file is given as the first argument and is a list of i,j coordinates for living cells. The output is written to stdout. The 2nd argument is the number of iterations to run for, and the third and fourth arguments are the number of cells in the i and j direction respectively. (The output and input are in C-like coordinates on [0:imx-1, 0:jmx-1] which I just shift into Fortran like arguments on )

To build the code just invoke ifort on the single source file, no flags etc. needed.

The code as it's listed shows the 'bad' configuration. To switch to the good configuration just un-comment line 133 and comment out or remove 134-136. I'll attach versions labeled working.f90 and broken.f90 to this post, and you can run a diff to verify that this is the only change.

The code advances a 'board' which is a 2-D array of T and F or 1 and 0 according to the rules outlined by Conway's game of life. An initial condition is set, and each iteration the rules are applied to determine which cells on the board are born (3 living neighbors), live (alive with 2 neighbors), or die (any other combo). Cells not on the board are considered dead (e.g. just off the board edges). This is the BC. Essentially the board object has 2^(imx*jmx) possible states and each current state implies 1 and only 1 state after positive n iterations. This mapping from the current state to a future state is deterministic, but behaves in a complicated way and is hard to predict without actually performing the mapping computationally.

I am telling you all of this to say that yes, I can come up with a test case that will not require sorting, but just comparing the expected output to the actual output by hand might make it hard to see where the problem is occurring.

If we neglect whether one output is 'correct' and the other is 'incorrect' the real heart of the matter is: Why is the output of implementation 1, 'working.f90', different from implementation 2, 'broken.f90'? The only change is whether ngen iterations are performed inside the gol_kernel or the gol_kernel performs 1 iteration and is called ngen times. The update rules and boundary conditions are the same. I also know that 'working.f90' produces correct results because I have compared it to other peoples results. One could even plot the points on a grid and verify that the rules are being followed.

Here is how you can compile the codes and run the tests.

Test 1:

[diff]03:12 PM (0) MPI-proj $ ifort -o working working.f90 && ./working life.data.1 3 250 250 | diff -s - life.data.1.3.250.250.out
Files - and life.data.1.3.250.250.out are identical
03:14 PM (0) MPI-proj $ ifort -o broken broken.f90 && ./broken life.data.1 3 250 250 | diff -s - life.data.1.3.250.250.out
1,11d0
< 192 0
< 193 0
< 194 0
< 203 0
< 204 0
< 205 0
< 206 0
< 207 0
< 247 0
< 248 0
< 249 0[/diff]

Test 2:

[diff]03:14 PM (0) MPI-proj $ ifort -o working working.f90 && ./working life.data.2 3 250 250 | diff -s - life.data.2.3.250.250.out
Files - and life.data.2.3.250.250.out are identical
03:16 PM (0) MPI-proj $ ifort -o broken broken.f90 && ./broken life.data.2 3 250 250 | diff -s - life.data.2.3.250.250.out
1,10d0
< 192 0
< 193 0
< 194 0
< 203 0
< 204 0
< 205 0
< 206 0
< 207 0
< 247 0
< 248 0[/diff]

Between the two tests the codes are the same, but the ICs are different. I have only run for 3 iterations because this is where the expected results diverge from the unexpected results. For the first two iterations there is no difference between broken.f90 and working.f90

The difference between the two codes is only lines 133-136 as is seen below:

[diff]133,136c133,136
<   CALL gol_kernel(ngen,pop)
< !!$  DO n = 1,ngen
< !!$     CALL gol_kernel(1,pop)
< !!$  END DO
---
> !!$  CALL gol_kernel(ngen,pop)
>   DO n = 1,ngen
>      CALL gol_kernel(1,pop)
>   END DO[/diff]

Attached are the 6 files needed to run the tests shown above. Please see my original post for compiler version and system information.

Izaak_Beekman · ‎09-27-2011

Please let me know if there is anything else I can do. I have had two more pairs of eyes take a look at this and they both agree that something fishy is going on and cannot find errors in the code. I would love to know if others can reproduce my results on different hardware and with more recent ifort versions. If I am encountering a compiler bug I would love to know if the same issues are present in more recent versions of ifort.

Thank you,
-Z

mecej4 · ‎09-27-2011

You have bugs in your program, broken.f90.

At entry to the DO loop starting line-94, the array old has just been allocated and its contents are undefined. New values are put into this array by the statement at line=103, but not for subscripts i=0, j=0. There is an omission on your part here, since you are going to reference these undefined values later.

Thus, when subroutine gol_kernel returns, some elements of array pop are undefined. When the subroutine is called again, old is pointed to the argument pop, and old is referenced on lines 100-102 with zero as the first and/or second subscripts -- array elements whose values are still undefined, since they were not defined during the previous run through this subroutine.

With undefined variables, program behavior is undefined. Compilers can do anything they find convenient or within their ability, and such undefined behavior is no basis for suspecting compiler error.

Izaak_Beekman · ‎09-27-2011

Oh my, you're right. old's ghost points need initialization to .FALSE. every time I enter the routine. Then when my pointers swap old and pop, they will both always have the value .FALSE. in the ghost cells.

Thank you so much.