- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I was checking the code I had posted the other day for memory leaks and ran into a very troubling correctness bug. Either something very subtle is happening or there is a compiler bug. Compiling the code with a different compiler also produces unexpected results, BUT they are different than the results given under ifort. In one configuration (where I do not test for memory leaks) the code gives the correct results.
The results between the production code and the test code differ, however. The results given by the production code are correct and the results given by the test code are erroneous. It also appears that the divergence of the results happens after the third iteration, where suddenly the following points are assigned a value of .TRUE. (i.e. living) for both initial conditions. IC 1 has the additional point 249 0.
x y
192 0
193 0
194 0
203 0
204 0
205 0
206 0
207 0
247 0
248 0
System and compiler info are as follows:
Please let me know if:
a) I am making a subtle error
b) This is a compiler bug
c) If so, is it fixed in a more recent release.
Thank you,
-Z
[fortran]MODULE gol
IMPLICIT NONE
INTEGER, PARAMETER :: fl_rd_err = 50, no_handles = 49, bad_cmd_arg = 48, auto_obj_err = 47
INTEGER, PARAMETER :: mx_arg_l = 1024 ! Arbitrary but no pre-F2003 mechanism
! to dynamically ALLOCATE mem for str
CONTAINS
! Read two integers from a text file, line by line
! is_eof is true once we reach the end, false otherwise
SUBROUTINE rdln(handle,i,j,is_eof)
INTEGER, INTENT(in) :: handle
INTEGER, INTENT(out) :: i, j
LOGICAL, INTENT(out) :: is_eof
INTEGER :: ios
is_eof = .FALSE.
READ(handle,*,iostat=ios) i, j
IF (ios /= 0) is_eof = .TRUE. ! assume no other errors encountered
END SUBROUTINE rdln
! Find an unused Fortran file unit
SUBROUTINE getfreehandle(handle)
INTEGER, INTENT(out) :: handle
INTEGER, PARAMETER :: start = 1, last = 1024 ! max open file handles on x86_64 RHEL5
INTEGER :: i
LOGICAL :: h_exists, h_is_used
handle = -500
DO i = start, last
INQUIRE(unit=i, exist=h_exists, opened=h_is_used)
IF ( h_exists .AND. .NOT. h_is_used) THEN
handle = i
EXIT
END IF
END DO
IF (handle == -500) STOP no_handles
END SUBROUTINE getfreehandle
! Read cmd args, print usage statement if error
SUBROUTINE parse_cmd_args(fl_name,ngen,xlim,ylim)
CHARACTER(len=mx_arg_l), INTENT(inout) :: fl_name
INTEGER, INTENT(out) :: ngen, xlim, ylim
CHARACTER(len=mx_arg_l) :: argn
INTEGER :: argc
argc = IARGC() ! IARGC() and GETARG are not part of the standard, but no way
! to handle command line arguments in Fortran before F2003
! These compiler extensions are common and are in most versions of
! gfortran and Intel's ifort
CALL GETARG(0,argn)
IF (argc /= 4) THEN
WRITE(0,'(a)') 'Usage: '//TRIM(ADJUSTL(argn))//' &
&<# of generations> ' ! 0 *should* be stderr
STOP bad_cmd_arg
END IF
CALL GETARG(1,fl_name)
CALL GETARG(2,argn)
READ(argn,*) ngen
CALL GETARG(3,argn)
READ(argn,*) xlim
CALL GETARG(4,argn)
READ(argn,*) ylim
END SUBROUTINE parse_cmd_args
! Write out current population on the board to a file
SUBROUTINE wrt_living(pop)
LOGICAL, DIMENSION(0:,0:) :: pop
INTEGER :: i, j, imx, jmx
imx = UBOUND(pop,dim=1)
jmx = UBOUND(pop,dim=2)
DO j = 1,jmx-1
DO i = 1,imx-1
IF (pop(i,j)) WRITE(*,'(2(i0,1x))') i-1, j-1 ! Fortran starts at 1
END DO
END DO
END SUBROUTINE wrt_living
SUBROUTINE gol_kernel(ngen,pop)
INTEGER, INTENT(in) :: ngen
LOGICAL, POINTER, DIMENSION(:,:) :: pop
LOGICAL, POINTER, DIMENSION(:,:) :: old, tmp
INTEGER(SELECTED_INT_KIND(1)) :: cnt ! small integer (always >= 0 & < 9)
INTEGER :: imx, jmx, i, j, n
tmp => NULL()
old => NULL()
imx = UBOUND(pop,dim=1)
jmx = UBOUND(pop,dim=2)
ALLOCATE(old(0:imx,0:jmx))
imx = imx - 1
jmx = jmx - 1
DO n=1,ngen
tmp => pop
pop => old
old => tmp ! can now rewrite pop
DO j=1,jmx
DO i=1,imx
cnt = COUNT(old(i-1:i+1,j-1)) + &
COUNT(old(i-1:i+1:2,j)) + &
COUNT(old(i-1:i+1,j+1))
pop(i,j) = (old(i,j) .AND. (cnt == 2)) .OR. (cnt == 3)
END DO
END DO
END DO
DEALLOCATE(old)
tmp => NULL()
END SUBROUTINE gol_kernel
END MODULE gol
PROGRAM tst
USE gol
IMPLICIT NONE
CHARACTER(len=mx_arg_l) :: fl_name
INTEGER :: ngen, imx, jmx, n, i, j, h
LOGICAL, POINTER :: pop(:,:) => NULL()
LOGICAL :: is_eof
CALL parse_cmd_args(fl_name,ngen,imx,jmx)
ALLOCATE(pop(0:imx+1,0:jmx+1))
pop = .FALSE.
CALL getfreehandle(h)
OPEN(unit=h,file=fl_name,status='old',action='read')
DO ! Read in IC
CALL rdln(h,i,j,is_eof)
IF (is_eof) EXIT
pop(i+1,j+1) = .TRUE. ! Fortran indices start at 1
END DO
CLOSE(h)
!!$ CALL gol_kernel(ngen,pop)
DO n = 1,ngen
CALL gol_kernel(1,pop)
END DO
CALL wrt_living(pop)
DEALLOCATE(pop)
END PROGRAM tst[/fortran] Lines 133 - 136 show the two variants. When line 133 is uncommented and 134 - 136 are commented out this is the "production" version of the code. The code as shown above is the memory leak test case of the code. I have switched the state advancement DO loop to be outside of the main kernel. The kernel only performs 1 iteration each time it is called, but it is called ngen times. (vs being called once in the production code wherein it performs ngen interations)The results between the production code and the test code differ, however. The results given by the production code are correct and the results given by the test code are erroneous. It also appears that the divergence of the results happens after the third iteration, where suddenly the following points are assigned a value of .TRUE. (i.e. living) for both initial conditions. IC 1 has the additional point 249 0.
x y
192 0
193 0
194 0
203 0
204 0
205 0
206 0
207 0
247 0
248 0
System and compiler info are as follows:
[bash]12:02 PM (0) MPI-proj $ uname -a Linux xxxxxxxxxxx.xxxxxx.xxx.edu 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux 12:09 PM (0) MPI-proj $ ifort -V Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20090630 Package ID: l_cprof_p_11.1.046 Copyright (C) 1985-2009 Intel Corporation. All rights reserved.[/bash]The tests can be run as follows:
[bash]12:10 PM (0) MPI-proj $ ./a.out life.data.2 100 250 250 | sort -n | diff -s - life.data.2.100.250.250.out.sorted Files - and life.data.2.100.250.250.out.sorted are identical 12:10 PM (0) MPI-proj $ ./a.out life.data.1 100 250 250 | sort -n | diff -s - life.data.1.100.250.250.out.sorted Files - and life.data.1.100.250.250.out.sorted are identical[/bash]The above example shows correct output. To generate output for comparison with fewer iterations switch to the "production" source form and replace the 2nd argument to a.out with the desired number of iterations and then filter stdout through sort -n and write stdout to a file. The you can run the tests on the "test" version of the source code with fewer iterations. The two input files are attached, as well as the desired output after 100 iterations. life.data.x are the ICs and life.data.x.n.i.j.out.sorted are the correct results after n iterations on a board iXj cells large.
Please let me know if:
a) I am making a subtle error
b) This is a compiler bug
c) If so, is it fixed in a more recent release.
Thank you,
-Z
Link Copied
5 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sorry, I am having trouble following this. How do I build and run the program to show "bad" results, and how do I build and run it to show "good" results? Can you express it without the sort and diff commands, as this makes it difficult to identify the problem area.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The program simulates Conway's game of life. (cellular automata) The input file is given as the first argument and is a list of i,j coordinates for living cells. The output is written to stdout. The 2nd argument is the number of iterations to run for, and the third and fourth arguments are the number of cells in the i and j direction respectively. (The output and input are in C-like coordinates on [0:imx-1, 0:jmx-1] which I just shift into Fortran like arguments on )
To build the code just invoke ifort on the single source file, no flags etc. needed.
The code as it's listed shows the 'bad' configuration. To switch to the good configuration just un-comment line 133 and comment out or remove 134-136. I'll attach versions labeled working.f90 and broken.f90 to this post, and you can run a diff to verify that this is the only change.
The code advances a 'board' which is a 2-D array of T and F or 1 and 0 according to the rules outlined by Conway's game of life. An initial condition is set, and each iteration the rules are applied to determine which cells on the board are born (3 living neighbors), live (alive with 2 neighbors), or die (any other combo). Cells not on the board are considered dead (e.g. just off the board edges). This is the BC. Essentially the board object has 2^(imx*jmx) possible states and each current state implies 1 and only 1 state after positive n iterations. This mapping from the current state to a future state is deterministic, but behaves in a complicated way and is hard to predict without actually performing the mapping computationally.
I am telling you all of this to say that yes, I can come up with a test case that will not require sorting, but just comparing the expected output to the actual output by hand might make it hard to see where the problem is occurring.
If we neglect whether one output is 'correct' and the other is 'incorrect' the real heart of the matter is: Why is the output of implementation 1, 'working.f90', different from implementation 2, 'broken.f90'? The only change is whether ngen iterations are performed inside the gol_kernel or the gol_kernel performs 1 iteration and is called ngen times. The update rules and boundary conditions are the same. I also know that 'working.f90' produces correct results because I have compared it to other peoples results. One could even plot the points on a grid and verify that the rules are being followed.
Here is how you can compile the codes and run the tests.
Test 1:
The difference between the two codes is only lines 133-136 as is seen below:
To build the code just invoke ifort on the single source file, no flags etc. needed.
The code as it's listed shows the 'bad' configuration. To switch to the good configuration just un-comment line 133 and comment out or remove 134-136. I'll attach versions labeled working.f90 and broken.f90 to this post, and you can run a diff to verify that this is the only change.
The code advances a 'board' which is a 2-D array of T and F or 1 and 0 according to the rules outlined by Conway's game of life. An initial condition is set, and each iteration the rules are applied to determine which cells on the board are born (3 living neighbors), live (alive with 2 neighbors), or die (any other combo). Cells not on the board are considered dead (e.g. just off the board edges). This is the BC. Essentially the board object has 2^(imx*jmx) possible states and each current state implies 1 and only 1 state after positive n iterations. This mapping from the current state to a future state is deterministic, but behaves in a complicated way and is hard to predict without actually performing the mapping computationally.
I am telling you all of this to say that yes, I can come up with a test case that will not require sorting, but just comparing the expected output to the actual output by hand might make it hard to see where the problem is occurring.
If we neglect whether one output is 'correct' and the other is 'incorrect' the real heart of the matter is: Why is the output of implementation 1, 'working.f90', different from implementation 2, 'broken.f90'? The only change is whether ngen iterations are performed inside the gol_kernel or the gol_kernel performs 1 iteration and is called ngen times. The update rules and boundary conditions are the same. I also know that 'working.f90' produces correct results because I have compared it to other peoples results. One could even plot the points on a grid and verify that the rules are being followed.
Here is how you can compile the codes and run the tests.
Test 1:
[diff]03:12 PM (0) MPI-proj $ ifort -o working working.f90 && ./working life.data.1 3 250 250 | diff -s - life.data.1.3.250.250.out Files - and life.data.1.3.250.250.out are identical 03:14 PM (0) MPI-proj $ ifort -o broken broken.f90 && ./broken life.data.1 3 250 250 | diff -s - life.data.1.3.250.250.out 1,11d0 < 192 0 < 193 0 < 194 0 < 203 0 < 204 0 < 205 0 < 206 0 < 207 0 < 247 0 < 248 0 < 249 0[/diff]Test 2:
[diff]03:14 PM (0) MPI-proj $ ifort -o working working.f90 && ./working life.data.2 3 250 250 | diff -s - life.data.2.3.250.250.out Files - and life.data.2.3.250.250.out are identical 03:16 PM (0) MPI-proj $ ifort -o broken broken.f90 && ./broken life.data.2 3 250 250 | diff -s - life.data.2.3.250.250.out 1,10d0 < 192 0 < 193 0 < 194 0 < 203 0 < 204 0 < 205 0 < 206 0 < 207 0 < 247 0 < 248 0[/diff]Between the two tests the codes are the same, but the ICs are different. I have only run for 3 iterations because this is where the expected results diverge from the unexpected results. For the first two iterations there is no difference between broken.f90 and working.f90
The difference between the two codes is only lines 133-136 as is seen below:
[diff]133,136c133,136 < CALL gol_kernel(ngen,pop) < !!$ DO n = 1,ngen < !!$ CALL gol_kernel(1,pop) < !!$ END DO --- > !!$ CALL gol_kernel(ngen,pop) > DO n = 1,ngen > CALL gol_kernel(1,pop) > END DO[/diff]Attached are the 6 files needed to run the tests shown above. Please see my original post for compiler version and system information.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Please let me know if there is anything else I can do. I have had two more pairs of eyes take a look at this and they both agree that something fishy is going on and cannot find errors in the code. I would love to know if others can reproduce my results on different hardware and with more recent ifort versions. If I am encountering a compiler bug I would love to know if the same issues are present in more recent versions of ifort.
Thank you,
-Z
Thank you,
-Z
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You have bugs in your program, broken.f90.
At entry to the DO loop starting line-94, the array old has just been allocated and its contents are undefined. New values are put into this array by the statement at line=103, but not for subscripts i=0, j=0. There is an omission on your part here, since you are going to reference these undefined values later.
Thus, when subroutine gol_kernel returns, some elements of array pop are undefined. When the subroutine is called again, old is pointed to the argument pop, and old is referenced on lines 100-102 with zero as the first and/or second subscripts -- array elements whose values are still undefined, since they were not defined during the previous run through this subroutine.
With undefined variables, program behavior is undefined. Compilers can do anything they find convenient or within their ability, and such undefined behavior is no basis for suspecting compiler error.
At entry to the DO loop starting line-94, the array old has just been allocated and its contents are undefined. New values are put into this array by the statement at line=103, but not for subscripts i=0, j=0. There is an omission on your part here, since you are going to reference these undefined values later.
Thus, when subroutine gol_kernel returns, some elements of array pop are undefined. When the subroutine is called again, old is pointed to the argument pop, and old is referenced on lines 100-102 with zero as the first and/or second subscripts -- array elements whose values are still undefined, since they were not defined during the previous run through this subroutine.
With undefined variables, program behavior is undefined. Compilers can do anything they find convenient or within their ability, and such undefined behavior is no basis for suspecting compiler error.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Oh my, you're right. old's ghost points need initialization to .FALSE. every time I enter the routine. Then when my pointers swap old and pop, they will both always have the value .FALSE. in the ghost cells.
Thank you so much.
Thank you so much.
Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page