Intel® Fortran Compiler

Heap block error

Les_Neilson
Valued Contributor II

We use array reallocation a lot in our code, as the data models grow. In this particular instance the model is big (though not as big as some I have seen) and one particular program runs for some 5+ hours before it dies with the following error:

[cpp]HEAP[viewer.exe]: Heap block at 05B24F48 modified at 05B252C4 past requested size of 374

Windows has triggered a breakpoint in viewer.exe.

This may be due to a corruption of the heap, and indicates a bug in viewer.exe or any of the DLLs it has loaded.
The output window may have more diagnostic information

HEAP[viewer.exe]: Invalid address specified to RtlFreeHeap( 03A20000, 05B24F50 )
Windows has triggered a breakpoint in viewer.exe.[/cpp]

The code in question below shows that the arrays are initially allocated with size 200 and, when reallocated, are increased by 20.

[cpp]      module M_viewer
      implicit none

      !------------------------------------------------
      !         Module for viewer
      !------------------------------------------------

      integer*4 :: G_ncurrent = 200
      integer*4 :: increment  = 20

      ! Allocatables
      integer*4 ncurrent
      integer*4, allocatable :: scurrent(:)
      real*8,    allocatable :: tcurrent(:)

      contains
!------------------------------------------------------------
      subroutine AllocateViewer(ierr)
      implicit none

      integer*4, intent(out) :: ierr

      allocate(scurrent(G_ncurrent+1),stat = ierr)
      if (ierr /= 0) return
      allocate(tcurrent(G_ncurrent),stat = ierr)
      if (ierr /= 0) return

      scurrent = 0
      tcurrent = 0.0d0

      return
      end subroutine AllocateViewer
!----------------------------------------------------------------------|
      subroutine ReAllocateScurrent(ierr)
      implicit none

      integer*4, intent(out) :: ierr

      ! Local allocatables
      real*8,    allocatable :: r8temp(:)
      integer*4, allocatable :: i4temp(:)

      integer*4 :: oldsize, newsize

      ierr = 0
      !-------------------------|

      oldsize = G_ncurrent
      newsize = oldsize+increment

      allocate(i4temp(newsize+1), stat=ierr)
      if (ierr == 0) then
         i4temp(1:oldsize+1) = scurrent
         call move_alloc(i4temp, scurrent)
      endif

      if (ierr == 0) then
         allocate(r8temp(newsize), stat=ierr)
         if (ierr == 0) then
            r8temp(1:oldsize) = tcurrent
            call move_alloc(r8temp, tcurrent)
         endif
      endif 

      if (ierr == 0) then
         G_ncurrent = newsize
      endif 

      end subroutine ReAllocateScurrent

[/cpp]

When the program dies, the debug shows that oldsize is 200, newsize is 220 and the ierr return status from the allocate is zero. The failure appears to be in the move_alloc, because of the reference to RtlFreeHeap (or it might possibly be the assignment of the old values to the new array), as though it were going past the end of the array. But I wonder if it could be a memory fragmentation problem (but then why is the allocate ierr == 0?), though I am not ruling out Heisenbugs.
This occurs with IVF v9.1.028; I will be trying v10.1.025 later.

Has anyone experienced anything similar?

Les

Jugoslav_Dujic
Valued Contributor II
Quoting - Les Neilson


In my experience,

[cpp]Heap block at 05B24F48 modified at 05B252C4 past requested size of 374[/cpp]

is a sign of a write-out-of-bounds problem somewhere, rather than a memory-manager problem (which, IIRC, I haven't encountered in any flavor of Visual Fortran since DVF 5.0). The error shows up when the debug heap (via RtlFreeHeap) takes a look at the guard block before and/or after the array being deallocated, and finds it overwritten.

So, you have heap corruption somewhere, but (assuming that you do have at least /check:bounds on) I suppose the conditions are such that the bounds checking misses it (writing past an assumed-size argument?). Good luck in searching for the cause -- you'll need it.

Les_Neilson
Valued Contributor II
Quoting - Jugoslav Dujic


Thanks Jugoslav, I suspected as much, but hoped it was not. We do have /check:bounds on and check for uninitialized variables, so I guess I shall just have to dig a little deeper.

Les

Jugoslav_Dujic
Valued Contributor II
Quoting - Les Neilson


374 is hex, evaluating to 884 decimal; divided by 4, that gives 221. That implies that the corruption happened just past the end of your 220-element integer(4) array.

Note that "data breakpoints" sort of work in Intel Fortran, i.e. you can watch a single memory address (Debug/New Breakpoint/New data breakpoint/Set language to C; I'm not sure if it works in 9.1). If you're lucky, you can set a breakpoint at 0x05B252C4 and see when it gets modified.

jimdempseyatthecove
Honored Contributor III

Les,

Prior to hunting down your heap corruption, ask yourself:

Is your application multi-threaded?

If so, is your reallocation thread safe?

Jim Dempsey

Les_Neilson
Valued Contributor II


At the moment it is single-threaded. The original code goes way back (it started on Apollo hardware before migrating to the PC); I and another colleague (no longer here) added the module stuff and made many of the arrays re-allocatable in the way shown above.

Thinking about what Jugoslav said, I see that the integer array scurrent is always 1 bigger than tcurrent (201 vs 200 initially), so that does help narrow down where to look. I had put conditional breakpoints just after all the calls to allocate (in other modules too, not just this one) to catch ierr being non-zero, but the debug showed it was zero when the heap error occurred.
Five or more hours is a long time to wait for the program to die, so I'm taking to running it overnight and hoping we don't get a power cut.

Les

Jugoslav_Dujic
Valued Contributor II
Quoting - Les Neilson


A couple more hints:

(0x05B252C4 - 0x05B24F48)/4 = 223.

It appears that you have a silent overwrite past the end of the array. I can replicate the behavior with the following self-contained program (which doesn't show an out-of-bounds access, but a heap corruption at deallocate). Also note that the "requested size" is 4 bytes bigger than the actual array size, i.e. an allocation of 220 elements gives a "requested size" of 0x374:

[cpp]program heap

integer, allocatable:: foo(:)

allocate(foo(220))
foo = 42
call badheap(foo)
deallocate(foo)

end program heap
!---------------------
subroutine badheap(foo)
integer foo(*)

foo(223) = Z'BAADF00D'

end subroutine badheap
[/cpp]
So, if you can find out which array caused the problem (the one that failed to deallocate), pay attention to the places where it is passed to a subroutine as an assumed-size intent(inout) argument. There lie your dragons.

jimdempseyatthecove
Honored Contributor III

Les,

Extending Jugoslav's information into a diagnostic, try the following.

Your allocation/deallocation/reallocation routines are isolated and therefore easy to change.

First, check your code to see whether you are using size(array) for either of the two arrays. You could be obtaining the size of the +1 array and then storing into the +0 array at the size index of the +1 array.

If you are not using size(array), change the allocation routine to allocate an additional cell for each array and then insert a signature value into the extra cell. Use a value that you would not expect in your dataset and that would not, by chance, equal the contents of the heap signature bytes (you can compare the prior contents to your signature as a first step; if equal, change the signature).

Now place into your code a debug compiled test of the signatures. Good candidates are
1) immediately before the realloc
2) immediately before the delete
3) in the outer layers of your programming onion insert tests before and after subroutine calls
4) work your way deeper into the onion

Make your test a subroutine call such that you have but one place to insert a break point (a rough sketch follows below).
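
For what it's worth, here is a minimal sketch of that guard-cell idea applied to the arrays from the original post (the routine names and the GUARD value are illustrative assumptions, not existing code):

[cpp]      ! Hypothetical guard-cell diagnostic for scurrent/tcurrent.
      ! GUARD is an arbitrary value unlikely to occur in the data.
      subroutine AllocateViewerGuarded(ierr)
      use M_viewer, only : scurrent, tcurrent, G_ncurrent
      implicit none
      integer*4, intent(out) :: ierr
      integer*4, parameter :: GUARD = -123454321

      ! One extra cell on each array holds the signature value.
      allocate(scurrent(G_ncurrent+2), stat=ierr)
      if (ierr /= 0) return
      allocate(tcurrent(G_ncurrent+1), stat=ierr)
      if (ierr /= 0) return
      scurrent = 0
      tcurrent = 0.0d0
      scurrent(G_ncurrent+2) = GUARD
      tcurrent(G_ncurrent+1) = dble(GUARD)
      end subroutine AllocateViewerGuarded
!-----------------------------------------------------------------
      subroutine CheckGuards(location)
      use M_viewer, only : scurrent, tcurrent
      implicit none
      character(len=*), intent(in) :: location
      integer*4, parameter :: GUARD = -123454321

      ! One place to put a breakpoint when a guard gets clobbered.
      if (scurrent(size(scurrent)) /= GUARD) then
         write(*,*) 'scurrent guard overwritten at ', location
      endif
      if (tcurrent(size(tcurrent)) /= dble(GUARD)) then
         write(*,*) 'tcurrent guard overwritten at ', location
      endif
      end subroutine CheckGuards[/cpp]

The reallocation routine would, of course, need to carry the extra cell along and re-plant the signature in the same way.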

For Intel eyes:

Wouldn't it be a nice feature enhancement for developers if you added a diagnostic call

call BreakOnMemoryChange(VariableReferenceHere, code)

Where code can be Enable, Disable, number of bytes, etc...

That is to say a function whereby the programmer can set and reset hardware breakpoints.

Jim Dempsey

Les_Neilson
Valued Contributor II
Well, thanks to Jugoslav and Jim, I think I've found the dragon. (One for definite; others may be lurking elsewhere!)
I searched the code tree for all occurrences of the problem array, which fortunately turned out to be quite a small subset. After staring at the code for some time I suddenly saw the problem. The code, being old, predates whole-array operations and uses subroutines to copy the contents of one array to another (one for integer, one for real, etc.). These are work arrays, and the code copies "current" to "temp", inserts a new value at current(1), and copies temp back to current(2:). In the routine I was looking at there is a multiple if-then-else block, and one of the conditions just happened to copy N+1 elements of the temp array to scurrent(2) when N+1 is the size of scurrent, thus going past the end of the array.
I added a test to see whether the arrays would need to be reallocated before doing all the insert/append work.
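
For illustration only, a minimal sketch of the off-by-one pattern described above and the guard that fixes it (InsertAtFront is a hypothetical name, not the actual routine):

[cpp]      ! Hypothetical sketch of the bug pattern and the fix.
      subroutine InsertAtFront(newval, n, ierr)
      use M_viewer, only : scurrent, ReAllocateScurrent
      implicit none
      integer*4, intent(in)  :: newval, n
      integer*4, intent(out) :: ierr
      integer*4 :: temp(n+1)

      ierr = 0
      ! Copying n+1 old elements into scurrent(2:n+2) needs room for
      ! n+2 elements -- one past the end when size(scurrent) == n+1.
      ! Fix: make sure the array is big enough *before* the insert.
      if (n+2 > size(scurrent)) then
         call ReAllocateScurrent(ierr)
         if (ierr /= 0) return
      endif

      temp(1:n+1)     = scurrent(1:n+1)   ! save current contents
      scurrent(1)     = newval            ! insert new value first
      scurrent(2:n+2) = temp(1:n+1)       ! shift old values up one
      end subroutine InsertAtFront[/cpp]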
Thanks again
Les
evankw21
Beginner
Quoting - Les Neilson
I am obtaining the same type of error in a very large program of mine running ifort 11.1.3466.2008 with MSVS 2008 on Windows XP (32-bit). The error occurs as I call an HDF4.2 subroutine that opens an HDF file for writing (sfstart). All the inputs look good, and the same routine has been successfully executed multiple times prior to the error. The program runs fine on most datasets. One time I was able to run the exact same data successfully writing to a local disk, but consistently got an error when writing to a network drive. I am able to debug right up to the problem point and even set a data breakpoint. Nothing gives me any clue. The errors come out in ntdll.dll and in msvcr80.dll.

In order to see if one of my allocatable arrays was being overwritten (as was apparently the issue in this post), I made a subroutine that deallocates every allocated array in the whole program. Whether I execute this at the end of the program or just before the error, it executes fine. I am at wit's end and at a dead end. My only thought is to try to compile under Linux and see if that implementation gives me any more information.

I gather these types of errors are pernicious. I've had similarly bizarre errors in the past and have always worked through them somehow, but this one has me at a standstill. Any suggestions would be greatly appreciated.
evankw21
Beginner
Quoting - evankw21
OK, here's an update that may make it easier to suggest what's going on. The HDF routine sfstart takes the file name as an argument. (The only other argument is a constant that says whether the file is to be read, written, or both.) When the TRIMmed length of the file name is less than 111, the program runs fine. When it's not, the program crashes there. The network drive was not the culprit - it just had a longer path than the local drive. But the strange thing is that the three datasets have already been written to the HDF file successfully. Even when the program crashes, if I hit Continue the program continues and generates a valid HDF file that has the right numbers in all four data sets. Also, after hitting Continue, the same routine writes additional data to the same HDF file and the same data set within the file, and no errors occur. The heap errors I'm getting are the same pair of errors quoted in the referenced entry by Les.
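
If it helps while digging, here is a minimal defensive sketch around that call (sfstart and the -1 failure return are from the HDF4 SD Fortran API; the 110-character cutoff and DFACC_WRITE = 2 are assumptions taken from the behaviour described above and from hdf.h, so verify both locally):

[cpp]      ! Hypothetical wrapper: warn on long names and pass a trimmed
      ! file spec to sfstart. MAXLEN reflects the observed limit only.
      subroutine OpenHdfChecked(hdfname, sd_id)
      implicit none
      character(len=*), intent(in)  :: hdfname
      integer*4,        intent(out) :: sd_id
      integer*4, external  :: sfstart
      integer*4, parameter :: DFACC_WRITE = 2   ! value from hdf.h
      integer*4, parameter :: MAXLEN = 110

      if (len_trim(hdfname) > MAXLEN) then
         write(*,*) 'Warning: long HDF file name: ', trim(hdfname)
      endif
      sd_id = sfstart(trim(hdfname), DFACC_WRITE)
      if (sd_id == -1) then
         write(*,*) 'sfstart failed for ', trim(hdfname)
      endif
      end subroutine OpenHdfChecked[/cpp]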

Again, any information would be appreciated. Right now my workaround is a shorter file name, and that's not a great solution.
scrognoid
Beginner
I'm trying to help evankw21 with his heap block problem. Can we employ in Fortran some of the things that are turning up in Help, like "full page heap" and "ntsd"? What about _heapchk from C?
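
On _heapchk: a minimal sketch of calling it from Fortran through ISO_C_BINDING is below (the binding name and the meaning of the return value should be checked against malloc.h and your CRT; this is an untested assumption, not a recipe):

[cpp]      ! Hypothetical: ask the CRT to walk and verify its heap.
      ! Return codes (_HEAPOK, _HEAPBADNODE, ...) are in malloc.h.
      subroutine CheckCrtHeap()
      use, intrinsic :: iso_c_binding, only : c_int
      implicit none
      interface
         function heapchk() bind(c, name='_heapchk')
            import :: c_int
            integer(c_int) :: heapchk
         end function heapchk
      end interface
      integer(c_int) :: status

      status = heapchk()
      write(*,*) 'CRT _heapchk returned ', status
      end subroutine CheckCrtHeap[/cpp]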
jimdempseyatthecove
Honored Contributor III

>>When the TRIMmed length of the file name is less than 111, the program runs fine....
>>Right now my workaround is a shorter file name, and that's not a great solution.

As a workaround you could use GETDRIVEDIRQQ and CHANGEDIRQQ.

If you wish to save the current directory for the drive passed to the HDF sfstart, use GETDRIVEDIRQQ to save the current directory on that drive. Then use CHANGEDIRQQ to set the current directory for the drive letter where the file is to reside.

Assume:

D:\Really Long Path Here\foo.hdf

You locate the split between the drive/path and the file name and trim off the file name:

D:\Really Long Path Here

Then use CHANGEDIRQQ to set that as the current directory path for that drive.

Next, when you specify the file spec to HDF sfstart, combine the drive letter and the file name (omit the \):

D:foo.hdf

If needed, then restore the old "current directory" for the drive.

Note: if the path is on a network, you can either map a network drive (to the long path) or use a share name for the long path.
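
As a rough sketch of that approach (GETDRIVEDIRQQ and CHANGEDIRQQ come from the IFPORT portability module; the path splitting, buffer sizes and DFACC_WRITE value shown here are illustrative assumptions to be checked against your HDF4 headers):

[cpp]      ! Hypothetical sketch: shorten the file spec seen by sfstart
      ! by making the long directory current on its drive first.
      subroutine OpenWithShortSpec(fullpath, sd_id)
      use ifport, only : getdrivedirqq, changedirqq
      implicit none
      character(len=*), intent(in)  :: fullpath
      integer*4,        intent(out) :: sd_id

      character(len=260) :: olddir, dirpart, filepart
      integer*4 :: k, lret
      logical*4 :: ok
      integer*4, external  :: sfstart
      integer*4, parameter :: DFACC_WRITE = 2   ! from hdf.h

      ! Remember the current directory on the target drive (D: here).
      olddir = 'D:'
      lret   = getdrivedirqq(olddir)

      ! Split the full path into its directory and file name parts.
      k        = index(fullpath, '\', back=.true.)
      dirpart  = fullpath(1:k-1)
      filepart = fullpath(k+1:)

      ! Make the long path current on its drive, then use a short,
      ! drive-relative spec when opening the file.
      ok    = changedirqq(trim(dirpart))
      sd_id = sfstart('D:'//trim(filepart), DFACC_WRITE)

      ! Restore the previous current directory for the drive.
      ok = changedirqq(trim(olddir))
      end subroutine OpenWithShortSpec[/cpp]

In real code the return values of both portability routines should be checked as well.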

Jim Dempsey

evankw21
Beginner

Jim:

Thanks for the changedir ideas. That would be a surefire way to ensure the path names were short enough.

We still want to make sure it's not a problem on our end of overwriting memory.

Evan

jimdempseyatthecove
Honored Contributor III

>>We still want to make sure it's not a problem on our end of overwriting memory.

LEN_TRIM is your friend (before you concatenate two strings).
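
For example, a trivial sketch (the names and lengths here are made up):

[cpp]      ! Build a file spec from the significant characters only.
      program trimdemo
      implicit none
      character(len=260) :: dir, name, full
      dir  = 'D:\Really Long Path Here'
      name = 'foo.hdf'
      full = trim(dir) // '\' // trim(name)
      write(*,*) 'len = ', len_trim(full), ' : ', trim(full)
      end program trimdemo[/cpp]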
