Intel® Fortran Compiler

Heap Corruption on Deallocate

David2
Beginner
2,338 Views

 

I have a complicated model, too convoluted to post here, and I am trying to debug some strange behavior. I occasionally get heap corruption errors at the end of the model run, when it is deallocating memory.

I have created a simple test program hello.f90 which reproduces some of the really strange behavior that I see in our application.

Any time the bounds of an array are exceeded (shame on me) it is the programmer's fault, but this behavior makes it really hard to figure out! Depending on which index of a 2D array is exceeded, the first or the second, the program compiled with optimization either raises a heap corruption error on deallocation or proceeds without error.

Obviously the thing to do is compile with /check:all and get a traceback to the line where the error occurs, but this still seems like bad behavior. I am really curious if someone can at least explain what is happening here!

Here is a gist of the code: https://gist.github.com/dstuebe/7936460

The exception raised by the OS is in the comments along with the version of Visual Studio/Fortran.

Thanks 

David

17 Replies
Steven_L_Intel1
Employee

What exactly do you expect to happen if you write outside the bounds of an allocated array? This is called "memory corruption" and what you're seeing is that you have corrupted the memory allocator's free/used information. Fix your code. And if you built with bounds checking enabled, you'd get an error at the point of the problem.

David2
Beginner

I guess I would expect my bad programming to result in the same behavior regardless of the index I screwed up. Can you explain why the behavior is different depending on which index is exceeded? That seems at least academically interesting. 

The real model code that I am trying to debug has some serious issues where arrays are allocated and indexed based on unvalidated user input, and since the error does not show up until the deallocation is called, sometimes hundreds of wall-clock hours in, it is hard to debug. Is there any way to tell at least which array caused the error in an exe compiled with optimization (without bounds checking)? Say, from a minidump file?

David2
Beginner

Also posted to Stack Overflow

jimdempseyatthecove
Honored Contributor III

The code that uses un-validated user input should have validation added. This would be lighter weight than bounds checking everything.
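As a minimal sketch of that kind of validation (the module input_checks, the routine alloc_grid, its max_cells limit and its error codes are all hypothetical, not taken from the original model), user-supplied extents can be checked before the array is allocated, so bad input is reported immediately instead of surfacing as heap corruption at DEALLOCATE time:

```fortran
module input_checks
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Allocate a(nx, ny) only after validating user-supplied extents.
  ! Hypothetical error codes: ierr = -1 for a non-positive extent,
  ! -2 when nx*ny exceeds max_cells, otherwise the ALLOCATE stat value
  ! (0 on success, a positive processor-dependent value on failure).
  subroutine alloc_grid(a, nx, ny, max_cells, ierr)
    real, allocatable, intent(inout) :: a(:,:)
    integer, intent(in) :: nx, ny, max_cells
    integer, intent(out) :: ierr

    if (nx <= 0 .or. ny <= 0) then
      ierr = -1
      return
    end if
    ! Multiply in 64-bit so a huge product cannot overflow a default integer.
    if (int(nx, int64) * int(ny, int64) > int(max_cells, int64)) then
      ierr = -2
      return
    end if
    if (allocated(a)) deallocate(a)
    allocate(a(nx, ny), stat=ierr)
  end subroutine alloc_grid
end module input_checks
```

The caller can then stop with a message naming the offending input whenever ierr is nonzero, long before any DEALLOCATE is reached.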

You can also Right-Click on individual source files in the Solution Explorer of VS, select Properties, then enable bounds checking on individual files. By picking the right candidate files you may be able to put the overhead below 10%.

Another approach is to add calls to HeapValidate (a Windows API function) at the start and end of subroutines and functions.

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366708(v=vs.85).aspx

Once you locate the subroutine/function you can then enable bounds checking for that file (assuming something wasn't noticed after failure).

I would wrap these calls in a macro and use FPP to control whether the call is made or a no-op is performed.

Jim Dempsey

David2
Beginner

Hi Jim

Adding validation is underway. I am still struggling to find some of the potential issues though. Is there any way to get the name of the array which is corrupted - either by connecting a runtime debugger or a minidump file once an exception has been hit in the runtime?

All I get right now is something like: Unhandled exception at 0x775de753 (ntdll.dll) in hello.exe: 0xC0000374: A heap has been corrupted.

David
IanH
Honored Contributor II

You ask "why the subscript order matters" in terms of the impact of this programming error. Look at the equivalent one-dimensional indices, or offsets into the array, for the two cases (i.e. the relative addresses). For the 3x3 array in the table below, the (i, j) form of reference, where j is the index going out of bounds by one, results in a zone of damage that covers the next three elements' worth of memory beyond the actual array. The (j, i) case only damages one element's worth beyond the end of the array (and two elements inside the array).

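The more memory you damage outside the array proper, the more likely you are to see problems such as heap corruption. Heap allocators typically keep bookkeeping data next to each block, so an overrun that reaches past any padding at the end of the block can overwrite that data; the damage is then only noticed later, when the heap manager walks those blocks, e.g. on DEALLOCATE.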

    address:    1   2   3   4   5   6   7   8   9  10  11  12  13
    array:     11  21  31  12  22  32  13  23  33   x   x   x   x

    (j,i):                 41          42          43
    (i,j):                                         14  24  34
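The addresses in the table follow from Fortran's column-major storage: with lower bounds of 1, element (i, j) of an array with nrows rows sits at position i + (j-1)*nrows, and optimized code without bounds checking applies that formula whether or not the subscripts are valid. A minimal sketch of the arithmetic (the module colmajor and the function offset2d are hypothetical names) reproduces both rows of the table for the 3x3 case:

```fortran
module colmajor
  implicit none
contains
  ! 1-based position of element (i, j) in a column-major array with
  ! lower bounds 1 and nrows rows: a(:,1) is stored first, then a(:,2), ...
  ! The formula is applied even when i or j is out of range, which is
  ! what unchecked optimized code does when it computes an address.
  pure integer function offset2d(i, j, nrows)
    integer, intent(in) :: i, j, nrows
    offset2d = i + (j - 1) * nrows
  end function offset2d
end module colmajor
```

For the 3x3 array, offset2d(4, k, 3) gives 4, 7, 10 for k = 1, 2, 3, so only position 10 lies past the end; offset2d(k, 4, 3) gives 10, 11, 12, all past the end. In general, running the last subscript one past its bound writes a contiguous run of nrows elements after the array, while running the first subscript one past its bound scatters writes into the following columns and reaches at most one element beyond the end.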

jimdempseyatthecove
Honored Contributor III

#define CHECK_INDEX1D(A,I) if(((I).lt.LBOUND(A,1)).or.((I).gt.UBOUND(A,1))) call YourTrap(__FILE__,__LINE__)
#define CHECK_BOUNDS1D(A,I,J) CHECK_INDEX1D(A,I); CHECK_INDEX1D(A,J)

 

Do the same for 2D, 3D, ... arrays.

subroutine YourTrap(F,L)
implicit none
character(len=*), intent(in) :: F
integer, intent(in) :: L
! **************** place break point here
write(*,*) "Subscript error. File: ",F," Line: ",L
! return to caller to debug
end subroutine YourTrap

Then sprinkle checks like this ahead of your loops:

CHECK_BOUNDS1D(YourArray, iBegin, iEnd)
DO I=iBegin,iEnd

In a release build, you can define the macros to expand to nothing.
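For the 2D case, one option is to move the comparison into a small function and keep the macro short. A minimal sketch, assuming the array bounds are passed from the call site with LBOUND/UBOUND (an assumed-shape dummy argument would reset the lower bounds to 1); the names bounds_check, index_ok, index2_ok, CHECK_INDEX2D and CHECK_SUBSCRIPTS are hypothetical, and YourTrap is the routine above:

```fortran
! Hypothetical FPP usage (preprocess with fpp/cpp; define CHECK_SUBSCRIPTS
! only in the checking build so the release build expands to nothing):
! #ifdef CHECK_SUBSCRIPTS
! #define CHECK_INDEX2D(A,I,J) if (.not. index2_ok(I,J,lbound(A),ubound(A))) call YourTrap(__FILE__,__LINE__)
! #else
! #define CHECK_INDEX2D(A,I,J)
! #endif
module bounds_check
  implicit none
contains
  ! True when lo <= i <= hi.
  pure logical function index_ok(i, lo, hi)
    integer, intent(in) :: i, lo, hi
    index_ok = (i >= lo) .and. (i <= hi)
  end function index_ok

  ! 2D check: lo and hi are lbound(A) and ubound(A) evaluated by the caller,
  ! so the declared bounds are kept and A may be of any type.
  pure logical function index2_ok(i, j, lo, hi)
    integer, intent(in) :: i, j, lo(2), hi(2)
    index2_ok = index_ok(i, lo(1), hi(1)) .and. index_ok(j, lo(2), hi(2))
  end function index2_ok
end module bounds_check
```

With CHECK_SUBSCRIPTS defined, a line such as CHECK_INDEX2D(YourArray, i, j) placed before an assignment reports the file and line through YourTrap; without it, the line expands to nothing.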

Jim Dempsey

David2
Beginner

Hi Jim

Thanks for the sweet little bounds check! That is really nice and compact and makes it simple to put inside an ifdef for a release build as you point out.

Thanks - David

Bernard
Valued Contributor I

Hi David

Can you post the minidump file?

To catch the heap corruption exception immediately, please use the Gflags.exe tool. IIRC, when the page heap option is not enabled, Windows will not raise the heap corruption exception immediately.

Web links: http://msdn.microsoft.com/en-us/library/ff543097.aspx

http://msdn.microsoft.com/en-us/library/windows/hardware/ff542941(v=vs.85).aspx

 

jimdempseyatthecove
Honored Contributor III

No tool since NuMega's BoundsChecker in the 1990s can immediately check for out-of-bounds references without adversely affecting performance, and even that had restrictions. (BTW, this was before they got bought, pillaged and trashed.)

The debug heap will detect heap corruption on malloc/free and on explicit calls to HeapValidate. It will not detect the moment you blast a hole into data outside the range of the array (unless the write lands outside the mapped area of the VM).

One of my Fortran projects (solutions) has over 750 files. It is a simulation program, and when fully optimized it can run for hundreds of hours (sometimes much more). Running a Debug build with full bounds checking is nearly impossible (I cannot live that long). The selective use of bounds checking is the only tenable solution (at least for me). Files that you are absolutely certain have no bounds errors can be compiled with full optimization; the remaining files can be compiled with bounds checking enabled. At some point in a test run (assuming the error was not found), you may be able to ascertain that some of the files with bounds checking have exercised all permutations of the test data. These can then be removed from the suspect list, and you recompile and re-run the test. This is a long process.

Jim Dempsey

Bernard
Valued Contributor I

Of course, the exception is thrown when user-mode code overwrites a specific memory pattern (it probably has the value 0xbadab). The overwrite is discovered when the page heap is scanned and the overwritten magic value is compared with the original value.

AFAIK, one of the heap management functions, such as RtlFreeHeap (which free() ends up calling), will raise an exception when the heap is overwritten or corrupted.

David2
Beginner

Hi Ilya, Jim

Your responses are tremendously helpful. Thank you very much for your time. I have played with GFlags and WinDbg/ntsd a bit today, but I am having trouble getting symbols to load properly. Symbols work if I run my exe compiled in debug mode inside ntsd, but I can't seem to get an optimized release-mode exe to load symbols for debugging. Is there something I am doing wrong in my project properties?

Otherwise, between the selective use of compile time bounds checking and the gflags/ntsd options I think I have the tools to solve the problem. 

Thanks - David

David2
Beginner

A dump file and my compiled hello.exe are in the attached zip. Let me know if you have better luck loading the symbols than I did.

Bernard
Valued Contributor I

David,

are you trying to load your private symbols or do you have a problem with Microsoft public symbols?

jimdempseyatthecove
Honored Contributor III

There are two things you need to check:

1) Make sure the compiler option to produce debug symbols is enabled
2) Make sure the linker option to strip debug symbols is disabled

Note that a fully optimized program is somewhat hard to debug. In some cases it helps to have:

subroutine InnocuousOutOfLineRoutine()
end subroutine InnocuousOutOfLineRoutine

Compile it as a separate file to an .obj without optimizations.
Keep the .obj and delete the source (otherwise IPO may find it).

Include InnocuousOutOfLineRoutine.obj in the project.
Sprinkle calls to the subroutine in places that are difficult to debug.

The line number of the CALL InnocuousOutOfLineRoutine() should be in sync with the debug session.

If you do not remove the source of InnocuousOutOfLineRoutine and have IPO enabled, the compiler may inline the subroutine and then remove the call entirely.

Jim Dempsey

Bernard
Valued Contributor I

Hi David

I loaded the minidump file you provided and had no problems loading the Microsoft public symbols. I quickly looked at the call stack and saw that the heap corruption was probably signaled/detected by RtlpCoalesceFreeBlocks+0x84c in ntdll.dll (the user-mode heap manager). This function was called by the Win API HeapFree(). Tomorrow I will post a more detailed analysis. By the way, I do not use ntsd; you can do everything from within the GUI version of WinDbg.

In case you did not set your symbol path properly, here is the correct form of the symbol path:

SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

Bernard
Valued Contributor I

Regarding SRV*c:\symbols*http://msdl.microsoft.com/download/symbols:

Of course, instead of c:\symbols you can create and use a directory of your own.
