Checking the real array bounds to detect memory corruption

van_der_merwe__ben · 07-11-2007

Compiling a Fortran file using the ifort 
option "/check:bounds" works well. However, 
its checks are based on the dimensions of 
the array that you give it.

So it's perfectly happy if you pass an 
array into a function and inside the 
function tell it the array is actually 
larger than it is and then write past the 
end of the array.
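
A minimal sketch of the situation described above, using hypothetical names (`overrun_demo`, `legacy_fill`, `n`). The routine takes the extent of its dummy array from an argument, so a bounds check inside the routine can only compare subscripts against the size the caller claimed, not against the storage that really exists.

```fortran
module overrun_demo
  implicit none
contains
  ! Legacy-style routine: the dummy array's extent comes from n,
  ! not from the storage the caller really owns.
  subroutine legacy_fill(a, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    integer :: i
    do i = 1, n
      a(i) = real(i)   ! 1 <= i <= n, so a check against the declared bounds passes
    end do
  end subroutine legacy_fill
end module overrun_demo

! A correct call passes the true size:
!   real :: x(10)
!   call legacy_fill(x, 10)
! The problem case claims more than exists and writes past the end of x:
!   call legacy_fill(x, 12)   ! deliberately not executed: behaviour is undefined
```

The bad call is left as a comment on purpose, since what it overwrites depends on whatever happens to sit after `x` in memory.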

This kind of corruption can be very hard to
track down if you have a large amount of 
old Fortran code that is still being called.

Ideally there should be a range bounds 
compile option that checks the bounds of the
actual underlying array as originally 
allocated.

Purify tracks down similar problems in 
C++ but crashes on any exe or dll that links 
in Fortran code (very annoying). AQTime can 
handle Fortran code and has an option to track
down this sort of thing, but I have not tried
that option yet.

Any thoughts out there?

jimdempseyatthecove · 07-11-2007

Presumably you are trying to track down a problem in someone else's code.

Prior posts have stated that ALLOCATE and DEALLOCATE eventually end up calling the C runtime library. Therefore, you should be able to load the debug version of the C runtime library heap manager and then periodically call the heap functions that test the integrity of the heap. Insert more calls as you narrow down the problem.

The above might catch heap corruption problems soon after the memory got stepped on. But it will not detect trashing of local data or statically allocated data. For those situations you will have to fall back on the old technique of inserting guard data around these objects and then checking the guard data for signs of corruption.

User-defined types can have conditionally compiled guard data expanded into the type data. Some arrays can be conditionally compiled with bounds that extend below and above what's used, with the guard data placed in the extra elements.

integer, parameter :: ArraySize = 100
#ifdef _DEBUG
real :: Array(0:ArraySize+1)
#else
real :: Array(1:ArraySize)
#endif

...

You can use either the Fortran conditional compilation directives or the Fortran PreProcessor (FPP) conditional compilation directives as above. I prefer FPP because of its macro capability, which permits hiding some of the conditional compilation directives.
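
As a rough sketch of how the guard elements might be set and tested, assuming an array declared as `(0:n+1)` as in the `_DEBUG` branch above. The names `guard_cells`, `guard_value`, `set_guards` and `guards_intact` are hypothetical, and the guard pattern is an arbitrary value chosen to be unlikely in real data.

```fortran
module guard_cells
  implicit none
  ! Arbitrary pattern (an assumption); pick something your data never holds.
  real, parameter :: guard_value = -9.87654e30
contains
  ! Write the pattern into the extra element below and above the used range 1..n.
  subroutine set_guards(a, n)
    integer, intent(in) :: n
    real, intent(inout) :: a(0:n+1)
    a(0)   = guard_value
    a(n+1) = guard_value
  end subroutine set_guards

  ! .true. while both guard elements still hold the pattern.
  logical function guards_intact(a, n)
    integer, intent(in) :: n
    real, intent(in) :: a(0:n+1)
    guards_intact = (a(0) == guard_value) .and. (a(n+1) == guard_value)
  end function guards_intact
end module guard_cells
```

In a debug build you would call `set_guards(Array, ArraySize)` once after the array comes into existence, then test `guards_intact(Array, ArraySize)` after each suspect call, adding more checks as the search narrows.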

You also might want to check to see if BoundsChecker would catch the memory problems. See:
http://www.compuware.com/products/devpartner/visualc.htm#error

It says it is for C++ but C++ calls the CRT heap routines. You might need to make a little C shell that calls your Fortran program as a subroutine.

Jim Dempsey

van_der_merwe__ben · 07-11-2007

Thank you.

Making the arrays slightly bigger and putting test data at the end using conditional compiles is what I have been using.

Except in my case we are talking about literally hundreds of Fortran files...

Checking the heap may be a good idea, though in our case there are dozens of DLLs; some share one single heap with overloaded memory allocation code while others have their own default memory heaps.

Some arrays are local, some are static and some are dynamic. Lots of code written by different people over many years.

We tried BoundsChecker about six years ago but it kept saying we were trying to check too much code, or it crashed. The only thing their support did was to tell us to exclude even more code. It got to the point where you had to exclude so much that it became useless. I got annoyed, concluded that it simply can't handle very large applications, and didn't renew it. Maybe we can try them again.

You really need a Purify that can handle Fortran as well. It does not sound like Rational will ever add Fortran support and now that IBM has taken them over, I have no hopes of it happening.

TimP · 07-11-2007

In my experience, it's extremely unusual for an application to declare false larger sizes for arrays declared in the caller. If an application is intentionally written so as to defeat bounds checking, it's difficult to believe that new debugging techniques would improve the ability to diagnose the problems. If you say, well, people aren't willing to bring legacy code up to the standards of 15 years ago, I sympathize. However, if code isn't written to the standards of 30 or 40 years ago, corrections should be considered. As the situation is so rare, I don't know whether standard checking utilities such as ftnchek would expose the source code error, but I think it's worth a try if the ifort option -CB doesn't expose it.

jimdempseyatthecove · 07-12-2007

Ben,

It sounds like you are experienced enough to replace the C++ malloc (actually filter malloc) such that allocations from FORTRAN will pass through the substituted malloc. You may want to do this for purposes other than diagnostics, such as introducing an aligned malloc into FORTRAN to align to 128 bits for vectorization. But for debugging purposes the allocation is made such that the requested memory is enclosed by sentinels. Then add the requisite integrity test routines.

The debug version of the heap manager does this to some extent. But if you create your own wrapper you can insert things into the wrapper that might aid in finding the problem, e.g. a pointer to the module name and line number of what allocated and deallocated the block of memory, the allocation sequence, and a debug trap on a combination of who, where and when so you can break just before the damage.
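
To make the idea concrete, here is a Fortran-level analogue of such a wrapper. It is not the malloc replacement itself, only a sketch of the same bookkeeping done around ALLOCATE. All names (`guarded_alloc_mod`, `guarded_block`, `guarded_allocate`, `guarded_ok`, `guarded_free`) and the sentinel value are hypothetical.

```fortran
module guarded_alloc_mod
  implicit none
  private
  public :: guarded_block, guarded_allocate, guarded_ok, guarded_free

  real, parameter :: sentinel = -9.87654e30   ! arbitrary pattern (assumption)
  integer, save :: next_seq = 0               ! running allocation counter

  type guarded_block
    real, pointer :: raw(:)  => null()   ! whole block, sentinels at both ends
    real, pointer :: data(:) => null()   ! the part handed to the caller, 1..n
    character(len=32) :: owner = ''      ! who asked for the block
    integer :: seq = 0                   ! allocation sequence number
  end type guarded_block
contains
  ! Allocate n usable elements plus one sentinel on each side.
  subroutine guarded_allocate(b, n, owner)
    type(guarded_block), intent(out) :: b
    integer, intent(in) :: n
    character(len=*), intent(in) :: owner
    allocate(b%raw(0:n+1))
    b%raw(0)   = sentinel
    b%raw(n+1) = sentinel
    b%data => b%raw(1:n)
    b%owner = owner
    next_seq = next_seq + 1
    b%seq = next_seq
  end subroutine guarded_allocate

  ! .true. if the block exists and both sentinels are untouched.
  logical function guarded_ok(b)
    type(guarded_block), intent(in) :: b
    guarded_ok = .false.
    if (.not. associated(b%raw)) return
    guarded_ok = b%raw(lbound(b%raw, 1)) == sentinel .and. &
                 b%raw(ubound(b%raw, 1)) == sentinel
  end function guarded_ok

  ! Report damage (with owner and sequence number), then release the block.
  subroutine guarded_free(b)
    type(guarded_block), intent(inout) :: b
    if (.not. associated(b%raw)) return
    if (.not. guarded_ok(b)) then
      print '(a,a,a,i0)', 'guard damaged: owner=', trim(b%owner), ' seq=', b%seq
    end if
    deallocate(b%raw)
    nullify(b%data)
  end subroutine guarded_free
end module guarded_alloc_mod
```

The caller works through `b%data` and calls `guarded_ok(b)` at checkpoints. The owner tag and sequence number are the sort of extra information mentioned above. A breakpoint on a particular `seq` value would be one way to stop just before the damage on a repeatable run.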

It is a lot of work. But a dead program puts you in a worse place.

I used to use BoundsChecker back when Numega had it. Great product back then. When it got sold to Compuware the product fell apart. It was as if the development team didn't follow with the sale of the product. MS was changing Windows at that time so there may have been difficult technological issues to resolve.

If you suspect the problem is in a call to a DLL, the error could be in a) your arguments, b) bad code in the DLL, or c) bad interpretation of "valid" arguments passed. For c) this could mean, as an example, the DLL expecting room for the NULL at the end of a C string, or a particular alignment of data.

You might have debugging success by using FPP to replace all the calls of interest to the DLLs with a call to a wrapper function that creates a sentineled derived type containing a copy of the data for the DLL. That is, the real data is copied to the derived type, the fields in the derived type are passed to the DLL, on return from the DLL the sentinels are checked, and the data is copied back to the caller.

CALL FOO(PACKET, N, STAT)

Becomes

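! No extra parentheses around a, b, c: (N) would be an expression,
! and results could not be returned through it
#define FOO(a,b,c) Debug_FOO(a,b,c)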
...
CALL FOO(PACKET, N, STAT)
...

SUBROUTINE Debug_FOO(PACKET, N, STAT)
! Dummy arguments. TypePACKET stands for whatever type the PACKET was;
! because of SEQUENCE below it must itself be a SEQUENCE type.
TYPE(TypePACKET) :: PACKET
INTEGER :: N
INTEGER :: STAT
! Define type to hold copy of args with sentinels
TYPE ProtectedPACKET
SEQUENCE
INTEGER :: Sentinal1
! Whatever type the PACKET was
TYPE(TypePACKET) :: PACKET
INTEGER :: Sentinal2
INTEGER :: N
INTEGER :: Sentinal3
INTEGER :: STAT
INTEGER :: Sentinal4
END TYPE ProtectedPACKET
! Create a protected type to hold copy
TYPE(ProtectedPACKET) :: LocalPacket
! Define something for the sentinel value
INTEGER, PARAMETER :: Sentinal = 1234554321
! Populate the protected copy
LocalPacket%Sentinal1 = Sentinal
LocalPacket%PACKET = PACKET
LocalPacket%Sentinal2 = Sentinal
LocalPacket%N = N
LocalPacket%Sentinal3 = Sentinal
LocalPacket%STAT = STAT
LocalPacket%Sentinal4 = Sentinal
! Make call to DLL
! (use lower case or #undef FOO so the macro does not apply here)
call foo(LocalPacket%PACKET, LocalPacket%N, LocalPacket%STAT)
! Check for corruption of sentinels
if (LocalPacket%Sentinal1 /= Sentinal) call Bug()
if (LocalPacket%Sentinal2 /= Sentinal) call Bug()
if (LocalPacket%Sentinal3 /= Sentinal) call Bug()
if (LocalPacket%Sentinal4 /= Sentinal) call Bug()
! Return data
PACKET = LocalPacket%PACKET
N = LocalPacket%N
STAT = LocalPacket%STAT
end SUBROUTINE Debug_FOO

Depending on the problem (in the DLL), the wrapper might be able to work around the problem (e.g. by adding pad data if necessary).

Good luck hunting.

If you get stuck I might have time to help you out.

Jim Dempsey

van_der_merwe__ben · 07-12-2007

Hi, thank you. I shall keep that idea in mind.

*Currently* we do not have any Fortran memory corruption or leak problems that we know of, but it happens every now and then that I have to hunt one down.

Given that we have over 20,000 Fortran files being compiled into the application, it would be nice if there was some automated way to check and find them even if they don't cause a crash.

I run Purify every now and then to hunt down and clean up anything bad on the C++ side, but I am not aware of any good automatic memory corruption detection tools that can handle DLLs that link in Fortran obj files. I shall try the option in AQTime next time and see. AQTime is actually quite happy profiling Intel Fortran and their support actually fixed some problems we were having.

Ideally you should be able to run the application under a tool that checks all memory access (C++ and Fortran) so you can make sure there is nothing lurking in there. Customers don't like crashes.