inconsistent ACCESS VIOLATION

tim_s_1 · ‎07-17-2013

I am having problems with an error 157 (Access Violation), but can't seem to make any headway.

The biggest challenge facing me now is that the code does not always crash at the same point. The code is structured such that many iterative calls to the same routines are performed. Each time they are in theory using the same memory over and over again.

I have been studying Steve Lionel's article "Don't Touch Me There" and I believe I have ruled out the common sources:

Mismatches in argument lists, so that data is treated as an address
Out of bounds array references
Mismatches in C vs. STDCALL calling mechanisms, causing the stack to become corrupted
References to unallocated pointers

I compiled in debug mode in Visual Studio 2012 with all of the standard checks in place (including interface checks and array bounds checks).

At compile time there were a couple of minor interface errors (an argument type mismatch between a real and an integer where the dummy argument was not used). I corrected them and now have no warnings, so that should rule out issue #1.

I ran the code with bounds checking turned on (for this part I ran it with only this option in release mode, because the code normally takes days to weeks to run and I cannot afford to slow it down more than need be). I did not get any out-of-bounds errors when running through all of the routines at least once, and in theory the loop limits should be the same throughout the run. As I mentioned earlier, the access violation does not happen at a consistent point (sometimes it happens during iteration 1, other times at iteration 37), so I can't be completely certain that I have ruled this out.

The code is all Fortran, so I believe that rules out issue #3.

I am not really sure how to rule out references to unallocated pointers. The code was all fortran70 before and should not have had any pointers. I have recently moved some common blocks to modules and in so doing have created some pointers. The program that I am working with is rather large and proprietary, so I cannot upload any actual source code.

But an example is:

Old Code:

PROGRAM FOO
COMMON /CBLOCK/ARRAY1(NX,NY,NZ),ARRAY2(NX,NY,NZ),ARRAY3(NX,NY,NZ),...,ARRAY12(NX,NY,NZ)

REGULAR CODE

END

New Code:

PROGRAM FOO
USE CBLOCK
CALL POINT_CBLOCK(NX,NY,NZ)

REGULAR CODE

END

C-----------------------------------------------------------------------
MODULE CBLOCKARRAY
REAL, ALLOCATABLE, TARGET, SAVE :: BIGARRAY(:,:,:,:)
CONTAINS
SUBROUTINE ALLOC_CBLOCKARRAY(NX,NY,NZ)
ALLOCATE (BIGARRAY(NX,NY,NZ,12))
BIGARRAY = 0.0
END SUBROUTINE ALLOC_CBLOCKARRAY
END MODULE CBLOCKARRAY
C-----------------------------------------------------------------------
MODULE CBLOCK
USE CBLOCKARRAY
REAL, POINTER, SAVE :: ARRAY1(:,:,:),ARRAY2(:,:,:),...,ARRAY12(:,:,:)
CONTAINS
SUBROUTINE POINT_CBLOCK(NX,NY,NZ)
CALL ALLOC_CBLOCKARRAY(NX,NY,NZ)
ARRAY1=>BIGARRAY(:,:,:,1)
ARRAY2=>BIGARRAY(:,:,:,2)
ARRAY3=>BIGARRAY(:,:,:,3)
.
.
.
ARRAY12=>BIGARRAY(:,:,:,12)
END SUBROUTINE POINT_CBLOCK
END MODULE CBLOCK
C-----------------------------------------------------------------------

Other routines that use the CBLOCK module contain only the "USE CBLOCK" statement, not the call to POINT_CBLOCK.

One of the first things that the program does is allocate and point all of the module arrays.
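One way to test for unassociated pointers in a layout like this is a small guard routine called at a few checkpoints. The following is only a sketch: CBLOCK_DEMO and CHECK_CBLOCK are hypothetical names, only two of the twelve arrays are shown, and the `=> null()` initialisation is an assumption added so that ASSOCIATED always sees a defined association status.

```fortran
! Sketch only: CBLOCK_DEMO and CHECK_CBLOCK are hypothetical names.
module cblock_demo
  implicit none
  real, allocatable, target :: bigarray(:,:,:,:)
  ! Null-initialise the pointers so ASSOCIATED() is always legal to call.
  real, pointer :: array1(:,:,:) => null()
  real, pointer :: array2(:,:,:) => null()
contains
  subroutine point_cblock(nx, ny, nz)
    integer, intent(in) :: nx, ny, nz
    allocate (bigarray(nx, ny, nz, 2))
    bigarray = 0.0
    array1 => bigarray(:,:,:,1)
    array2 => bigarray(:,:,:,2)
  end subroutine point_cblock

  ! Stop with a message if the storage or either pointer is not set up.
  subroutine check_cblock(label)
    character(*), intent(in) :: label
    if (.not. allocated(bigarray)) then
      print *, 'BIGARRAY not allocated at ', label
      stop 1
    end if
    if (.not. associated(array1) .or. .not. associated(array2)) then
      print *, 'unassociated pointer at ', label
      stop 1
    end if
  end subroutine check_cblock
end module cblock_demo

program foo
  use cblock_demo
  implicit none
  call point_cblock(4, 3, 2)
  call check_cblock('start of run')
  print *, shape(array1)
end program foo
```

A guard like this only catches pointers that were never set or were nullified. It does not detect a pointer left dangling after its target has been deallocated or reallocated.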

One place that this could be becoming an issue is in some of the subroutines that use these modules.

I have routines that look something like

SUBROUTINE SUB1
USE CBLOCK
CALL SUB2(ARRAY1,ARRAY7,ARRAY9)
END

SUBROUTINE SUB2(ARRAY1,ARRAY7,ARRAY9)
DIMENSION ARRAY1(NX,NY,NZ),ARRAY7(NX,NY,NZ),ARRAY9(NX,NY,NZ)
CALL SUB3(ARRAY7,ARRAY9)
END

SUBROUTINE SUB3(ARRAY7,ARRAY9)
DIMENSION ARRAY7(NX,NY,NZ),ARRAY9(NX,NY,NZ)
CALL SUB4(ARRAY9)
END

And so on, only the argument list is actually hundreds of arguments that may have all come from various modules in the top routine.

Is there anything wrong with this calling method?

Any other suggestions?

By the way, I have started the code in full debug mode with the /check:pointer /check:bounds /check:uninit /check:stack run time checks on but it will probably be weeks before that gets me anywhere.

Any help would be greatly appreciated.

jimdempseyatthecove · ‎07-17-2013

Unless you require this option, I suggest you add /assume:norealloc_lhs

This will prevent BIGARRAY from being reallocated (after the pointers are set up) by a statement such as:

BIGARRAY = EvenBiggerArray

This may also expose other unintended reallocations.
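To illustrate the hazard, here is a minimal sketch with hypothetical two-dimensional arrays (not the poster's code). It assumes Fortran 2003 reallocation semantics are in effect (/assume:realloc_lhs in Intel Fortran). With reallocation disabled, an assignment between arrays of different shape, as below, is not valid and should not be relied upon.

```fortran
! Sketch: why reallocating the target of a pointer is a hazard.
! Assumes Fortran 2003 realloc-on-assignment semantics are enabled.
program realloc_demo
  implicit none
  real, allocatable, target :: bigarray(:,:)
  real, allocatable :: evenbigger(:,:)
  real, pointer :: array1(:) => null()

  allocate (bigarray(2, 2))
  bigarray = 0.0
  array1 => bigarray(:, 1)
  print *, 'shape before: ', shape(bigarray)

  allocate (evenbigger(3, 3))
  evenbigger = 1.0

  ! Shapes differ, so BIGARRAY is deallocated and reallocated here.
  bigarray = evenbigger
  print *, 'shape after:  ', shape(bigarray)

  ! ARRAY1 still refers to the old storage and must not be used
  ! until it is pointed at the new storage again.
  array1 => bigarray(:, 1)
  print *, 'size of re-pointed ARRAY1: ', size(array1)
end program realloc_demo
```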

BTW, the SAVE attribute is meaningless for module data; it is implicitly SAVE.

Jim Dempsey

tim_s_1 · ‎07-17-2013

Isn't /assume:norealloc_lhs the default? I have never dealt with unintended reallocations before, as the code was all fortran 70, where no array assignments like this could be made, and it was all written to perform assignments element by element.

Is there a compile flag that would expose all array assignments of mismatched size?

Thanks

IanH · ‎07-17-2013

While the probability is perhaps small, note that the /warn:interfaces style checks for external procedures rely on the external procedure having been compiled (since the last "clean") before the reference to the procedure. (You haven't ruled out #1, you've just made it an unlikely possibility.)

I just looked into my coffee cup, and the remnant grounds happen to spell out "reference of undefined variable, which /check:uninit may or may not catch". Perhaps that just indicates that it is well and truly time for that particular cup to go for a spin in the dishwasher.

jimdempseyatthecove · ‎07-17-2013

>>The code was all fortran70 before and should not have had any pointers. I have recently moved some common blocks to modules and in so doing have created some pointers

Some of the old FORTRAN 70/77 programs I had the occasion to update used named COMMON blocks with different variable names or sequence or size ... Moving these into a module can be difficult, as X in one common block may be Y in a different common block (of the same name), and the code relies on the two being intertwined. The usual pitfall in the migration is to separate the two variables. Review your code changes to ensure there are no issues relating to this.
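A minimal sketch of this kind of overlay, using a hypothetical block /CB/ and made-up variable names: the main program sees one array X, while the subroutine sees the same storage as Y followed by Z.

```fortran
! Sketch: two different views of the same named COMMON block.
! The block name /CB/ and all variable names are hypothetical.
program common_demo
  implicit none
  real :: x(4)
  common /cb/ x
  x = 0.0
  call fill
  ! X now holds the values written through Y and Z: 1, 1, 2, 2.
  print *, x
end program common_demo

subroutine fill
  implicit none
  real :: y(2), z(2)
  common /cb/ y, z      ! Y overlays X(1:2), Z overlays X(3:4)
  y = 1.0
  z = 2.0
end subroutine fill
```

If X, Y and Z were moved into a module as three separate variables, the writes to Y and Z would no longer reach X, and code that depended on the overlay would silently change behaviour.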

Jim Dempsey

tim_s_1 · ‎07-22-2013

Still having problems.

I have been tracing down any potentially mismatched names/array sizes either from an argument list or a module. I have not found the source of the problem yet.

Is there a way in Visual Fortran to debug just one routine? With debugging enabled for all routines it is way too slow.

I think I have narrowed it down to a handful of potential routines where the problem may be happening. If I could debug just those routines I might be able to figure this out.

Thanks.

Steven_L_Intel1 · ‎07-22-2013

You can start with a Debug configuration and set the project optimization level to Fast (2). Then right click on the source file you want to debug and change its optimization level to Disabled (0). Then build.

tim_s_1 · ‎07-22-2013

I know that bounds checking only checks against the locally declared size. So if I pass an array that is (3,3,3) to a routine, but in that subroutine the limits are declared as (3,3,4) and used that way, bounds checking will not catch it, yet there may be an access violation.

Is there a way to set a runtime check that would verify that the array is used with the same bounds in the various routines?

Steven_L_Intel1 · ‎07-22-2013

No, but this is something that the Generated Interface Checking feature can detect. Of course, if you were using modules for your procedures, the compiler would complain by default. For example:

[plain]
C:\Projects>type t.f90
integer a(3,3,3)
call sub(a)
print *, a
end
subroutine sub (b)
integer b(3,3,4)
b = 8
end

C:\Projects>ifort /warn:interface t.f90
Intel(R) Visual Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.3.198 Build 20130607
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.

t.f90(2): error #7983: The storage extent of the dummy argument exceeds that of the actual argument.
call sub(a)
---------^
compilation aborted for t.f90 (code 1)
[/plain]

This feature is on by default in new projects.
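For comparison, here is a sketch of the module-procedure route mentioned above. Switching the dummy argument to assumed shape is an assumption added here, not something stated in the reply, and SUB_MOD is a hypothetical name. With an explicit interface and an assumed-shape dummy, the extents come from the caller rather than being redeclared in the subroutine, so the (3,3,3) versus (3,3,4) mismatch cannot arise.

```fortran
! Sketch: the same example with SUB placed in a module (explicit interface)
! and an assumed-shape dummy. SUB_MOD is a hypothetical module name.
module sub_mod
  implicit none
contains
  subroutine sub(b)
    integer, intent(out) :: b(:,:,:)   ! extents are taken from the actual argument
    b = 8
  end subroutine sub
end module sub_mod

program t
  use sub_mod
  implicit none
  integer :: a(3,3,3)
  call sub(a)
  ! Report the number of elements and whether every one was set to 8.
  print *, size(a), all(a == 8)
end program t
```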

tim_s_1 · ‎07-22-2013

Is the Generated Interface Checking feature more than just the /warn:interfaces option?

Steven_L_Intel1 · ‎07-22-2013

No - that's it.