I have found many posts online about "stack around the variable ____ was corrupted"; however, in those posts the variable is part of the code. In my case, the issue occurs only in Debug mode, not in Release mode, and io_ctx is not a variable in my code. I am at a loss as to how to debug this. Does anyone have experience with this run-time error and can give me some ideas on how to investigate it? This has never happened before in this code.
Have you compiled your Debug build with all runtime checks enabled? In particular, array bounds checking.
Is your code using pointers? If so, perhaps a pointer is used before it is initialized (or after its target was destroyed). For example, a module-level pointer that points to a stack item in procedure A, and is then used in procedure B under the assumption that what it pointed to is still valid.
Note that the lack of an error in the Release build does not confirm that the code is running correctly.
Jim Dempsey
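To make the pointer scenario above concrete, here is a minimal sketch. The names state, buffer, proc_a and proc_b are hypothetical and not from the original code. It shows the safe variant, where the target is a module variable that outlives both procedures; the comment marks where the hazardous variant would differ.

```fortran
module state
  implicit none
  real, pointer :: p(:) => null()   ! module-level pointer shared by A and B
  real, target  :: buffer(3)        ! module variable: exists for the whole run
contains
  subroutine proc_a()
    ! Hazard (deliberately NOT done here): if buffer were a local,
    ! non-SAVEd variable of proc_a, p would dangle once proc_a returns.
    buffer = [1.0, 2.0, 3.0]
    p => buffer
  end subroutine proc_a

  function proc_b() result(total)
    real :: total
    ! associated() detects a nullified pointer, but it cannot detect
    ! a dangling one, so the target's lifetime is what matters.
    if (.not. associated(p)) error stop "p is not associated"
    total = sum(p)
  end function proc_b
end module state

program main
  use state
  implicit none
  call proc_a()
  if (abs(proc_b() - 6.0) > 1.0e-6) error stop 1
  print *, "sum via pointer:", proc_b()
end program main
```

If the target really has to be local to procedure A, nullifying the pointer before A returns at least turns silent corruption into a check that procedure B can make.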
The code does not use pointers.
Thanks for the suggestion. I turned on all checks and during run-time, it printed an unrelated warning that I fixed:
But the code still breaks here:
This is subroutine UMAT that is called inside modules_main.for.
When I put the breakpoint at "return" and then step into the END line, the error is slightly different:
Edit:
I just realized that the run-time check does not stop the code from continuing, and it does not have this check failure on the second call to UMAT.
What is "write_umat"?
Is this a Fortran procedure?
If so, is its interface available, and correct?
Note that if write_umat is a Fortran subroutine .AND. it is compiled with your program (project), then the interface is available for Debug build interface checking. However, if write_umat is in an external library, compiled outside of the project, then the interface might not be available for interface checking .OR. the supplied interface may be incorrect.
If write_umat is a C/C++ (or other) routine, then check the interface (both sides).
Also, it appears that you are passing character strings. If they go to C/C++, did you remember to append a null character?
Jim Dempsey
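As an illustration of the null-character point, here is a minimal sketch that passes a Fortran string to a C routine. It assumes the C library's strlen is linked, as it normally is by default; the variable name and the file name "umat.log" are hypothetical.

```fortran
program null_terminate_demo
  use, intrinsic :: iso_c_binding, only: c_char, c_size_t, c_null_char
  implicit none

  interface
    ! Interface to the C library's strlen, which reads until a null byte.
    function c_strlen(s) bind(C, name="strlen") result(n)
      import :: c_char, c_size_t
      character(kind=c_char), intent(in) :: s(*)
      integer(c_size_t) :: n
    end function c_strlen
  end interface

  character(len=16) :: name   ! blank-padded Fortran string, no terminator
  integer(c_size_t) :: n

  name = "umat.log"
  ! trim() drops the padding; c_null_char supplies the terminator C expects.
  n = c_strlen(trim(name) // c_null_char)
  if (n /= 8) error stop 1
  print *, "C sees a string of length", n
end program null_terminate_demo
```

Without the appended c_null_char, the C side keeps reading past the end of the Fortran variable until it happens to find a zero byte.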
May I ask why you are asking about write_umat? I don't see anything that suggests it might be the problem.
If you call a procedure and the interface is wrong, then the use of the (incorrect/incompatible) dummy arguments could trash the stack.
Note that if you are passing an array, in particular one that gets modified, .AND. you pass the extents of the dimension(s) to write_umat, the runtime bounds check will not catch an out-of-bounds access, because the callee only knows the extents you told it. Add some (temporary) sanity checks to assert that the bounds of the array are correct. I am not sure whether rtemperature is an array, 1D or 2D (or ?D), or whether NPT is the number of points.
Something like:
if(size(rtemperature) /= NPT) then
print *,"Break here"
endif
call write_umat(...
If rtemperature is a 2D array you will have to correct the if statement.
Jim Dempsey
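For the 2D case mentioned above, the corrected check might look like the following sketch. The names field, npt and ncomp are hypothetical stand-ins, since the real variables are not shown in the thread. The check has to sit where the array's true shape is known, i.e. in the caller or in a routine that receives the array as assumed-shape.

```fortran
program bounds_sanity_check
  implicit none
  integer, parameter :: npt = 4, ncomp = 3   ! hypothetical extents passed to the callee
  real :: field(npt, ncomp)                  ! hypothetical 2D array

  field = 0.0

  ! Compare each dimension separately; size(field) alone would only
  ! compare the total element count and could hide swapped extents.
  if (size(field, 1) /= npt .or. size(field, 2) /= ncomp) then
    print *, "Break here"
  end if

  print *, "extents:", size(field, 1), size(field, 2)
end program bounds_sanity_check
```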
rTemperature is just a scalar. I checked the other variables per your response and I didn't find anything that would cause that issue.
Comment out everything in the subroutine and check that it runs as a null function.
Then add the code back in one line at a time until you generate the error, and let us look at what generates it.
Sometimes this is the only way.
Also try it in Release mode or as a 64-bit build.
I commented out the call; the run-time failure still exists, but the error changes every time. Now it says ARGBLOCK_770 instead.
But it seems this might eventually lead to the problematic line, so I will keep looking at it. Thanks for the help!
Getting back to the arguments, try compiler option
/warn:interfaces
to see that the arguments in calls match the interface to the write_umat routine.
Is the data going to the IO routine large? Perhaps you are exhausting the stack. Try this simple option, which places automatic and temporary arrays on the heap instead:
/heap-arrays:0
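Separately from /warn:interfaces, one common way to get argument mismatches diagnosed at compile time is to put the routine in a module, so every caller sees an explicit interface. This is a sketch of that technique rather than something taken from the original code; io_routines and write_record are hypothetical names, and the real write_umat argument list is not shown in the thread.

```fortran
module io_routines
  implicit none
contains
  ! Hypothetical output routine. Because it lives in a module, the
  ! compiler checks type, kind and rank of every argument at each call.
  subroutine write_record(unit, label, values)
    integer, intent(in)          :: unit
    character(len=*), intent(in) :: label
    real, intent(in)             :: values(:)
    write(unit, '(a, *(1x, es12.5))') label, values
  end subroutine write_record
end module io_routines

program main
  use, intrinsic :: iso_fortran_env, only: output_unit
  use io_routines, only: write_record
  implicit none
  real :: stress(3)

  stress = [1.0, 2.0, 3.0]
  call write_record(output_unit, "stress:", stress)

  ! A call with swapped arguments, such as
  !   call write_record("stress:", output_unit, stress)
  ! is rejected at compile time instead of corrupting the stack at run time.
end program main
```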
Turning on /warn:interfaces returns yet another failure message:
Run-Time Check Failure #2 - Stack around the variable '_CONCAT_TABLE_10' was corrupted.
And note that this only occurs on the first call to the subroutine and it does not prevent the code from continuing.
This makes me think of possible memory issues. This is my office computer and it is on its last legs. If the hard drive or memory is having corruption issues, would you expect this error to occur? The arrays in this code are indeed very large, but setting /heap-arrays:0 does not fix the issue.
>>I commented out the call, the run-time failure still exists, but the error changes every time.
What this indicates is that something before the call to write_umat corrupted something in your program.
The corruption could be anywhere: stack in-frame, stack out-of-frame, heap-allocated memory, heap memory that was already returned, or code.
This type of problem is difficult to trace because any change in the program (e.g. commenting out a call) will change the symptom, and the program may even run without a symptom (while still corrupting something).
The full runtime checks will catch most, but not all, runtime errors. Remaining candidates:
1) An incorrect interface to a 3rd party library (in other words, an interface is provided or no interface is provided .AND. no procedure is compiled to match the interface).
2) An invalid pointer
3) A dummy array argument that does not receive/construct the array descriptor from the caller, i.e. an explicit-shape dummy:
real :: array(m,n) ! m and n passed as arguments to the call
whereas an assumed-shape dummy takes its bounds from the caller:
real :: array(:,:) ! assumes bounds of caller
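The difference in item 3 can be seen in a small sketch (the module and function names are hypothetical). The explicit-shape dummy believes whatever extents it is handed, while the assumed-shape dummy reads the real shape from the caller's descriptor. The extents passed here are deliberately smaller than the actual array so the example stays within bounds.

```fortran
module descriptor_demo
  implicit none
contains
  ! Explicit-shape dummy: the shape is whatever m and n claim.
  function shape_explicit(a, m, n) result(s)
    integer, intent(in) :: m, n
    real, intent(in)    :: a(m, n)
    integer :: s(2)
    s = shape(a)
  end function shape_explicit

  ! Assumed-shape dummy: the shape comes from the caller's array descriptor.
  function shape_assumed(a) result(s)
    real, intent(in) :: a(:, :)
    integer :: s(2)
    s = shape(a)
  end function shape_assumed
end module descriptor_demo

program main
  use descriptor_demo
  implicit none
  real :: work(3, 4)

  work = 0.0
  ! Passing extents that do not match work's real shape goes unnoticed.
  print *, "explicit-shape sees:", shape_explicit(work, 2, 2)
  print *, "assumed-shape sees: ", shape_assumed(work)

  if (any(shape_explicit(work, 2, 2) /= [2, 2])) error stop 1
  if (any(shape_assumed(work) /= [3, 4])) error stop 2
end program main
```

Had the extents been too large instead of too small, the explicit-shape routine could write past the end of work without the bounds checker objecting, which is the kind of corruption discussed in this thread.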
Many years ago, I had a heisenbug (one that goes away/moves about when you look at/for it). This particular bug symptom was caused by something modifying code in a sneaky manner such that a register reference in the instruction changed. This caused the memory access to go somewhere other than intended, which then corrupted memory. The problem was particularly challenging to find (and resolve) because the debugger disassembly would show the correct instructions. Only snapshotting the code segment during runtime would expose the bug. Catching it was difficult because monitoring the instruction's memory for changes did not work, yet the memory was being changed. As it turns out, the SIB byte of the instruction was being overwritten with a software interrupt 3 (INT 3, the breakpoint used by debuggers).
This led me to assume that something was screwed up in MS VS, as there were no breakpoints set at this location. The corrective measure was to delete all breakpoints using the Red X (as opposed to deleting them individually).
Jim Dempsey