Solved: "Run-Time Check Failure #2 - Stack around the variable '$io_ctx' was corrupted" in debug mode

Feng__Jesse · ‎11-02-2023

I have found many posts online about "stack around the variable ____ was corrupted"; however, in those posts the variable is part of the user's code. In my case, the issue occurs only in debug mode, not in release mode, and io_ctx is not a variable in my code. I am at a loss as to how to debug this. Has anyone seen this run-time error before, and can you give me some ideas on how to investigate it? This has never happened before in this code.

jimdempseyatthecove · ‎11-02-2023

Have you compiled your Debug build with all runtime checks enabled? In particular, array bounds checking.

Is your code using pointers? If so, perhaps a pointer is used before it is initialized, or after its target has been destroyed. For example, a module-level pointer might be pointed at a stack variable in procedure A and then used in procedure B on the assumption that its target is still valid.
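The pointer scenario above can be sketched roughly as follows. This is a hypothetical illustration, not code from the thread: the module `dangling_demo`, the pointer `p`, and all procedure names are invented for the example.

```fortran
module dangling_demo
  implicit none
  ! Hypothetical module-level pointer, as in the scenario described above.
  real, pointer :: p => null()
  ! A module variable with TARGET outlives any single procedure call.
  real, target :: persistent_value = 0.0
contains
  subroutine proc_a_local_target()
    real, target :: local_value   ! typically lives in this procedure's stack frame
    local_value = 1.0
    p => local_value              ! p refers to storage that ends with this call
    ! ... work with p here ...
    nullify(p)                    ! without this line, p would dangle after return
  end subroutine proc_a_local_target

  subroutine proc_a_module_target()
    persistent_value = 1.0
    p => persistent_value         ! target remains valid after return
  end subroutine proc_a_module_target

  function proc_b() result(v)
    real :: v
    ! associated() only detects a nullified pointer; it cannot detect a
    ! pointer whose target has gone out of scope.
    if (associated(p)) then
      v = p
    else
      v = -1.0
    end if
  end function proc_b
end module dangling_demo
```

If the `nullify(p)` line is removed, `proc_b` may read or write whatever now occupies the old stack slot, and runtime checks are unlikely to flag it.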

Note that the lack of an error in the release build does not confirm that the code is running correctly.

Jim Dempsey

Feng__Jesse · ‎11-02-2023

The code does not use pointers.

Thanks for the suggestion. I turned on all checks and during run-time, it printed an unrelated warning that I fixed:

But the code still breaks here:

This is subroutine UMAT that is called inside modules_main.for.

When I put the breakpoint at "return" and then step into the END line, the error is slightly different:

Edit:

I just realized that the run-time check does not stop the code from continuing, and it does not have this check failure on the second call to UMAT.

jimdempseyatthecove · ‎11-02-2023

What is "write_umat"?

Is this a Fortran procedure?

If so, is its interface available, and correct?

Note: if write_umat is a Fortran subroutine .AND. compiled as part of your program (project), then its interface is available for interface checking in the Debug build. However, if write_umat is in an external library compiled outside the project, then the interface might not be available for interface checking .OR. the supplied interface might be incorrect.

If write_umat is a C/C++ (or other) program, then check the interface (both sides).

Also, it appears that you are passing character strings. If they are going to C/C++, did you remember to append a null character?
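For the null-character point, here is a minimal sketch assuming the C side is an ordinary routine that expects a NUL-terminated string. The C library's `strlen` stands in for it; the module and wrapper names are hypothetical.

```fortran
module c_string_demo
  use, intrinsic :: iso_c_binding, only: c_char, c_size_t, c_null_char
  implicit none
  interface
    ! Interface to the C standard library's strlen, which scans for a NUL byte.
    function c_strlen(s) bind(C, name="strlen") result(n)
      import :: c_char, c_size_t
      character(kind=c_char), dimension(*), intent(in) :: s
      integer(c_size_t) :: n
    end function c_strlen
  end interface
contains
  function length_seen_by_c(fstr) result(n)
    character(kind=c_char, len=*), intent(in) :: fstr
    integer :: n
    ! Fortran strings carry no terminator, so append one explicitly.
    ! trim() drops trailing blanks before the terminator is added.
    n = int(c_strlen(trim(fstr) // c_null_char))
  end function length_seen_by_c
end module c_string_demo
```

Without the appended `c_null_char`, the C routine keeps reading past the end of the Fortran string until it happens to find a zero byte.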

Jim Dempsey

Feng__Jesse · ‎11-02-2023

write_umat is a subroutine in the code, in modules.for. It simply stores various outputs in an array that is passed out of UMAT and used as a state variable array or output array.

May I ask why you are asking about write_umat? I don't see anything that suggests it might be a problem.

jimdempseyatthecove · ‎11-02-2023

If you call a procedure, and the interface is wrong, then the use of the (incorrect/incompatible) dummy arguments could trash the stack.

Note: if you are passing an array, in particular one that gets modified, .AND. you pass the extents of its dimension(s) to write_umat, the runtime bounds check will not catch an out-of-bounds access, because the check trusts the extents you passed. Add some (temporary) sanity checks to assert that the bounds of the array are correct. I am not sure whether rtemperature is an array, 1D or 2D (or ?D), or whether NPT is the number of points.

Something like

if(size(rtemperature) /= NPT) then
  print *,"Break here"
endif
call write_umat(...

If rtemperature is a 2D array you will have to correct the if statement.

Jim Dempsey

Feng__Jesse · ‎11-03-2023

rTemperature is just a scalar. I checked the other variables per your response and I didn't find anything that would cause that issue.

JohnNichols · ‎11-03-2023

Comment out everything in the subroutine and check that it runs as a null function.

Then add the code back one line at a time until the error appears, and let us look at the line that generates it.

Sometimes this is the only way.

Try in release mode or 64-bit.

Feng__Jesse · ‎11-03-2023

I commented out the call. The run-time failure still exists, but the error changes every time. Now it says ARGBLOCK_770 instead.

But it seems this might eventually lead to the problematic line, I will keep looking at it. Thanks for the help!

Ron_Green · ‎11-03-2023

Getting back to the arguments, try compiler option

/warn:interfaces

to see that the arguments in calls match the interface to the write_umat routine.
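One way to give the compiler the interface it needs for this check is to place the callee in a module. The sketch below is hypothetical: the real argument list of write_umat is not shown in this thread, so `write_umat_sketch` and its arguments are invented for illustration.

```fortran
module umat_output_sketch
  implicit none
contains
  ! Hypothetical stand-in for write_umat; the real argument list is not
  ! shown in the thread.
  subroutine write_umat_sketch(temperature, npt, out)
    real,    intent(in)    :: temperature   ! scalar, as rTemperature is in the thread
    integer, intent(in)    :: npt           ! index of the slot to fill (assumed meaning)
    real,    intent(inout) :: out(:)        ! assumed shape: bounds come from the caller
    ! Refuse to write outside the caller's array.
    if (npt < 1 .or. npt > size(out)) error stop "npt outside bounds of out"
    out(npt) = temperature
  end subroutine write_umat_sketch
end module umat_output_sketch
```

With the routine in a module, a caller that does `use umat_output_sketch` and then passes, say, a double precision value for `temperature` gets a compile-time error instead of a silent mismatch at run time.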

Is the data going to the IO routine large? Perhaps you are exhausting the stack. Try this simple option

/heap-arrays:0
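The option above affects automatic arrays and compiler-generated temporaries, which Intel Fortran places on the stack by default. A source-level alternative is to make large work arrays allocatable. The following is a minimal sketch with hypothetical names, not code from the thread.

```fortran
module work_array_sketch
  implicit none
contains
  function mean_with_automatic(n) result(m)
    integer, intent(in) :: n
    real :: m
    real :: work(n)               ! automatic array: usually placed on the stack
    work = 1.0
    m = sum(work) / real(n)
  end function mean_with_automatic

  function mean_with_allocatable(n) result(m)
    integer, intent(in) :: n
    real :: m
    real, allocatable :: work(:)  ! allocatable array: storage comes from the heap
    integer :: stat
    allocate(work(n), stat=stat)
    if (stat /= 0) error stop "allocation failed"
    work = 1.0
    m = sum(work) / real(n)
    ! work is deallocated automatically on return
  end function mean_with_allocatable
end module work_array_sketch
```

Both functions compute the same result. The difference is only where `work` lives, which matters when `n` is large enough to exceed the stack limit.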

Feng__Jesse · ‎11-03-2023

Turning on /warn:interfaces returns yet another failure message:

Run-Time Check Failure #2 - Stack around the variable '_CONCAT_TABLE_10' was corrupted.

And note that this only occurs on the first call to the subroutine and it does not prevent the code from continuing.

This makes me think of possible memory issues. This is my office computer and it is on its last legs. If the hard drive or memory is having corruption issues, do you think this error could occur? The arrays in this code are indeed very large, but setting /heap-arrays:0 does not fix the issue.

jimdempseyatthecove · ‎11-04-2023

>>I commented out the call, the run-time failure still exists, but the error changes everytime.

What this indicates is that something before the call to write_umat corrupted something in your program.

The corruption could be anywhere: stack in-frame, stack out-of-frame, heap-allocated, heap-returned, or code.

This type of problem is difficult to trace because any change in the program (e.g. commenting out a call) will change the symptom, and the program may even run without symptoms (while still corrupting something).

The full runtime checks will catch most, but not all, runtime errors. Remaining candidates:

1) Incorrect interface to a 3rd party library (in other words, whether an interface is provided or not, no procedure was compiled that matches what the caller assumes).

2) Invalid pointer

3) An explicit-shape dummy array argument, which takes its bounds from separate arguments rather than from the caller's array descriptor:

real :: array(m,n) ! m and n passed as arguments to the call

whereas an assumed-shape dummy takes its bounds from the caller:

real :: array(:,:) ! assumes bounds of caller
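The difference between the two declarations can be sketched as below, together with the kind of temporary sanity check suggested earlier in the thread. All names here are hypothetical.

```fortran
module shape_demo
  implicit none
contains
  ! Explicit-shape dummy: the bounds are whatever m and n claim, so bounds
  ! checking inside this routine trusts the caller's m and n.
  subroutine fill_explicit(array, m, n)
    integer, intent(in) :: m, n
    real, intent(out) :: array(m, n)
    array = 1.0
  end subroutine fill_explicit

  ! Assumed-shape dummy: the bounds arrive with the array itself
  ! (this requires an explicit interface, e.g. a module procedure).
  subroutine fill_assumed(array)
    real, intent(out) :: array(:, :)
    array = 1.0
  end subroutine fill_assumed

  ! Temporary sanity check to run in the caller before an explicit-shape call.
  logical function extents_match(array, m, n)
    real, intent(in) :: array(:, :)
    integer, intent(in) :: m, n
    extents_match = (size(array, 1) == m .and. size(array, 2) == n)
  end function extents_match
end module shape_demo
```

Calling `fill_explicit(a, 4, 4)` on an array declared `a(3,4)` would write 16 elements into storage for 12. When `a` is a local variable, that overrun lands on the stack, and `extents_match(a, 4, 4)` returning false is the cue to break in the debugger before the call.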

Many years ago, I had a heisenbug (one that goes away or moves about when you look at or for it). The symptom was caused by something modifying code in a sneaky manner, such that a register reference in an instruction changed. The memory access then went somewhere other than intended, which corrupted memory. This problem was particularly challenging to find (and resolve) because the debugger disassembly showed the correct instructions. Only snapshotting the code segment at runtime exposed the bug. Catching it was difficult because monitoring the instruction's memory for changes did not work, yet the memory was being changed. As it turned out, the SIB byte of the instruction was being changed to 0x03. A 0x03 when used as a one-byte instruction is a software interrupt (used by debuggers).

This led me to assume that something was screwed up in MS VS, as there were no breakpoints set at this location. The corrective measure was to delete all breakpoints using the Red X (as opposed to deleting them individually).

Jim Dempsey