Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Why am I getting stack corruption?

dajum
Novice
944 Views

I have a routine that runs fine in release mode, but in debug mode it crashes with a stack corruption error after it completes. I can see the stack get corrupted (the call stack becomes messed up in Visual Studio) when it executes the "mov qword ptr [rbp+0D28C0h],rdi" instruction below. I've set /STACK:2000000000 for linking and that doesn't help. MREC is an integer being passed in. The compile flags are listed below as well.

Any help appreciated.

Dave

    SUBROUTINE RESTAR_OLD(MREC)
00007FF765F5BE68  push        rbp  
00007FF765F5BE69  mov         eax,0D2940h  
00007FF765F5BE6E  call        __chkstk (07FF76661B4E0h)  
00007FF765F5BE73  sub         rsp,0D2940h  
00007FF765F5BE7A  lea         rbp,[rsp+70h]  
00007FF765F5BE7F  mov         qword ptr [rsp],rax  
00007FF765F5BE83  mov         rax,0D293Ch  
00007FF765F5BE8A  mov         dword ptr [rsp+rax],0CCCCCCCCh  
00007FF765F5BE91  sub         rax,4  
00007FF765F5BE95  cmp         rax,4  
00007FF765F5BE99  jg          RESTAR_OLD+22h (07FF765F5BE8Ah)  
00007FF765F5BE9B  mov         rax,qword ptr [rsp]  
00007FF765F5BE9F  mov         dword ptr [rsp],0CCCCCCCCh  
00007FF765F5BEA6  mov         dword ptr [rsp+4],0CCCCCCCCh  
00007FF765F5BEAE  mov         qword ptr [rbp+0D28C0h],rdi  
00007FF765F5BEB5  mov         qword ptr [rbp+0D28B8h],rsi  
00007FF765F5BEBC  mov         qword ptr [rbp+0D28B0h],rbx  
00007FF765F5BEC3  mov         qword ptr [MREC],rcx  
00007FF765F5BECA  mov         byte ptr [rbp+0D0666h],0  
00007FF765F5BED1  mov         byte ptr [rbp+0D0667h],0 

 

compilation flags

/nologo /debug:full /Od /heap-arrays0 /I"C:\sf60\proces\..\TempWorkspace16\x64\procesCur16\Debug" /I"C:\sf60\util\..\TempWorkspace16\x64\utilityCur_16\Debug" /I"C:\sf60\SamgDll\SamgDll\x64\Debug" /recursive /extend_source:132 /Qopenmp /warn:truncated_source /warn:interfaces /integer_size:64 /real_size:64 /assume:byterecl /Qinit:zero /fpe:0 /iface:cref /iface:mixed_str_len_arg /module:"..\..\x64\Current" /object:"work/" /Fd"work\astap.pdb" /traceback /check:none /libs:dll /threads /dbglibs /c

7 Replies
Steve_Lionel
Honored Contributor III

There's no way to provide you a useful answer based on this snippet of instruction sequences. My experience is that stack corruption issues have a cause far removed from where the corruption appears. At least you don't have STDCALL vs. C calling mechanisms to deal with on x64.

mecej4
Honored Contributor III

You have used a large number of non-default compiler options, so it is no simple matter to work out in your mind what the generated instruction codes ought to be. Furthermore, the ability of an IDE such as Visual Studio (and debuggers, in general) to display the call chain can be adversely affected by code generation optimizations. Therefore, you should not conflate a corrupted view of the call chain ("stack") with "stack corruption".

I see no problem with the mov instruction that you flagged. The memory addressed in that instruction is just the second QWORD that was initialized to 0CCCCCCCCH in a loop earlier in the code.
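As a rough check of that reading, the sketch below redoes the address arithmetic in Python. The constants are copied from the disassembly in the original post. The helper name rsp_offset is made up for this illustration, and the sketch assumes that rbp keeps the value set by "lea rbp,[rsp+70h]" for the rest of the prologue.

```python
# Constants copied from the posted prologue.
FRAME_SIZE = 0xD2940   # sub rsp,0D2940h
RBP_BIAS = 0x70        # lea rbp,[rsp+70h]

def rsp_offset(rbp_disp):
    """Translate an [rbp+disp] operand into an offset from rsp (hypothetical helper)."""
    return RBP_BIAS + rbp_disp

# Displacements of the three register saves that follow the fill loop.
saves = {"rdi": 0xD28C0, "rsi": 0xD28B8, "rbx": 0xD28B0}

for reg, disp in saves.items():
    off = rsp_offset(disp)
    # An 8-byte store is inside the frame if it ends at or below FRAME_SIZE.
    inside = 0 <= off and off + 8 <= FRAME_SIZE
    print(f"{reg}: rsp+{off:#x}, inside reserved frame: {inside}")
```

Under those assumptions the store of rdi goes to rsp+0D2930h, which is 16 bytes below the top of the area the routine has just reserved, so the instruction writes only to memory the routine owns.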

I find the presence of "qword ptr [MREC]" puzzling. If MREC is a simple scalar argument, why should it be saved to memory? In such circumstances, no explanations may be reasonably expected unless you have presented enough of the source code (plus data, how-to instructions, etc.) for someone else to reproduce the claimed problem.

JVanB
Valued Contributor II

@mecej4, I agree that there should be no problem with mov qword ptr [rbp+0D28C0h],rdi. But one of the first things that tends to happen in a Windows x64 procedure is to save the register-passed arguments in the parameter save area just above the return address in the stack. Thus probably [MREC] = [rsp+0D2950h] = [rbp+0D28E0h], the save area address for the first register argument. This frees up rcx for other uses.

EDIT: Put rcx at the wrong end of the parameter save area. See https://msdn.microsoft.com/en-us/library/ew5tede7.aspx . Now fixed.
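The address of the save slot can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the constants come from the posted prologue, the function name home_slot_rsp_offset is invented here, and the layout (pushed rbp, then the return address, then the register-argument save area) is the one described in this reply.

```python
# Constants copied from the posted prologue.
FRAME_SIZE = 0xD2940    # sub rsp,0D2940h
RBP_BIAS = 0x70         # lea rbp,[rsp+70h]
PUSHED_RBP = 8          # push rbp at procedure entry
RETURN_ADDRESS = 8      # pushed by the caller's call instruction

def home_slot_rsp_offset(arg_index):
    """Offset from the post-prologue rsp of the save slot for a register
    argument (0 = rcx, 1 = rdx, ...). Hypothetical helper for this sketch."""
    return FRAME_SIZE + PUSHED_RBP + RETURN_ADDRESS + 8 * arg_index

mrec_from_rsp = home_slot_rsp_offset(0)
mrec_from_rbp = mrec_from_rsp - RBP_BIAS
print(f"[MREC] = [rsp+{mrec_from_rsp:#x}] = [rbp+{mrec_from_rbp:#x}]")
```

The two offsets agree with the 0D2950h and 0D28E0h quoted above, which supports reading "mov qword ptr [MREC],rcx" as an ordinary save of the first argument register.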

mecej4
Honored Contributor III

RO, thanks, now I see. Normally, when I see a symbol as the entire r/m, I think of that symbol as a fixed address of a variable (in the .data section). If the symbol is a macro defined to be an expression such as rsp+0D2950H, it would help if that definition were shown as part of any listing where it is used.

dajum
Novice

I'm not much good at reading assembler code anymore, but it appears that 0CCCCCCCCh words are being written here, and there seem to be a lot of them. The corruption appears to occur during the loop; I just don't know why. I don't understand where that size is coming from. There is only one argument, so it can't be storage for anything in the argument list. Maybe it is something in the routine, but that seems unlikely as well. Happy to provide more source if it would help. I was hoping that someone who knows assembler better than I do would see what manipulations are being made on the stack here, and maybe why.

Dave

JVanB
Valued Contributor II

For some reason your subroutine is reserving 862528 bytes (0D2940h) on the stack. It's not overwriting 0CCCCCCCCh words but rather filling each of those 862528 bytes with the value 0CCh. No corruption should be occurring while doing this, because the current instance of subroutine restar_old owns this memory below the address rsp pointed to at procedure entry, along with the parameter save area.

Now, if a previous procedure pointed a pointer at an unsaved local variable or internal procedure and returned the pointer, then that pointer would actually have undefined association status and might still point to something on the live stack that looks useful until the stack gets overwritten. In this case, as Steve said, the error happened when the [speculated] procedure returned a dead pointer walking, not in subroutine restar_old. It can be difficult to hunt down an error like this.
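To make the byte count concrete, here is a small Python model of the fill sequence from the posted prologue. It is a sketch that mirrors only the instructions shown in the listing. The generator name fill_offsets is invented for the illustration.

```python
FRAME_SIZE = 0xD2940  # sub rsp,0D2940h

def fill_offsets():
    """Yield the rsp-relative offset of every dword store of 0CCCCCCCCh,
    following the posted instruction sequence (hypothetical helper)."""
    rax = 0xD293C          # mov rax,0D293Ch
    while True:
        yield rax          # mov dword ptr [rsp+rax],0CCCCCCCCh
        rax -= 4           # sub rax,4
        if not rax > 4:    # cmp rax,4 / jg back to the store
            break
    yield 0                # mov dword ptr [rsp],0CCCCCCCCh
    yield 4                # mov dword ptr [rsp+4],0CCCCCCCCh

offsets = list(fill_offsets())
print("frame size in bytes:", FRAME_SIZE)
print("dword stores:", len(offsets))
print("lowest byte written:", min(offsets))
print("one past highest byte written:", max(offsets) + 4)
```

In this model the stores cover offsets 0 through 0D293Fh exactly once and never go past the reserved area, which matches the point that the fill itself stays within memory the routine owns.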

jimdempseyatthecove
Honored Contributor III

In Debug mode, the new stack reserve area (for subroutine local data) is initialized with 0CCh as an indicator that the called subroutine has not written to those locations. In Release mode this initialization code is not present. During subroutine/function execution, these stack reserved areas may/will get overwritten with valid data (which may or may not contain 0CCh in the bytes).

A program behaving differently between Debug and Release (e.g. a crash or "wrong"/different results) is indicative of the use of an uninitialized variable. IOW in Debug mode, 0CCh, 0CCCCh, 0CCCCCCCCh is being used, whereas in Release mode, whatever leftover content was found at those stack locations is used (different content == different result/behavior).

Use the runtime diagnostic checks for uninitialized data. These can catch most uninitialized variable usages. They might not catch an uninitialized variable that is passed as an argument with the intention of returning a value, is never defined by the called routine, and is then used by the caller. That case, though, can be caught at compile time if you properly attribute the dummy argument with INTENT(OUT) in the called routine.
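To give a feel for what such a variable would contain in a Debug build, the Python sketch below reinterprets the fill pattern as a few common types. It assumes a little-endian x64 machine and a variable that sits in the filled stack area and is read before anything is stored to it. Whether a particular variable in the posted program falls into that category is not something this sketch can tell.

```python
import struct

# Eight bytes of the Debug-mode fill pattern.
FILL = b"\xCC" * 8

# Reinterpret the same bytes as the types an uninitialized local might have.
as_int32 = struct.unpack("<i", FILL[:4])[0]   # 4-byte integer
as_int64 = struct.unpack("<q", FILL)[0]       # 8-byte integer
as_real64 = struct.unpack("<d", FILL)[0]      # 8-byte real

print("4-byte integer:", as_int32)
print("8-byte integer:", as_int64)
print("8-byte real:   ", as_real64)
```

All three values are large and negative, so an uninitialized loop bound, array index, or record number taken from the fill pattern would likely behave very differently from whatever leftover data a Release build happens to read.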

RE: [MREC]

The Debug symbol information includes the address and/or rsp offset of each variable. The debugger is not smart enough to distinguish an arbitrary operand from one that happens to resolve to the same address and/or rsp offset as a variable, so it displays the variable's name.

Jim Dempsey
