Intel® Fortran Compiler

Stack failure in DEBUG

dajum
Novice
I'm getting a stack failure when my program calls __chkstk in the debug build. It doesn't die in release mode. I don't think there is a problem with the stack (there could be), but it only happens in this one routine, which seems to be the only one that calls __chkstk. I tried adding /check:none to the compile switches, but the call to __chkstk is still there. What turns off that call? My full list of compile options is:
/nologo /debug:full /Od /I"..\include" /I"..\include\fluint" /assume:nocc_omp /recursive /extend_source:132 /integer_size:64 /real_size:64 /Qauto /assume:byterecl /fpe:0 /iface:cref /iface:mixed_str_len_arg /module:"x64\Debug\\" /object:"x64\Debug/" /traceback /check:none /libs:static /threads /dbglibs /c
Wendy_Doerner__Intel
Valued Contributor I
To remove the __chkstk call, you could try removing the /traceback option. But ultimately we need to get at the cause of the stack failure. Could you post the actual error message?

Two more troubleshooting suggestions:

1) Try removing /fpe:0, which enables floating-point exceptions.
2) Why do you need /iface:mixed_str_len_arg? Are you sure the calling and called routines will match up with it?

------

Wendy

Attaching or including files in a post

dajum
Novice
I changed to STACK:20000000000 and that seems to eliminate the stack failure, but it still doesn't run in DEBUG mode. I also removed /fpe:0 (which we like, since the code then crashes nicely when we hit NaNs or divide by zero). We also need the /iface options to match up with many C routines that are very old and have been used for over 15 years.

Here is the call and the start of the routine:

CALL DDATA(0.16667,'ALL','','testddata.his')

000000014000184C lea rax,[__xt_z+1A0h (14064E828h)]
0000000140001853 lea rdx,[__xt_z+188h (14064E810h)]
000000014000185A lea rcx,[__xt_z+184h (14064E80Ch)]
0000000140001861 mov qword ptr [rsp+20h],0
000000014000186A lea rbx,[__xt_z+174h (14064E7FCh)]
0000000140001871 mov qword ptr [rsp+28h],rbx
0000000140001876 mov qword ptr [rsp+30h],0Dh
000000014000187F mov qword ptr [rbp+28h],rcx
0000000140001883 mov rcx,rax
0000000140001886 mov r8d,3
000000014000188C mov rax,qword ptr [rbp+28h]
0000000140001890 mov r9,rax
0000000140001893 call DDATA (140014778h)

SUBROUTINE DDATA( TINC, ARG1, ARG2, FILE )

0000000140014778 push rbp
0000000140014779 mov eax,0A3DA40h
000000014001477E call __chkstk (14064C620h)
0000000140014783 sub rsp,0A3DA40h
000000014001478A lea rbp,[rsp+40h]
000000014001478F mov qword ptr [rbp+0A3D9F0h],rdi
0000000140014796 mov qword ptr [rbp+0A3D9E8h],rsi
000000014001479D mov qword ptr [rbp+0A3D9E0h],rbx
00000001400147A4 mov qword ptr [rbp+0A3DA10h],rcx
00000001400147AB mov qword ptr [rbp+0A3DA18h],rdx
00000001400147B2 mov qword ptr [rbp+0A3DA20h],r8
00000001400147B9 mov qword ptr [rbp+0A3DA28h],r9
00000001400147C0 mov rax,qword ptr [rbp+0A3DA20h]
00000001400147C7 mov qword ptr [rbp+0A3D9C0h],rax
00000001400147CE mov rax,qword ptr [rbp+0A3DA30h]
00000001400147D5 mov qword ptr [rbp+0A3D9C8h],rax
00000001400147DC mov rax,qword ptr [rbp+0A3DA40h]
00000001400147E3 mov qword ptr [rbp+0A3D9D0h],rax

When it executes the sub instruction right after the __chkstk call, Visual Studio just sort of stops. It is still there, but the call stack becomes blank, and it loses the current location: the arrow that indicates the next instruction disappears. If I then continue running, it hits an error and the call stack looks like:
for__issue_diagnostic
for__io_return
for_open

That looks like it is farther into my code, since I'm opening a file, but the debugger can't seem to step into this routine, so I can't figure out what is going on. In release mode it executes without problems, but I suspect I have some bugs in the output written to the file. (I'm trying to convert the program from using KIND=4 for integers and reals to KIND=8 for both, but want this file generated using all KIND=4.) It also doesn't hit any of the breakpoints I put in the routine when I hit run.

But I don't know why it gets confused here.

Dave

Steven_L_Intel1
Employee
The _chkstk call is what checks the stack and signals a stack check error if there is insufficient stack. It can't be removed, and obviously you wouldn't want to remove it. It would be interesting to know what argument was passed to _chkstk, to see how large an item the compiler wanted to push on the stack.

Keep in mind that the stack is limited to 1GB. Setting larger values can have unintended consequences.

What happens if you compile with /heap-arrays? (Under Optimization, set Heap Arrays to 0.)
dajum
Novice

I changed to STACK:1000000000 so hopefully there are no unintended consequences. I tried /heap-arrays0 and that didn't change it. What is puzzling is why it runs in release, and has strange problems in debug. Trying to step into the routine in the debugger just loses VS. It would seem the stack is messed up for some reason. Doesn't the assembly code I listed show the argument you are looking for? Or is there a way I can get it in the debugger for you?
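As a quick sanity check of the two /STACK values tried so far against the 1GB limit mentioned above, here is a small Python sketch. It assumes "1GB" means 2**30 bytes (the thread does not say which definition is meant; 10**9 would give the same verdicts), and the helper name is made up for illustration.

```python
LIMIT_BYTES = 2**30  # assumed meaning of "1GB"; not stated in the thread

def within_limit(stack_bytes, limit=LIMIT_BYTES):
    """Return True if the requested stack reserve fits under the limit."""
    return stack_bytes <= limit

# The two values passed to /STACK in this thread.
for stack in (20_000_000_000, 1_000_000_000):
    verdict = "within" if within_limit(stack) else "over"
    print(f"/STACK:{stack} is {verdict} the limit ({stack / LIMIT_BYTES:.2f}x)")
```

Under that assumption, the first setting is many times over the limit and the second fits just under it.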

mecej4
Honored Contributor III
"It would be interesting to know what argument was passed to _chkstk, to see how large an item the compiler wanted to push on the stack."

Post #2 contains this part in the disassembly:

[bash]     mov eax,0A3DA40h
call __chkstk
sub rsp,0A3DA40h
[/bash]

That's a hefty (~ 11 MB) stack adjustment, but not unreasonable. We have not seen the relevant source code.
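For anyone who wants to check the size figure, the hex immediate can be converted directly. This Python sketch only does the arithmetic on the value shown in the disassembly; the variable names are illustrative.

```python
# Frame size requested in DDATA's prologue: mov eax,0A3DA40h / call __chkstk
frame_bytes = 0x0A3DA40

frame_mib = frame_bytes / 2**20  # binary megabytes (MiB)
frame_mb = frame_bytes / 10**6   # decimal megabytes (MB)

print(f"{frame_bytes:,} bytes = {frame_mib:.2f} MiB = {frame_mb:.2f} MB")
```

So the routine asks for roughly 10.7 million bytes of stack in a single frame, which is in line with the rounded figure above.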

Wendy_Doerner__Intel
Valued Contributor I

It does seem the stack is getting corrupted. This can differ between Debug and Release simply because user coding errors can show up or stay hidden depending on how memory is laid out. Mixing integer sizes could certainly cause the error (e.g. the caller and callee may not agree on the size of an integer on the stack, or you may have assumed that integers are the same size as pointers, say in a C routine you are calling, and the pointer becomes corrupt when it is an 8-byte value).
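As a rough illustration of the kind of size mismatch meant here (not taken from the poster's code), the Python sketch below lays out two 8-byte integers in memory, the way a caller built with /integer_size:64 might, and then reads the same bytes as 4-byte integers, the way a callee expecting KIND=4 would. Little-endian layout is assumed, as on x64; the values are arbitrary.

```python
import struct

# Caller side: two 8-byte (KIND=8) integers, little-endian.
buf = struct.pack("<2q", 7, -1)

# Callee side: the first 8 bytes interpreted as two 4-byte (KIND=4) integers.
as_kind4 = struct.unpack_from("<2i", buf)

# The callee's second "element" is really the upper half of the caller's first value.
print(as_kind4)
```

The callee never sees the caller's second value at the position it expects, which is how a size disagreement can silently feed wrong data (or a wrong pointer) into a routine.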

Can we get the source that corresponds to this assembly code (preferably in a test case which can be compiled and run)? That would help us to see where the stack is being corrupted.

Thanks,

Wendy

dajum
Novice
It isn't easy to create a sample you can run because of all the references to the registry and licensing that is part of the code. And taking that out usually changes the memory layout enough to change the code so that problems like these go away. I'll try and reduce it down, but an easier choice is to have you demo our software, and then I can send you files that will allow you to set up a case that should replicate the problem. But even that isn't certain to reproduce this particular crash for me.
Wendy_Doerner__Intel
Valued Contributor I
Can we at least get the code snippet that goes with the assembly snippet (including variable declarations)?

dajum
Novice
Sure. Here is the main routine (ASTAP.FOR), which contains the call, and the subroutine itself (DDATA.FOR). I have also tried removing the arguments to the call and just setting them to the same values in the call, as well as a bunch of other changes that haven't worked. I've also added the included files to the zip file. Let me know if there is something else you would like to see.

Dave
Wendy_Doerner__Intel
Valued Contributor I
Dave,

Thanks. We will take a look and see what we can find.

jimdempseyatthecove
Honored Contributor III
Steve, Wendy,

Not that it matters in this case...
The code was compiled for the x64 platform, and the generated code uses "mov eax,0A3DA40h" instead of "mov rax,0A3DA40h". In this case the stack allocation is under 2/4 GB and the mov zero-extends into rax, so no bug is observed. Rhetorically: what is going to happen with an 8 GB stack allocation?
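To make the question concrete, here is a small Python sketch of what a 32-bit register write can hold. It only models the arithmetic (low 32 bits kept, upper half of rax zeroed); it says nothing about what the compiler would actually emit for a frame that large, and the helper name is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def mov_eax(value):
    """Model a write to eax: keep the low 32 bits; rax's upper half becomes zero."""
    return value & MASK32

frame_11mb = 0x0A3DA40  # the allocation from the disassembly
frame_8gb = 8 * 2**30   # hypothetical 8 GB frame, i.e. 0x200000000

print(hex(mov_eax(frame_11mb)))  # value survives unchanged
print(hex(mov_eax(frame_8gb)))   # low 32 bits of 0x200000000 are all zero
```

In this model the 11 MB size passes through intact, while an 8 GB size would lose everything above bit 31.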

Jim

Earlier post:
Post #2 contains this part in the disassembly:

[bash]mov eax,0A3DA40h
call __chkstk
sub rsp,0A3DA40h
[/bash]
That's a hefty (~ 11 MB) stack adjustment, but not unreasonable.
dajum
Novice

Any update?

Wendy_Doerner__Intel
Valued Contributor I
Thanks for sending the code. It helps us get a feel for your application, but it is too large to debug by eye. I assume it does not build into a test case, given your earlier remarks.

Here are some suggestions to try:

1) Add /warn:interfaces to the compile line. This will cause the compiler to warn you if interfaces are mismatched. Given that you are changing the size of your data, this might catch where the problem is.

2) Try compiling without any optimizations:

/Od /Ob0 /Qip-

If the error still happens we know it is not related to optimizations in the compiler.

3) You cannot set breakpoints after the stack is corrupted, but you can set breakpoints before the call that is corrupting the stack and examine variables being passed to make sure they are the size and content you expect.

The stack size you are setting is acceptable to Windows, but may not be large enough for your application. Given that this error happens well into your application this is also a possibility.
