I ran regression tests for switching from IF 2013 to IF 2016, and suddenly a subroutine that took 11K of stack now wants 3M of stack. Please note that this IS NOT LIKE https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/590249 since I neither have dynamically allocated arrays nor do array operations on large arrays. I could track down a strange effect where duplicating a single function call adds 128K of stack space.
Changing Fortran > Optimization > Heap Arrays to 0, as suggested in https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/590249, didn't change anything.
Please refrain from proposing to increase the total stack space of the process: this is a problem of the compiler, not the OS.
If temporary (stack) arrays are not the issue (as stated in your post), then be aware that the newer Fortran standard specifies that locally declared arrays are automatic arrays. In prior versions these were (or may have been) SAVE arrays. You will have to determine whether this is the cause of the stack consumption. More importantly (for a multi-threaded application), you have to determine whether this data must be SAVE or must not (necessarily) be SAVE.
/Qauto places locally declared arrays on the stack.
/Qsave places locally declared arrays in a static location (like the SAVE attribute).
Earlier versions of Fortran defaulted to /Qsave; newer standards specify /Qauto as the default.
You also have a runtime diagnostic (/check:arg_temp_created) that reports when an array temporary is created for an argument. You might try that.
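To illustrate the distinction drawn above, here is a minimal sketch that states the storage intent in the source instead of relying on the /Qsave or /Qauto default. The module and routine names (storage_demo, count_calls, use_scratch) are made up for this illustration and do not come from the original code.

```fortran
module storage_demo
  implicit none
contains

  subroutine count_calls()
    ! Explicit SAVE: static storage, contents kept between calls,
    ! independent of whether the build uses /Qsave or /Qauto.
    integer, save :: calls(4) = 0
    calls = calls + 1
    print *, 'calls so far:', calls(1)
  end subroutine count_calls

  subroutine use_scratch(n)
    integer, intent(in) :: n
    ! Automatic array: its size depends on n, so it is created on every
    ! entry (typically on the stack) and never keeps its contents.
    integer :: scratch(n)
    scratch = n
    print *, 'sum of scratch:', sum(scratch)
  end subroutine use_scratch

end module storage_demo

program demo
  use storage_demo
  implicit none
  call count_calls()    ! calls(1) is 1 after this call
  call count_calls()    ! calls(1) is 2 after this call
  call use_scratch(3)   ! scratch holds three 3s, so the sum is 9
end program demo
```

Data that has to survive between calls carries SAVE explicitly here, so switching the compiler default should not change the program's behavior, only where unmarked locals are placed.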
I made a little example that looks very innocent but takes 1.2M of stack space (x86, debug build, in a default VS2015/IF2016 Fortran console project, not a single option changed):
    module source

    integer*4 i
    integer*4, parameter :: largearray(50000) = (/(42,i=1,50000) /)

    contains

    subroutine heap_test(a,b,c,d,e,f)

    integer*4, intent(in), value :: a,b,c,d,e,f

    end subroutine

    end module

    subroutine test()

    use source

    call heap_test(largearray(452),largearray(4532),largearray(4152),largearray(4552),largearray(4582),largearray(45))

    end

    program Console

    implicit none

    call test()

    end program Console
Copying line 20 (the call to heap_test) adds another 1.2M of stack usage for each copy. It seems that each time an element of a parameter array is used as an actual argument, the compiler reserves stack space for the whole array.
By the way: standard release builds of the above code (even with /O3) show the same behavior (same for x64); they just crash with a stack overflow. So IMO this is a very serious security and performance bug in the IF2016 compiler that should be fixed immediately.
First, thank you so much for the small reproducer.
What is provoking your stack overflow is the "value" attribute on the dummy arguments (line 10, the declaration of a through f in heap_test).
The semantics of "value" state that the callee must not be able to modify the caller's actual argument, and so to guarantee that, we make a copy.
There was clearly a bug in the compiler for a brief period of time where we made a copy of the WHOLE array, not just the one element in question.
I do not see that behavior in our current development compiler (which is virtually identical to the Update release that should be available soon) but I have not found our internal edit that resolved that problem either.
I can readily reproduce the incorrect behavior in the released version installed on my machine.
In the meantime, so that you can continue actually being productive, either remove the "value" from your declaration, or use the command line option /assume:nostd_value (you will have to manually add it to Properties->Fortran->Command Line).
And, when I have access to the Update release, I'll test your program with it.
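As a sketch of the first workaround suggested above (removing "value" from the declaration), the reproducer could look like the following. The module name source_novalue is made up for this illustration, and whether this avoids the whole-array copy on an affected compiler build is an assumption based on the reply above; it has not been verified here.

```fortran
module source_novalue
  implicit none
  integer :: i
  integer, parameter :: largearray(50000) = (/ (42, i = 1, 50000) /)
contains

  subroutine heap_test(a, b, c, d, e, f)
    ! No VALUE attribute: the arguments use the default Fortran
    ! argument passing. INTENT(IN) still prevents heap_test from
    ! modifying them.
    integer, intent(in) :: a, b, c, d, e, f
  end subroutine heap_test

end module source_novalue

subroutine test()
  use source_novalue
  implicit none
  call heap_test(largearray(452), largearray(4532), largearray(4152), &
                 largearray(4552), largearray(4582), largearray(45))
end subroutine test

program console
  implicit none
  call test()
end program console
```

Since the dummies are already INTENT(IN), dropping VALUE should not change what heap_test is allowed to do with them in a Fortran-to-Fortran call. If the routine must keep VALUE (for example, because it is called from C), the /assume:nostd_value option mentioned above leaves the source untouched.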
While playing around with the example code, I found some additional defects that should be fixed (all of them appear only if /assume:nostd_value is NOT used).
heap_test is defined as in the example above.
1. If an actual argument to heap_test is an element of a parameter array, the whole array is copied onto the stack, and each use as an argument generates another copy (even in release builds with /O3!).
2. If heap_test is called multiple times from the same routine, the space for the temporary copies isn't reused (also with /O3).
3. The optimizer (even with /O3!) is unable to detect that the calling subroutine "test" does nothing: it doesn't call heap_test, but it still performs a lot of unnecessary memmove calls before returning; see the assembler listing below (x64 build with /O3). With /assume:nostd_value and /O3, the call to "test" is optimized away.
    subroutine test()
    use source
    call heap_test(largearray(452),largearray(4532),largearray(4152),largearray(4552),largearray(4582),largearray(45))
    end
    subroutine test()
    00007FF6E1AD1040  mov         eax,124FA8h
    00007FF6E1AD1045  call        __chkstk (07FF6E1AD25A0h)
    00007FF6E1AD104A  sub         rsp,124FA8h
    use source
    call heap_test(largearray(452),largearray(4532),largearray(4152),largearray(4552),largearray(4582),largearray(45))
    00007FF6E1AD1051  lea         rdx,[SOURCE_mp_LARGEARRAY+70Ch (07FF6E1AD770Ch)]
    00007FF6E1AD1058  mov         r8d,30D40h
    00007FF6E1AD105E  lea         rcx,[rsp+20h]
    00007FF6E1AD1063  call        memmove (07FF6E1AD3430h)
    00007FF6E1AD1068  lea         rdx,[SOURCE_mp_LARGEARRAY+46CCh (07FF6E1ADB6CCh)]
    00007FF6E1AD106F  lea         rcx,[rsp+30D60h]
    00007FF6E1AD1077  mov         r8d,30D40h
    00007FF6E1AD107D  call        memmove (07FF6E1AD3430h)
    00007FF6E1AD1082  lea         rdx,[SOURCE_mp_LARGEARRAY+40DCh (07FF6E1ADB0DCh)]
    00007FF6E1AD1089  lea         rcx,[rsp+61AA0h]
    00007FF6E1AD1091  mov         r8d,30D40h
    00007FF6E1AD1097  call        memmove (07FF6E1AD3430h)
    00007FF6E1AD109C  lea         rdx,[SOURCE_mp_LARGEARRAY+471Ch (07FF6E1ADB71Ch)]
    00007FF6E1AD10A3  lea         rcx,[rsp+927E0h]
    00007FF6E1AD10AB  mov         r8d,30D40h
    00007FF6E1AD10B1  call        memmove (07FF6E1AD3430h)
    00007FF6E1AD10B6  lea         rdx,[SOURCE_mp_LARGEARRAY+4794h (07FF6E1ADB794h)]
    00007FF6E1AD10BD  lea         rcx,[rsp+0C3520h]
    00007FF6E1AD10C5  mov         r8d,30D40h
    00007FF6E1AD10CB  call        memmove (07FF6E1AD3430h)
    00007FF6E1AD10D0  lea         rdx,[SOURCE_mp_LARGEARRAY+0B0h (07FF6E1AD70B0h)]
    00007FF6E1AD10D7  lea         rcx,[rsp+0F4260h]
    00007FF6E1AD10DF  mov         r8d,30D40h
    00007FF6E1AD10E5  call        memmove (07FF6E1AD3430h)
    end
    00007FF6E1AD10EA  add         rsp,124FA8h
    00007FF6E1AD10F1  ret
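The constants in the listing above can be checked against the size of largearray with a little arithmetic. The following sketch does that; the program and variable names are made up for this illustration, and only the hex constants are taken from the listing.

```fortran
program stack_arithmetic
  implicit none
  integer, parameter :: n_elements        = 50000  ! size of largearray
  integer, parameter :: bytes_per_element = 4      ! integer*4
  integer, parameter :: n_arguments       = 6      ! actual arguments in the call
  ! Constants copied from the listing:
  integer, parameter :: memmove_bytes = int(z'30D40')   ! mov r8d,30D40h
  integer, parameter :: frame_bytes   = int(z'124FA8')  ! sub rsp,124FA8h
  integer :: bytes_per_array, bytes_all_copies

  bytes_per_array  = n_elements * bytes_per_element
  bytes_all_copies = n_arguments * bytes_per_array

  print '(a,i0)', 'bytes in one full array:    ', bytes_per_array
  print '(a,i0)', 'bytes moved per memmove:    ', memmove_bytes
  print '(a,i0)', 'bytes for six full copies:  ', bytes_all_copies
  print '(a,i0)', 'stack frame size:           ', frame_bytes
  print '(a,i0)', 'frame minus the six copies: ', frame_bytes - bytes_all_copies
end program stack_arithmetic
```

Each memmove length (30D40h = 200000 bytes) equals the size of the whole array (50000 elements of 4 bytes), and six such copies account for all but 40 bytes of the 124FA8h (1200040 byte) frame. That is consistent with one full-array copy per actual argument, as described in point 1.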