Intel® Fortran Compiler

Compiler checks stack when initializing an allocatable array

Pouya_Z_

In Intel® Parallel Studio XE 2016 Update 4, I declare an array of complex numbers, then allocate and initialize it as follows:

complex(kind=def_kind), allocatable, dimension(:) :: resid
allocate(resid(1:140000),STAT=istatus)
resid = cmplx(0.1D0,1.0D0,kind=def_kind)

After the initialization line executes, I get a stack overflow error, which is really puzzling. The generated assembly shows that the compiler seems to be checking the array size against the stack (the call to _chkstk), which is not reasonable considering that "resid" is declared as allocatable and should be on the heap. I recompiled the very same code with Intel® Parallel Studio XE 2016 Update 1: the generated assembly is totally different and I don't get a stack overflow there.

As a side note, one might suspect an earlier memory corruption that does not show up until the initialization line is reached. However, the same issue happens in different places in the code, and only when I initialize a whole array as above. If I use a do-loop for the initialization, no error occurs. Also, it is observed in version 16 Update 4.
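The do-loop workaround mentioned above could look roughly like the sketch below. The value of def_kind is not given in the post; single precision is assumed here because the posted assembly scales the index by 8 bytes per element. The program name and the error handling are illustrative only.

```fortran
program init_with_loop
  implicit none
  ! Assumption: def_kind is single precision (8-byte complex elements),
  ! inferred from the "imul eax,eax,8" in the posted listing.
  integer, parameter :: def_kind = kind(1.0)
  complex(kind=def_kind), allocatable, dimension(:) :: resid
  integer :: i, istatus

  allocate(resid(1:140000), STAT=istatus)
  if (istatus /= 0) stop 'allocation failed'

  ! Element-by-element assignment: each value is stored directly into
  ! resid, so no array-sized temporary is required.
  do i = lbound(resid, 1), ubound(resid, 1)
    resid(i) = cmplx(0.1D0, 1.0D0, kind=def_kind)
  end do

  print *, resid(1), resid(size(resid))
  deallocate(resid)
end program init_with_loop
```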

Any help is appreciated.

Assembly code for initialization line:

Intel® Parallel Studio XE 2016 Update 4

resid = cmplx(0.1D0,1.0D0,kind=def_kind)
10F91336  mov         eax,esp  
10F91338  mov         dword ptr [ebp-210h],eax  
10F9133E  mov         eax,13BAC8C0h  
10F91343  add         eax,18h  
10F91346  mov         eax,dword ptr [eax]  
10F91348  mov         dword ptr [ebp-20Ch],eax  
10F9134E  mov         eax,dword ptr [ebp-20Ch]  
10F91354  imul        eax,eax,8  
10F91357  add         eax,0Fh  
10F9135A  and         eax,0FFFFFFF0h  
10F9135D  call        _chkstk (13633600h)          (Stack overflow happens here)
10F91362  mov         eax,esp  
10F91364  mov         dword ptr [ebp-208h],eax  
10F9136A  mov         eax,dword ptr [ebp-208h]  
10F91370  mov         dword ptr [ebp-204h],eax  
10F91376  mov         dword ptr [ebp-200h],1  
10F91380  mov         eax,dword ptr [ebp-200h]  
10F91386  mov         edx,13BAC8C0h  
10F9138B  add         edx,18h  
10F9138E  mov         edx,dword ptr [edx]  
10F91390  cmp         eax,edx  
10F91392  jle         DO_ARP_ITER+1186h (10F91426h)  
10F91398  mov         eax,13BAC8C0h  
10F9139D  add         eax,20h  
10F913A0  mov         eax,dword ptr [eax]  
10F913A2  mov         dword ptr [ebp-1E8h],eax  
10F913A8  mov         dword ptr [ebp-200h],1  
10F913B2  mov         eax,dword ptr [ebp-200h]  
10F913B8  mov         edx,13BAC8C0h  
10F913BD  add         edx,18h  
10F913C0  mov         edx,dword ptr [edx]  
10F913C2  cmp         eax,edx  
10F913C4  jg          DO_ARP_ITER+11BAh (10F9145Ah)  
10F913CA  mov         eax,dword ptr [ebp-200h]  
10F913D0  imul        eax,eax,8  
10F913D3  add         eax,dword ptr [ebp-204h]  
10F913D9  add         eax,0FFFFFFF8h  
10F913DC  mov         edx,dword ptr [ebp-1E8h]  
10F913E2  imul        edx,edx,8  
10F913E5  add         edx,dword ptr ds:[13BAC8C0h]  
10F913EB  mov         ecx,13BAC8C0h  
10F913F0  add         ecx,20h  
10F913F3  mov         ecx,dword ptr [ecx]  
10F913F5  imul        ecx,ecx,8  
10F913F8  sub         edx,ecx  
10F913FA  movsd       xmm0,mmword ptr [eax]  
10F913FE  movsd       mmword ptr [edx],xmm0  
10F91402  mov         eax,1  
10F91407  add         eax,dword ptr [ebp-1E8h]  
10F9140D  mov         dword ptr [ebp-1E8h],eax  
10F91413  mov         eax,1  
10F91418  add         eax,dword ptr [ebp-200h]  
10F9141E  mov         dword ptr [ebp-200h],eax  
10F91424  jmp         DO_ARP_ITER+1112h (10F913B2h)  
10F91426  movsd       xmm0,mmword ptr ds:[13874D6Ch]  
10F9142E  mov         eax,dword ptr [ebp-200h]  
10F91434  imul        eax,eax,8  
10F91437  add         eax,dword ptr [ebp-204h]  
10F9143D  add         eax,0FFFFFFF8h  
10F91440  movsd       mmword ptr [eax],xmm0  
10F91444  mov         eax,1  
10F91449  add         eax,dword ptr [ebp-200h]  
10F9144F  mov         dword ptr [ebp-200h],eax  
10F91455  jmp         DO_ARP_ITER+10E0h (10F91380h)  
10F9145A  mov         eax,dword ptr [ebp-210h]  
10F91460  mov         esp,eax  

Intel® Parallel Studio XE 2016 Update 1

resid = cmplx(0.1D0,1.0D0,kind=def_kind)
603DF7F8  mov         eax,62C792A0h  
603DF7FD  add         eax,18h  
603DF800  mov         eax,dword ptr [eax]  
603DF802  mov         dword ptr [ebp-200h],eax  
603DF808  mov         eax,62C792A0h  
603DF80D  add         eax,20h  
603DF810  mov         eax,dword ptr [eax]  
603DF812  mov         dword ptr [ebp-1FCh],eax  
603DF818  mov         dword ptr [ebp-1F8h],1  
603DF822  mov         eax,dword ptr [ebp-1F8h]  
603DF828  mov         edx,62C792A0h  
603DF82D  add         edx,18h  
603DF830  mov         edx,dword ptr [edx]  
603DF832  cmp         eax,edx  
603DF834  jg          DO_ARP_ITER+1128h (603DF884h)  
603DF836  movsd       xmm0,mmword ptr ds:[629AF8ECh]  
603DF83E  mov         eax,dword ptr [ebp-1FCh]  
603DF844  imul        eax,eax,8  
603DF847  add         eax,dword ptr ds:[62C792A0h]  
603DF84D  mov         edx,62C792A0h  
603DF852  add         edx,20h  
603DF855  mov         edx,dword ptr [edx]  
603DF857  imul        edx,edx,8  
603DF85A  sub         eax,edx  
603DF85C  movsd       mmword ptr [eax],xmm0  
603DF860  mov         eax,1  
603DF865  add         eax,dword ptr [ebp-1FCh]  
603DF86B  mov         dword ptr [ebp-1FCh],eax  
603DF871  mov         eax,1  
603DF876  add         eax,dword ptr [ebp-1F8h]  
603DF87C  mov         dword ptr [ebp-1F8h],eax  
603DF882  jmp         DO_ARP_ITER+10C6h (603DF822h) 

 

4 Replies
Steve_Lionel

I recall seeing this a while ago: the compiler is constructing the cmplx array on the stack and then moving it. Not optimal. I think this was fixed in version 17.
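To see why a stack temporary of this size overflows, the arithmetic in the Update 4 listing can be reproduced by hand: the code appears to multiply the array extent by 8, round the result up to a multiple of 16, and pass it to _chkstk. The sketch below repeats that calculation. It assumes def_kind is single precision (8 bytes per complex element, consistent with the multiply by 8) and assumes a 1 MB stack reserve, which is the usual Windows linker default but is not stated in the thread.

```fortran
program temp_size_estimate
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  ! Assumption: single-precision complex, 8 bytes per element.
  integer, parameter :: def_kind = kind(1.0)
  integer, parameter :: n = 140000
  ! Assumption: 1 MB stack reserve (common Windows default).
  integer(int64), parameter :: stack_bytes = 1024_int64 * 1024_int64
  complex(kind=def_kind) :: sample
  integer(int64) :: elem_bytes, raw_bytes, padded_bytes

  elem_bytes   = storage_size(sample) / 8          ! bits to bytes
  raw_bytes    = n * elem_bytes                    ! imul eax,eax,8
  ! add eax,0Fh / and eax,0FFFFFFF0h: round up to a multiple of 16
  padded_bytes = iand(raw_bytes + 15_int64, not(15_int64))

  print *, 'temporary size (bytes): ', padded_bytes
  print *, 'assumed stack (bytes):  ', stack_bytes
  print *, 'exceeds stack:          ', padded_bytes > stack_bytes
end program temp_size_estimate
```

With these assumptions, 140000 elements of 8 bytes come to 1,120,000 bytes, which is already more than 1,048,576 bytes before anything else on the stack is counted.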

Pouya_Z_

The compiler creates a temporary array for complex numbers (this doesn't happen for real numbers). If I set the heap-arrays option to 0, the temporary array is allocated on the heap, and that fixes the problem.
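For anyone wanting to try the heap-arrays workaround, a minimal test case might look like the sketch below. The file name repro.f90, the program name, and the value of def_kind are assumptions for illustration; the build lines in the comments use the Windows and Linux spellings of the heap-arrays option, where a size of 0 asks the compiler to put temporaries on the heap. Whether a stack temporary is created at all depends on the compiler version, as noted in this thread.

```fortran
! Hypothetical file name: repro.f90
! Possible build lines (same option, Windows and Linux spellings):
!   ifort /heap-arrays:0 repro.f90
!   ifort -heap-arrays 0 repro.f90
program whole_array_init
  implicit none
  ! Assumption: def_kind is single precision; the post does not say.
  integer, parameter :: def_kind = kind(1.0)
  complex(kind=def_kind), allocatable, dimension(:) :: resid
  integer :: istatus

  allocate(resid(1:140000), STAT=istatus)
  if (istatus /= 0) stop 'allocation failed'

  ! Whole-array assignment, as in the original post. On the affected
  ! compiler version this is where the stack temporary was reported.
  resid = cmplx(0.1D0, 1.0D0, kind=def_kind)

  print *, resid(1), resid(size(resid))
  deallocate(resid)
end program whole_array_init
```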

Steve_Lionel

I tried an example with 17.0.2 and no stack temp was created. 

Pouya_Z_

Thanks Steve. It's good to know that the problem has been fixed.
