In Intel® Parallel Studio XE 2016 Update 4, I declare an allocatable array of complex numbers, then allocate and initialize it as follows:
complex(kind=def_kind), allocatable, dimension(:) :: resid
allocate(resid(1:140000),STAT=istatus)
resid = cmplx(0.1D0,1.0D0,kind=def_kind)
Executing the initialization line produces a stack overflow error, which puzzles me. The generated assembly shows that the compiler reserves stack space sized from the array bounds (the _chkstk call), which makes no sense given that "resid" is declared allocatable and should live on the heap. I recompiled the very same code with Intel® Parallel Studio XE 2016 Update 1: the generated assembly is completely different, and there is no stack overflow.
As a side note, one might suspect an earlier memory corruption that does not show up until execution reaches the initialization line. However, the same issue happens in different places in the code, and only when I initialize a whole array as above. If I use a do-loop for the initialization, no error is observed. Also, it is observed in v16, update 4.
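The do-loop form mentioned above could look like the following minimal sketch. It assumes def_kind is a single-precision kind (a guess based on the 8-byte element stride in the assembly below); the program name is hypothetical and everything outside the three original statements is illustrative only.

```fortran
program init_workaround
  implicit none
  ! Assumption: def_kind is single precision, giving 8-byte complex elements.
  integer, parameter :: def_kind = kind(1.0)
  complex(kind=def_kind), allocatable, dimension(:) :: resid
  integer :: i, istatus

  allocate(resid(1:140000), STAT=istatus)
  if (istatus /= 0) stop 'allocation failed'

  ! Element-by-element initialization: each assignment is scalar,
  ! so no array-sized right-hand side has to be built first.
  do i = 1, size(resid)
     resid(i) = cmplx(0.1D0, 1.0D0, kind=def_kind)
  end do

  print *, resid(1), resid(size(resid))
  deallocate(resid)
end program init_workaround
```

The whole-array statement resid = cmplx(...) is the more idiomatic form; the loop is only a way to sidestep the behavior seen in Update 4.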
Any help is appreciated.
Assembly code for initialization line:
Intel® Parallel Studio XE 2016 Update 4
resid = cmplx(0.1D0,1.0D0,kind=def_kind)
10F91336 mov eax,esp
10F91338 mov dword ptr [ebp-210h],eax
10F9133E mov eax,13BAC8C0h
10F91343 add eax,18h
10F91346 mov eax,dword ptr [eax]
10F91348 mov dword ptr [ebp-20Ch],eax
10F9134E mov eax,dword ptr [ebp-20Ch]
10F91354 imul eax,eax,8
10F91357 add eax,0Fh
10F9135A and eax,0FFFFFFF0h
10F9135D call _chkstk (13633600h) (Stack overflow happens here)
10F91362 mov eax,esp
10F91364 mov dword ptr [ebp-208h],eax
10F9136A mov eax,dword ptr [ebp-208h]
10F91370 mov dword ptr [ebp-204h],eax
10F91376 mov dword ptr [ebp-200h],1
10F91380 mov eax,dword ptr [ebp-200h]
10F91386 mov edx,13BAC8C0h
10F9138B add edx,18h
10F9138E mov edx,dword ptr [edx]
10F91390 cmp eax,edx
10F91392 jle DO_ARP_ITER+1186h (10F91426h)
10F91398 mov eax,13BAC8C0h
10F9139D add eax,20h
10F913A0 mov eax,dword ptr [eax]
10F913A2 mov dword ptr [ebp-1E8h],eax
10F913A8 mov dword ptr [ebp-200h],1
10F913B2 mov eax,dword ptr [ebp-200h]
10F913B8 mov edx,13BAC8C0h
10F913BD add edx,18h
10F913C0 mov edx,dword ptr [edx]
10F913C2 cmp eax,edx
10F913C4 jg DO_ARP_ITER+11BAh (10F9145Ah)
10F913CA mov eax,dword ptr [ebp-200h]
10F913D0 imul eax,eax,8
10F913D3 add eax,dword ptr [ebp-204h]
10F913D9 add eax,0FFFFFFF8h
10F913DC mov edx,dword ptr [ebp-1E8h]
10F913E2 imul edx,edx,8
10F913E5 add edx,dword ptr ds:[13BAC8C0h]
10F913EB mov ecx,13BAC8C0h
10F913F0 add ecx,20h
10F913F3 mov ecx,dword ptr [ecx]
10F913F5 imul ecx,ecx,8
10F913F8 sub edx,ecx
10F913FA movsd xmm0,mmword ptr [eax]
10F913FE movsd mmword ptr [edx],xmm0
10F91402 mov eax,1
10F91407 add eax,dword ptr [ebp-1E8h]
10F9140D mov dword ptr [ebp-1E8h],eax
10F91413 mov eax,1
10F91418 add eax,dword ptr [ebp-200h]
10F9141E mov dword ptr [ebp-200h],eax
10F91424 jmp DO_ARP_ITER+1112h (10F913B2h)
10F91426 movsd xmm0,mmword ptr ds:[13874D6Ch]
10F9142E mov eax,dword ptr [ebp-200h]
10F91434 imul eax,eax,8
10F91437 add eax,dword ptr [ebp-204h]
10F9143D add eax,0FFFFFFF8h
10F91440 movsd mmword ptr [eax],xmm0
10F91444 mov eax,1
10F91449 add eax,dword ptr [ebp-200h]
10F9144F mov dword ptr [ebp-200h],eax
10F91455 jmp DO_ARP_ITER+10E0h (10F91380h)
10F9145A mov eax,dword ptr [ebp-210h]
10F91460 mov esp,eax
Intel® Parallel Studio XE 2016 Update 1
resid = cmplx(0.1D0,1.0D0,kind=def_kind)
603DF7F8 mov eax,62C792A0h
603DF7FD add eax,18h
603DF800 mov eax,dword ptr [eax]
603DF802 mov dword ptr [ebp-200h],eax
603DF808 mov eax,62C792A0h
603DF80D add eax,20h
603DF810 mov eax,dword ptr [eax]
603DF812 mov dword ptr [ebp-1FCh],eax
603DF818 mov dword ptr [ebp-1F8h],1
603DF822 mov eax,dword ptr [ebp-1F8h]
603DF828 mov edx,62C792A0h
603DF82D add edx,18h
603DF830 mov edx,dword ptr [edx]
603DF832 cmp eax,edx
603DF834 jg DO_ARP_ITER+1128h (603DF884h)
603DF836 movsd xmm0,mmword ptr ds:[629AF8ECh]
603DF83E mov eax,dword ptr [ebp-1FCh]
603DF844 imul eax,eax,8
603DF847 add eax,dword ptr ds:[62C792A0h]
603DF84D mov edx,62C792A0h
603DF852 add edx,20h
603DF855 mov edx,dword ptr [edx]
603DF857 imul edx,edx,8
603DF85A sub eax,edx
603DF85C movsd mmword ptr [eax],xmm0
603DF860 mov eax,1
603DF865 add eax,dword ptr [ebp-1FCh]
603DF86B mov dword ptr [ebp-1FCh],eax
603DF871 mov eax,1
603DF876 add eax,dword ptr [ebp-1F8h]
603DF87C mov dword ptr [ebp-1F8h],eax
603DF882 jmp DO_ARP_ITER+10C6h (603DF822h)
I recall seeing this a while ago: the compiler constructs the cmplx array on the stack and then copies it into the destination. Not optimal. I think this was fixed in version 17.
The compiler creates a temporary array for complex numbers (this doesn't happen for real numbers). If I set the heap-arrays option to 0, the temporary array is allocated on the heap, which fixes the problem.
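To see why a stack temporary of this size overflows, here is a rough sketch that computes the size of the temporary. It assumes def_kind is single precision (consistent with the imul eax,eax,8 in the Update 4 assembly) and a default 1 MB stack reserve on Windows; the program name and the compile lines in the comments are illustrative, not taken from the original project.

```fortran
program temp_size
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  ! Assumption: def_kind is single precision (8-byte complex elements).
  integer, parameter :: def_kind = kind(1.0)
  integer, parameter :: n = 140000          ! array extent from the original post
  complex(kind=def_kind) :: z
  integer(int64) :: temp_bytes

  z = cmplx(0.1D0, 1.0D0, kind=def_kind)

  ! storage_size returns bits per element; divide by 8 for bytes.
  temp_bytes = int(n, int64) * int(storage_size(z) / 8, int64)

  print *, 'temporary size in bytes:', temp_bytes
  ! Assumption: 1 MB (1048576 bytes) default stack reserve.
  print *, 'exceeds 1 MB stack:', temp_bytes > 1048576_int64

  ! To place such temporaries on the heap instead:
  !   Windows: ifort /heap-arrays:0 temp_size.f90
  !   Linux:   ifort -heap-arrays 0 temp_size.f90
end program temp_size
```

Under those assumptions the temporary needs 140000 × 8 = 1,120,000 bytes, slightly more than a 1 MB stack, which would match the failure at _chkstk.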
I tried an example with 17.0.2 and no stack temp was created.
Thanks Steve. It's good to know that the problem has been fixed.