Stack overflow problem with IVF Composer XE 2016

Amalia_B_ · 08-26-2015

Just downloaded and installed the new Intel® Parallel Studio XE Composer Edition for Fortran Windows* 2016 to perform regression tests on code that I developed and that works fine under the IVF 2015 version. I immediately get a stack overflow on this line of code (variables abbreviated):

arr1 = cdexp(QI * (c1 * arr2 + c2 * arr3 * c3))

c1, c2, and c3 are constants,

arr1 = dynamically allocated, double precision, complex 1-dimensional array, and

arr2 and arr3 = dynamically allocated, double precision 1-dimensional arrays

The arrays are fairly large in the case where I received the overflow problem - all arrays are dimensioned from 0 to 65536 (2^16). I assume the problem is with the CDEXP function, but with the 2015 compiler, this case worked just fine and I didn't receive a stack overflow problem.

Also, I didn't previously need to increase the stack size or use the /heap-arrays compiler option (which isn't an option for me anyway, as I need to create thread-safe libraries).

Breaking up the statement into a DO loop will be bulletproof, but defeats the efficiency of vectorization.
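
For reference, a minimal sketch of the DO-loop form mentioned above. The array size matches the failing case, but the values given to QI, c1, c2, c3 and the input arrays are placeholders assumed for illustration, and the generic EXP is used in place of the specific name CDEXP. A loop like this needs no array-sized temporary; whether it is vectorized is up to the compiler, so the optimization report is worth checking before assuming the loop is slower.

```fortran
program loop_sketch
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 2**16
   ! Assumed placeholder values; the post only says these are constants.
   complex(dp), parameter :: qi = (0.0_dp, 1.0_dp)
   real(dp), parameter :: c1 = 198.0_dp, c2 = 314.6_dp, c3 = 1.0_dp
   real(dp), allocatable :: arr2(:), arr3(:)
   complex(dp), allocatable :: arr1(:)
   integer :: j

   allocate(arr1(0:n), arr2(0:n), arr3(0:n))
   arr2 = 1.0_dp   ! placeholder input data
   arr3 = 2.0_dp   ! placeholder input data

   ! Element-by-element form of:
   !    arr1 = cdexp(QI * (c1 * arr2 + c2 * arr3 * c3))
   do j = 0, n
      arr1(j) = exp(qi * (c1 * arr2(j) + c2 * arr3(j) * c3))
   end do

   write(*,*) arr1(0), arr1(n)
end program loop_sketch
```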

Anyone else have this problem?

Amalia_B_ · 08-26-2015

Forgot to add - QI is also a constant in the above Fortran line.

Steven_L_Intel1 · 08-26-2015

You can use /heap-arrays in thread-safe code. But otherwise try setting the Linker > System > Stack Reserve Size value larger. Start with 10000000 (10 million) and work up from there.

If you think there is a compiler problem, we'd need a complete test case we can build and run.

Amalia_B_ · 08-26-2015

OK, I was able to duplicate the problem with the code below (used as two .f90 files - 1 main program and 1 module). The stack overflow problem as it turns out is not in the CDEXP line but in the call to the subroutine. For whatever reason, there's a problem when passing the large array back through the derived type. BTW, to confirm, this code was also run on IVF 2015 and it works like it should. All default options were used in making the QWin project (Debug mode, 32-bit).

File testmod.f90:

module testmod
   type testtype
      real(kind=8), dimension(:), allocatable    :: arr2, arr3
      complex(kind=8), dimension(:), allocatable :: arr1
      real(kind=8) :: c1, c2, c3
      complex(kind=8) :: qi
      integer(kind=4) :: n
   end type testtype
end module testmod

File test.f90:

program test
   use testmod
   implicit none  
   type(testtype) :: tp   
   tp%n = 2**16
   tp%qi = cmplx(0.d0, 1.0d0, 8)  
   tp%c1 = 198.d0
   tp%c2 = 314.6d0
   tp%c3 = 1.d0
   allocate( tp%arr1(0:tp%n), source = (0.d0, 0.d0) )
   allocate( tp%arr2(0:tp%n), source = 0.d0 )
   allocate( tp%arr3(0:tp%n), source = 0.d0 )  
   call testsub(tp)
   write(*,*) tp%arr1(0), tp%arr1(tp%n)  
end program test   
   
subroutine testsub( tp )
   use testmod
   implicit none  
   type(testtype), intent(inout) :: tp
   integer(kind=4) :: j 
   tp%arr2 = 150.d0 + [(j, j=0,tp%n)] * 1.d-3
   tp%arr3 = [(j, j=0,tp%n)] * 1.d0
   tp%arr1 = cdexp(tp%QI * (tp%c1 * tp%arr2 + tp%c2 * tp%arr3 * tp%c3))
end subroutine testsub

BTW, if you remove the subroutine TESTSUB, remove the derived type, and just put the equivalent code in the main program then it all works fine in IVF 2016.

Steven_L_Intel1 · 08-26-2015

Thanks - we'll check this out to see what is happening.

Steven_L_Intel1 · 08-28-2015

What is happening is that the 16.0 compiler is generating a stack-based temporary copy in order to evaluate the assignment to tp%arr1, where the 15.0 compiler did not. Not only does this trigger a stack overflow (in your example, the temporary is just over 1 MB, and 1 MB is the linker's default stack size), but it also hurts performance.
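
The "just over 1 MB" figure can be checked with a little arithmetic: the arrays run from 0 to 2**16, so the temporary holds 65537 double-precision complex elements of 16 bytes each. A small sketch of that calculation follows; the program and variable names are made up for illustration, and 1024*1024 is taken as the 1 MB default stack size mentioned above.

```fortran
program temp_size
   implicit none
   integer, parameter :: n = 2**16
   complex(kind=8) :: z
   integer :: bytes_per_elem, temp_bytes

   ! storage_size returns the size in bits; 128 for double-precision complex
   bytes_per_elem = storage_size(z) / 8
   ! Bounds 0:n give n + 1 elements
   temp_bytes = (n + 1) * bytes_per_elem

   write(*,*) 'elements in temporary :', n + 1
   write(*,*) 'bytes in temporary    :', temp_bytes
   write(*,*) 'default stack (1 MB)  :', 1024 * 1024
end program temp_size
```

With 16-byte elements this gives 1048592 bytes for the temporary against 1048576 bytes of stack, so the temporary alone exceeds the default by 16 bytes before anything else is placed on the stack.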

I have escalated this to development as issue DPD200375507 and will let you know of any progress. To remove the error, you can set the project property Linker > System > Stack Reserve Size to a larger value (your test case works fine with 2000000, but your actual program might need more), or you can set Fortran > Optimization > Heap Arrays > 0, in which case you don't need to worry about the size. I note that the other two assignments also need stack space to hold the array constructor.
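
As an illustration of the array-constructor point, here is a hedged sketch of how the arr2 and arr3 assignments from the test case could be written as an explicit loop, which should not need an array-sized temporary. The module name testmod_sketch and the routine fill_inputs are hypothetical, the derived type is cut down to the components used here, and this is only one possible source-level workaround, not a fix suggested by Intel.

```fortran
module testmod_sketch
   implicit none
   type testtype
      real(kind=8), dimension(:), allocatable :: arr2, arr3
      integer(kind=4) :: n
   end type testtype
contains
   ! Hypothetical helper: fills arr2 and arr3 element by element instead of
   ! assigning from [(j, j=0,tp%n)], so no constructor temporary is built.
   ! Assumes both arrays are already allocated with bounds 0:tp%n.
   subroutine fill_inputs(tp)
      type(testtype), intent(inout) :: tp
      integer(kind=4) :: j
      do j = 0, tp%n
         tp%arr2(j) = 150.d0 + j * 1.d-3
         tp%arr3(j) = j * 1.d0
      end do
   end subroutine fill_inputs
end module testmod_sketch

program fill_demo
   use testmod_sketch
   implicit none
   type(testtype) :: tp
   tp%n = 2**16
   allocate(tp%arr2(0:tp%n), tp%arr3(0:tp%n))
   call fill_inputs(tp)
   write(*,*) tp%arr2(0), tp%arr3(tp%n)
end program fill_demo
```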

Tobias_Loew · 02-08-2016

Hi,

I also ran regression tests with IF 2016 (previously IF 2013). I found a routine which went up from 11K of stack space to 3M (sic!) (in both debug and release builds) and now just crashes my program. Since I'm building DLLs for customers, increasing the standard total stack size is not an option (and IMO generally a bad idea). So I tried "Fortran > Optimization > Heap Arrays > 0", but it had no effect on the generated code (before, the Fortran > Optimization > Heap Arrays option was just blank).

best regards

Tobias

Amalia_B_ · 02-22-2016

Steve,

Just installed Intel® Parallel Studio XE Composer Edition for Fortran Windows* 2016, Update 2 and it still has the same "stack overflow" problem. Is there any timeline of when this will be fixed?

Amalia

Steven_L_Intel1 · 02-23-2016

Sorry, I missed the update on this one. It has been fixed for the next major version, planned for the second half of this year. There aren't plans to fix it in an update to the 2016 product.

Enabling heap arrays should work - it does in your test case. Can you show an example where it does not? I consider this a preferable solution overall.

By the way, you may have noticed that your forum ID shows as "(Name withheld)". This happens if you have not set a forum "Display Name" or if the display name is an email address. You can go to your Dashboard (click on your name in the upper right) and change it there.