segfault with large array initialization

wim_van_hoydonck1 · ‎01-05-2009

Hi there,

A small test program seems to segfault on the following line:
angles = pi*real( [(i,i=0,n-1)]/n, kind=dp)
for n larger than roughly 2000000.

ulimit is set to unlimited. This happens with ifort version 11.0 20081105 on Fedora 10 and with ifort version 10.1 20080312 on openSUSE 10.2. I have also tried using big integers, but it still segfaults. Replacing the above line with a 'classic loop' removes the segfault.

Before I open an issue about this segfault at premier.intel.com, I would like to know whether I am doing something incredibly stupid here.

Greetings,

Wim

$ cat fortran_sin.f90
program fortran_sin
  implicit none
  integer, parameter :: dp = selected_real_kind(p=13, r=300)
  integer, parameter :: n = 10000000
  integer :: i
  real(dp), parameter :: pi = 3.1415926535897932385_dp
  real(dp) :: angles(n), sins(n)
  angles = pi*real( [(i,i=0,n-1)]/n , kind=dp ) ! this line gives segfaults for big values of n
  ! do i=0,n-1
  !   angles(i+1) = pi*real(i/n,kind=dp)
  ! end do
  sins = sin(angles)
end program fortran_sin

Ron_Green · ‎01-05-2009

The problem is that you are generating a HUGE array temporary for the expression on the right hand side. By default, this will be built on the stack. You can use

-heap-arrays

to get this to use the heap instead of the stack and avoid the issue. But it is informative to know that you are creating a huge array temporary, and perhaps this should be avoided.

wim_van_hoydonck1 · ‎01-05-2009

Ah thanks, I knew I overlooked something simple.

It would indeed be informative to know when a program is creating a large array on the stack.

Thanks,

Wim

Ron_Green · ‎01-06-2009

Yes, there is an outstanding feature request to provide information on when array temporaries are created. There is an existing option to warn when array temps are created during the passing of arguments to procedures:

-check arg_temp_created

This one is useful since in many cases these argument temporaries can be removed by using proper INTERFACEs for the procedures. (-gen-interfaces is a really powerful feature that not enough users take advantage of.)
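To make the argument-temporary case concrete, here is a minimal sketch that is not taken from the thread. The names scale_mod, scale_explicit_shape, scale_assumed_shape and arg_temp_demo are invented for illustration. It assumes the usual behavior that a strided array section passed to an explicit-shape dummy is copied into a contiguous temporary, whereas an assumed-shape dummy with an explicit interface (here provided by the module) can normally take the section directly.

```fortran
module scale_mod
  implicit none
  integer, parameter :: dp = selected_real_kind(p=13, r=300)
contains

  ! Explicit-shape dummy: the callee expects contiguous storage, so a
  ! strided actual argument typically forces a copy-in/copy-out temporary.
  subroutine scale_explicit_shape(x, m, factor)
    integer,  intent(in)    :: m
    real(dp), intent(inout) :: x(m)
    real(dp), intent(in)    :: factor
    x = factor*x
  end subroutine scale_explicit_shape

  ! Assumed-shape dummy: the caller passes a descriptor (this needs the
  ! explicit interface the module provides), so a strided section can
  ! normally be used in place without a copy.
  subroutine scale_assumed_shape(x, factor)
    real(dp), intent(inout) :: x(:)
    real(dp), intent(in)    :: factor
    x = factor*x
  end subroutine scale_assumed_shape

end module scale_mod

program arg_temp_demo
  use scale_mod
  implicit none
  real(dp) :: a(10)

  a = 1.0_dp
  call scale_explicit_shape(a(1:10:2), 5, 2.0_dp)  ! likely creates a temporary
  call scale_assumed_shape(a(2:10:2), 2.0_dp)      ! normally no temporary
  print *, a
end program arg_temp_demo
```

Building this with -check arg_temp_created and running it should report the first call if the compiler does create a temporary there; the exact message text is not something I can vouch for.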

In general, there are cases where the compiler is fully justified to use temporaries - it is often the most efficient and straightforward way to perform the operation. For example, in your expression:

angles = pi*real( [(i,i=0,n-1)]/n , kind=dp )

One way of doing this without the temp is to take i=0, divide by n, convert to real, multiply by pi, and store to angles(1). Then do the same for i=1, then i=2, and so on. This is not a very efficient way to implement this expression. Modern processors are highly pipelined, and you can imagine that this element-by-element sequence of operations uses neither streaming of memory nor pipelining in the FP hardware.

Now, imagine you create the entire array of I values. You stream that through the FP units to do the DIV operation, then stream through the REAL conversion, then stream through the pi* operation and as the pi*X results come out, you stream them out to consecutive memory for ANGLES.

So array temporaries do have their place, and are often exactly what you want.
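The staged evaluation described above can be written out by hand as a rough sketch. This is only an illustration of the idea, not the code ifort actually generates; the program name staged_temp and the work array itmp are hypothetical, and n is kept small so the sketch runs anywhere.

```fortran
program staged_temp
  implicit none
  integer, parameter :: dp = selected_real_kind(p=13, r=300)
  integer, parameter :: n = 1000
  real(dp), parameter :: pi = 3.1415926535897932385_dp
  integer :: i
  integer,  allocatable :: itmp(:)    ! stands in for the compiler's array temporary
  real(dp), allocatable :: angles(:)

  allocate(itmp(n), angles(n))

  ! Stage 1: build the entire array of I values.
  do i = 1, n
    itmp(i) = i - 1
  end do

  ! Stage 2: the division, applied to the whole array at once
  ! (integer division, exactly as written in the thread's expression).
  itmp = itmp/n

  ! Stages 3 and 4: convert to real, multiply by pi, store to angles.
  angles = pi*real(itmp, kind=dp)

  print *, angles(1), angles(n)
end program staged_temp
```

Each whole-array statement is a simple loop over consecutive memory, which is the access pattern the paragraph above argues for.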

We provide -heap-arrays for cases where the user data is quite large and will not fit on the stack. The stack is nice to use because memory management is fast and efficient (push/pop, nothing is simpler or more efficient than that). With the heap there is a little more overhead, as the runtime must manage the heap space to prevent fragmentation and exhaustion. We often debate whether it would be better to default to heap temporaries (as many other compilers do), so that we no longer see the error you and others encounter with large arrays, and to provide an option like "-stack-arrays" as a non-default option for performance-critical applications. In general, the Intel philosophy is to default to speed and efficiency, and this is an ideal example of that design philosophy.
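The same stack-versus-heap choice shows up in user code, which may help when deciding where large work arrays should live. The sketch below is mine, with invented names (work_mod, sum_squares_automatic, sum_squares_allocatable, stack_vs_heap). It relies on ifort's default of placing automatic arrays on the stack unless -heap-arrays is given, and on allocatable arrays being allocated from the heap.

```fortran
module work_mod
  implicit none
  integer, parameter :: dp = selected_real_kind(p=13, r=300)
contains

  ! Automatic array: sized on entry and, by default with ifort, placed on
  ! the stack. Cheap to create, but limited by the stack size.
  function sum_squares_automatic(m) result(s)
    integer, intent(in) :: m
    real(dp) :: s
    real(dp) :: work(m)
    integer :: i
    do i = 1, m
      work(i) = real(i, kind=dp)**2
    end do
    s = sum(work)
  end function sum_squares_automatic

  ! Allocatable array: allocated from the heap, so large sizes are fine,
  ! at the cost of an allocate/deallocate per call.
  function sum_squares_allocatable(m) result(s)
    integer, intent(in) :: m
    real(dp) :: s
    real(dp), allocatable :: work(:)
    integer :: i
    allocate(work(m))
    do i = 1, m
      work(i) = real(i, kind=dp)**2
    end do
    s = sum(work)
    ! work is deallocated automatically when the function returns
  end function sum_squares_allocatable

end module work_mod

program stack_vs_heap
  use work_mod
  implicit none
  print *, sum_squares_automatic(1000)          ! small: comfortable on the stack
  print *, sum_squares_allocatable(10000000)    ! large: better kept on the heap
end program stack_vs_heap
```

Calling sum_squares_automatic with the large size instead would risk the same kind of stack overflow discussed in this thread, depending on the stack limit.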

Again, we do have the request to consider an option to warn the user on all temporary creation, and we continue to debate the merits of heap temps as default. This is an interesting topic.

ron

jimdempseyatthecove · ‎01-06-2009

Informative post Ron,

While you are making improvements to array temporaries I would like to see a /warn:array_temporaries so I can see a compiler report. I find the run-time report ineffectual for program development. The run-time report may be suitable during profiling but having a compile time report could nip the problem in the bud.

Jim Dempsey

Hirchert__Kurt_W · ‎01-07-2009

As long as we're talking about possible future development directions, allow me to suggest that for a statement like

angles = pi*real( [(i,i=0,n-1)]/n , kind=dp )

the code I would like to see generated would neither avoid the array temporary altogether nor use a full-length array temporary like the code currently being generated. I would like to see it generate code roughly equivalent to

istep=1024 ! the optimal size of this step may be subject to debate
do istart=0,n-1,istep
  istop=min(istart+istep,n)-1
  angles(istart+1:istop+1) = pi*real( [(i,i=istart,istop)]/n , kind=dp )
end do

This would limit the size of the array temporary to something much less likely to cause a segfault or overflow the stack, while getting most of the benefit of using the highly-pipelined processor. This is the kind of loop "chunking" that is done for processors with array registers, so there's plenty of literature on doing this kind of code generation, but I have no idea whether the current Intel development team has any experience in this area.

-Kurt

JVanB · ‎01-07-2009

Quoting - w.r.m.vanhoydonck@tudelft.nl

angles = pi*real( [(i,i=0,n-1)]/n, kind=dp)

Simpler is:

angles = 0
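The reason this one-liner is equivalent: in the posted expression, i/n divides two default integers, and since 0 <= i < n every quotient truncates to 0 before the conversion to real. A minimal sketch of the difference follows. The program name integer_division_demo and the array names are hypothetical, and the assumption that the intent was angles spread over [0, pi) is mine, not something stated in the thread.

```fortran
program integer_division_demo
  implicit none
  integer, parameter :: dp = selected_real_kind(p=13, r=300)
  integer, parameter :: n = 8
  real(dp), parameter :: pi = 3.1415926535897932385_dp
  real(dp) :: as_posted(n), converted_first(n)
  integer :: i

  ! As posted: integer division happens first, so every element is 0.
  as_posted = pi*real( [(i, i=0, n-1)]/n, kind=dp )

  ! Presumed intent: convert to real before dividing. Written as a loop,
  ! which also avoids the array temporary discussed earlier in the thread.
  do i = 0, n-1
    converted_first(i+1) = pi*real(i, kind=dp)/real(n, kind=dp)
  end do

  print *, 'as posted, largest magnitude: ', maxval(abs(as_posted))
  print *, 'converted first:              ', converted_first
end program integer_division_demo
```

The same integer division is present in the commented-out loop of the original program, so both variants there produce an array of zeros.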