Intel® Fortran Compiler

forall - segfault

albapa
Beginner

Hi,

If I compile the code below, I get a segfault for large enough n. Increasing the stack size allows a larger n. Is this behaviour normal?

program test_forall
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real, dimension(:,:), allocatable :: y
  integer :: i, j, n

  read*, n               ! matrix size, read from standard input
  allocate( y(n,n) )
  y = 0.0_dp

  ! copy the upper triangle of y into the lower triangle
  forall(i=1:n,j=1:n,i>j) y(i,j) = y(j,i)
endprogram test_forall

1 Solution
crtierney42
New Contributor I

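It seems that the temporary array the compiler creates for the FORALL is allocated on the stack (y itself is allocatable, so it lives on the heap), which is why you have to increase your stack size to keep the code from crashing. You have two options: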

1) Keep increasing the stack size, as you did

2) Compile with -heap-arrays

See this link for a better discussion of the problem:

http://software.intel.com/en-us/forums/showthread.php?t=57343

TimP
Honored Contributor III
Not everyone would consider it normal to replace TRANSPOSE with a FORALL, or to transpose a square matrix of all zeros. If you do decide to transpose a square matrix in place, bear in mind that FORALL or TRANSPOSE probably requires the compiler to allocate a temporary matrix of the same size, on the stack by default, so that all the data are copied twice. You could ensure minimum memory usage and avoid the double copy by writing the operations out as explicit f77-style DO loops.
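
As a rough illustration of what "writing the operations out" with plain DO loops might look like for the triangle copy in the original post, here is a minimal sketch. The program name, the fixed size n = 4 and the fill values are assumptions chosen only to make the result easy to inspect; they are not from the thread.

```fortran
program copy_triangle
  implicit none
  real, dimension(:,:), allocatable :: y
  integer :: i, j, n

  n = 4                        ! small illustrative size (assumption)
  allocate( y(n,n) )

  ! Fill with recognisable values: y(i,j) = 10*i + j
  do j = 1, n
    do i = 1, n
      y(i,j) = real(10*i + j)
    end do
  end do

  ! Copy the upper triangle into the lower triangle, element by element.
  ! Every y(i,j) with i > j is read from y(j,i), which lies in the upper
  ! triangle and is never written, so the loops themselves do not ask
  ! for a temporary copy of the matrix.
  do j = 1, n - 1
    do i = j + 1, n
      y(i,j) = y(j,i)
    end do
  end do

  ! Print row by row; the matrix should now be symmetric.
  do i = 1, n
    print '(4f6.1)', y(i,:)
  end do
end program copy_triangle
```

Whether this is actually faster than the FORALL version depends on the compiler and on the loop order, so it is worth timing both for the sizes you care about.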
albapa
Beginner

Obviously, I don't care about transposing a zero matrix. And that's not a transpose anyway; it copies the upper triangular part of a matrix to the lower triangular part, or vice versa. Take another look at the forall statement. This (actually the nested case below, which shows exactly the same error) is a valid example in the Metcalf book. I just tried to come up with the simplest example that shows this behaviour.

forall(i=1:n-1)
  forall(j=i+1:n)
    a(i,j)=a(j,i)
  endforall
endforall

albapa
Beginner

Thanks, that works!

TimP
Honored Contributor III

However, ifort gives an apparently spurious LOOP WAS VECTORIZED indication.

It's still a slow way to go about it. If I invert the loops, in order to improve efficiency, ifort generates the same code with nested forall or do loops, except that it's possible to persuade it to auto-parallelize only the do loop version.

I was surprised to be reminded about this example in M&R. I've looked at that section before, hoping for indications about the relative merits of forall, but there is no such advice.

Steven_L_Intel1
Employee

There isn't much merit to FORALL. This is an HPF (High Performance Fortran) feature that got added to F95, but it is widely misunderstood and not well defined. The Fortran standards committee is developing better ways to express parallelism.

FORALL is not a loop construct. Theoretically, it's a "do independently" construct with the notion that the body can be done in parallel, but the restrictions are a bit too severe for that to work effectively. In this particular case, FORALL is not appropriate because there is data dependency between the different executions of the body.

Steve
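
To make the "FORALL is not a loop construct" point concrete, here is a small sketch that is not from the thread: the program name, the array size and the shifted assignment are illustrative assumptions. It contrasts a FORALL, where every right-hand side is evaluated before any element is assigned, with a DO loop that runs its iterations in order. That evaluate-first rule is also why a compiler may need a temporary copy when it cannot prove that the values read and the values written do not overlap.

```fortran
program forall_vs_do
  implicit none
  integer, parameter :: n = 5
  integer :: a(n), b(n), i

  a = (/ (i, i = 1, n) /)      ! a = 1 2 3 4 5
  b = a

  ! FORALL: all right-hand sides are evaluated first, using the old
  ! values of a, and only then are the assignments carried out.
  forall (i = 2:n) a(i) = a(i-1)

  ! DO: iterations run in order, so each one sees the value written
  ! by the previous iteration.
  do i = 2, n
    b(i) = b(i-1)
  end do

  print '(a,5i3)', 'forall: ', a   ! values 1 1 2 3 4
  print '(a,5i3)', 'do:     ', b   ! values 1 1 1 1 1
end program forall_vs_do
```

The two constructs look alike but give different results here, which is a reasonable argument for writing an ordinary DO loop whenever loop semantics are what you actually want.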
