Intel® Fortran Compiler

forall - segfault

albapa

Hi,

If I compile the code below, I get a segfault for large enough n. Increasing the stack size allows a larger n. Is this behaviour normal?

program test_forall
  implicit none

  integer, parameter :: dp = kind(1.0d0)

  real, dimension(:,:), allocatable :: y
  integer :: i, j, n

  read*,n

  allocate( y(n,n))

  y = 0.0_dp

  forall(i=1:n,j=1:n,i>j) y(i,j) = y(j,i)
end program test_forall

1 Solution
crtierney42
It seems that the temporary array the compiler creates for the FORALL is allocated on the stack, which is why you have to increase your stack size to keep the code from crashing. You have two options:

1) Increase the stack size, as you did

2) Compile with -heap-arrays

See this link for a better discussion of the problem:

http://software.intel.com/en-us/forums/showthread.php?t=57343
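
To illustrate where the stack pressure can come from: the FORALL in the question assigns to y while also reading from y, so the compiler may build a hidden copy before storing anything. The sketch below makes that copy explicit as an allocatable array, which typically lives on the heap and so does not depend on the stack limit. The module and subroutine names are made up for illustration, the matrix is assumed to be square, and whether a given ifort version then skips its own temporary is an assumption, not something confirmed in this thread.

```fortran
module mirror_heap_sketch   ! hypothetical name, not from the thread
  implicit none
contains
  ! Copy the upper triangle of y into its lower triangle, reading from an
  ! explicit copy so the assignment never reads and writes the same array.
  subroutine mirror_upper_to_lower(y)
    real, intent(inout) :: y(:,:)      ! assumed square
    real, allocatable   :: tmp(:,:)
    integer :: i, j, n

    n = size(y, 1)
    allocate(tmp(n, n))   ! allocatable arrays are typically heap-allocated
    tmp = y               ! explicit snapshot of the right-hand side
    forall (i = 1:n, j = 1:n, i > j) y(i, j) = tmp(j, i)
    deallocate(tmp)
  end subroutine mirror_upper_to_lower
end module mirror_heap_sketch
```

This still costs a full extra matrix of memory and one extra copy, so it only moves the temporary off the stack; it does not make the operation cheaper.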

6 Replies (the first is the solution shown above)
TimP
Not everyone would consider it normal to replace TRANSPOSE with a FORALL, or to transpose a square matrix of all zeros. If you do decide to transpose a square matrix in place, you should bear in mind that FORALL or TRANSPOSE probably requires the compiler to allocate another temporary matrix of the same size, on the stack by default, so that all the data are copied twice. You could ensure that it is done with minimum memory usage and without double copying by writing the operations out as explicit F77-style DO loops.
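
The explicit-loop approach described above might look like the following. This is a minimal sketch, not code from the thread: the module and subroutine names are hypothetical, the matrix is assumed to be square, and it is written in free-form source rather than strict F77 fixed form.

```fortran
module transpose_sketch   ! hypothetical name, not from the thread
  implicit none
contains
  ! Transpose a square matrix in place by swapping each pair (i,j) <-> (j,i).
  ! Only one scalar of extra storage is used, and no full-size temporary.
  subroutine transpose_in_place(a)
    real, intent(inout) :: a(:,:)      ! assumed square
    real    :: t
    integer :: i, j, n

    n = size(a, 1)
    do j = 2, n
      do i = 1, j - 1                  ! elements strictly above the diagonal
        t       = a(i, j)
        a(i, j) = a(j, i)
        a(j, i) = t
      end do
    end do
  end subroutine transpose_in_place
end module transpose_sketch
```

Calling transpose_in_place(a) on an allocated n-by-n array replaces a with its transpose; the diagonal is left untouched because it maps onto itself.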
albapa

Obviously, I don't care about transposing a zero matrix. And that's not transposing anyway; it's copying the upper triangular part of a matrix to the lower triangular part, or vice versa. Take another look at the forall statement. This (actually the nested case below, which shows exactly the same error) is a valid example in the Metcalf book. I just tried to come up with the simplest example that shows this behaviour.

forall(i=1:n-1)
  forall(j=i+1:n)
    a(i,j)=a(j,i)
  end forall
end forall
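
For comparison, the nested FORALL above only reads elements below the diagonal and only writes elements above it, so no element is both read and written. Under that observation, plain DO loops produce the same result without needing a saved copy of the right-hand side. A minimal sketch, with hypothetical module and subroutine names and a assumed to be square:

```fortran
module triangle_copy_sketch   ! hypothetical name, not from the thread
  implicit none
contains
  ! Copy the lower triangle of a into its upper triangle using DO loops.
  subroutine copy_lower_to_upper(a)
    real, intent(inout) :: a(:,:)      ! assumed square
    integer :: i, j, n

    n = size(a, 1)
    do i = 1, n - 1
      do j = i + 1, n
        a(i, j) = a(j, i)   ! j > i: reads below the diagonal, writes above it
      end do
    end do
  end subroutine copy_lower_to_upper
end module triangle_copy_sketch
```

The loop bounds mirror the FORALL index ranges one for one, so the two forms can be swapped without changing which elements are touched.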

albapa

Thanks, that works!

TimP
However, ifort gives an apparently spurious LOOP WAS VECTORIZED indication.

It's still a slow way to go about it. If I swap the loop order to improve efficiency, ifort generates the same code for nested FORALL and nested DO loops, except that only the DO loop version can be persuaded to auto-parallelize.

I was surprised to be reminded about this example in Metcalf and Reid (M&R). I've looked at that section before, hoping for indications about the relative merits of FORALL, but there is no such advice.

Steven_L_Intel1

There isn't much merit to FORALL. This is an HPF (High Performance Fortran) feature that got added to F95, but it is widely misunderstood and not well defined. The Fortran standards committee is developing better ways to express parallelism.

FORALL is not a loop construct. Theoretically, it's a "do independently" construct with the notion that the body can be done in parallel, but the restrictions are a bit too severe for that to work effectively. In this particular case, FORALL is not appropriate because there is a data dependency between the different executions of the body.

Steve
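
The difference between FORALL and a loop shows up as soon as the body reads an element that another index value writes. The sketch below is an illustration only, with hypothetical module and subroutine names: it applies the same assignment once as a FORALL and once as a DO loop.

```fortran
module forall_semantics_sketch   ! hypothetical name, not from the thread
  implicit none
contains
  ! FORALL: every right-hand side is evaluated before any element is stored,
  ! so each a(i) receives the old value of a(i-1).
  subroutine shift_forall(a)
    integer, intent(inout) :: a(:)
    integer :: i
    forall (i = 2:size(a)) a(i) = a(i - 1)
  end subroutine shift_forall

  ! DO: iterations run in order, so each one sees the store made by the
  ! previous iteration and a(1) propagates through the whole array.
  subroutine shift_do(a)
    integer, intent(inout) :: a(:)
    integer :: i
    do i = 2, size(a)
      a(i) = a(i - 1)
    end do
  end subroutine shift_do
end module forall_semantics_sketch
```

Starting from [1, 2, 3, 4], shift_forall leaves [1, 1, 2, 3] while shift_do leaves [1, 1, 1, 1]. To guarantee the FORALL result in general, a compiler has to keep the old right-hand-side values somewhere, which is where a temporary array can come from.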
