Intel® Fortran Compiler

Question about forall and where

tropfen
New Contributor I
711 Views

Hello,

- FORALL and WHERE are possibilities that allow the compiler to execute the enclosed statements in any order, on more than one processor

--- is this optimization standard in IVF (in both debug and release), or do I have to set specific compiler options?

- the following example shows a procedure I would like to optimize


[fortran]real, dimension(200,200) :: rArray, rArray1, rArray2
integer :: ix, iy

rArray = -999.0
do ix = 1, 200
  do iy = 1, 200
    ! -999.0 marks a missing value; add only where both inputs are valid
    if ((abs(rArray1(ix,iy) - (-999.0)) > 0.0001) .and. &
        (abs(rArray2(ix,iy) - (-999.0)) > 0.0001)) then
      rArray(ix,iy) = rArray1(ix,iy) + rArray2(ix,iy)
    end if
  end do ! iy
end do ! ix[/fortran]

Is there a way?

As far as I understand, an IF construct is not allowed in FORALL, and WHERE only has one element.

Thanks in advance

Frank

Steven_L_Intel1
Employee

The original theory was that, yes, FORALL and WHERE, which are fancy array assignments, could allow parallelism. But the semantics were restrictive enough that it was difficult to parallelize these constructs effectively. As far as I know, Intel Fortran does not try to do so. (If it did, you would need /parallel and optimization.)

Fortran 2008 has added a "DO CONCURRENT" construct which is more straightforward to parallelize.

I suggest for now that you use OpenMP.
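
As an illustration of the DO CONCURRENT construct mentioned above, here is a minimal sketch of the original loop. The program name, the `missing` and `tol` constants, and the test data are assumptions added for the example. Whether a given compiler actually parallelizes or vectorizes it depends on the compiler version and options, so treat this as a sketch rather than a guaranteed speedup.

```fortran
program do_concurrent_sketch
  implicit none
  integer, parameter :: n = 200
  real, parameter :: missing = -999.0   ! sentinel value from the original post
  real, parameter :: tol = 0.0001
  real, dimension(n,n) :: rArray, rArray1, rArray2
  integer :: ix, iy

  ! Hypothetical test data: two entries are marked as missing.
  rArray1 = 1.0
  rArray2 = 2.0
  rArray1(1,1) = missing
  rArray2(2,2) = missing

  rArray = missing
  ! Fortran 2008: iterations may execute in any order.
  ! The optional mask in the header takes the place of the IF.
  do concurrent (iy = 1:n, ix = 1:n, &
                 abs(rArray1(ix,iy) - missing) > tol .and. &
                 abs(rArray2(ix,iy) - missing) > tol)
    rArray(ix,iy) = rArray1(ix,iy) + rArray2(ix,iy)
  end do

  ! Self-check: masked entries keep the sentinel, the rest hold the sum.
  if (rArray(1,1) /= missing .or. rArray(2,2) /= missing) error stop "mask failed"
  if (rArray(3,3) /= 3.0) error stop "sum failed"
  print *, rArray(1,1), rArray(2,2), rArray(3,3)
end program do_concurrent_sketch
```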

TimP
Honored Contributor III

Why not start by correcting the loop nesting?

"If" your objective is to write something equally optimizable but without IF, you have MERGE available, expressible as a rank-2 operation, as rank 1 inside an OpenMP-eligible loop, or even as rank 0 inside two loops.
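
A minimal sketch of the rank-2 MERGE form, next to an equivalent WHERE with a compound mask. The `valid` mask array, the `missing` and `tol` constants, and the test data are assumptions added for the example. It also shows that a single WHERE mask can combine conditions on more than one array.

```fortran
program merge_where_sketch
  implicit none
  integer, parameter :: n = 200
  real, parameter :: missing = -999.0, tol = 0.0001
  real, dimension(n,n) :: rArray, rArray1, rArray2, rWhere
  logical, dimension(n,n) :: valid

  ! Hypothetical test data: two entries are marked as missing.
  rArray1 = 1.0
  rArray2 = 2.0
  rArray1(1,1) = missing
  rArray2(2,2) = missing

  ! One logical array combines both validity tests.
  valid = abs(rArray1 - missing) > tol .and. abs(rArray2 - missing) > tol

  ! MERGE as a rank-2 operation: the sum where valid, the sentinel elsewhere.
  rArray = merge(rArray1 + rArray2, missing, valid)

  ! The same result written with WHERE and the compound mask.
  rWhere = missing
  where (valid) rWhere = rArray1 + rArray2

  if (any(rArray /= rWhere)) error stop "MERGE and WHERE disagree"
  print *, rArray(1,1), rArray(2,2), rArray(3,3)
end program merge_where_sketch
```

Note that MERGE may evaluate `rArray1 + rArray2` for every element, including the masked ones, which is harmless here but matters if the expression can fault (for example a division by zero).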

TimP
Honored Contributor III

Why not begin by correcting the loop nesting? Even if it doesn't immediately enable vectorization, it's essential to enabling local data chunks for parallelization by OpenMP or otherwise.
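
A minimal sketch of the corrected nesting combined with OpenMP. Fortran stores arrays in column-major order, so the first subscript should vary fastest. The constants and test data are assumptions added for the example. The `!$omp` lines are treated as plain comments unless OpenMP is enabled at compile time (for ifort, `/Qopenmp` on Windows or `-qopenmp` on Linux in recent versions, as far as I know).

```fortran
program loop_order_sketch
  implicit none
  integer, parameter :: n = 200
  real, parameter :: missing = -999.0, tol = 0.0001
  real, dimension(n,n) :: rArray, rArray1, rArray2
  integer :: ix, iy

  ! Hypothetical test data: one entry is marked as missing.
  rArray1 = 1.0
  rArray2 = 2.0
  rArray1(1,1) = missing

  rArray = missing
  ! iy is the outer loop and ix the inner loop, so memory is read with stride 1.
  ! Each thread then works on a block of whole columns.
  !$omp parallel do private(ix)
  do iy = 1, n
    do ix = 1, n
      if (abs(rArray1(ix,iy) - missing) > tol .and. &
          abs(rArray2(ix,iy) - missing) > tol) then
        rArray(ix,iy) = rArray1(ix,iy) + rArray2(ix,iy)
      end if
    end do ! ix
  end do ! iy
  !$omp end parallel do

  if (rArray(1,1) /= missing .or. rArray(2,2) /= 3.0) error stop "unexpected result"
  print *, rArray(1,1), rArray(2,2)
end program loop_order_sketch
```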
j_clausen
Beginner

Does Steve's reply mean that there are no advantages to using forall and where? I.e. I could just as well use standard do-loops and if-statements?

In my code I tend to use where and forall as much as I can, but maybe the only advantage is increased readability of the code for the trained eye.

Best Regards

j_clausen

TimP
Honored Contributor III

I do think that forall may be the most readable way where a mask is in use, except that ifort generally requires an IVDEP directive to optimize a single assignment under a masked forall, and does not optimize multiple assignments under a mask. It took me years to recognize that forall with multiple assignments means that all operations under one assignment are completed before the next one begins. Is that what you mean by a trained eye? It's difficult for a compiler to optimize multiple assignments in a forall, so for example

forall(i = 1:n)
  a(i) = b(i) + c(i)
  d(i) = a(i) + b(i)
end forall

is not as easily optimized as a similar looking DO, nor any different from

a(1:n) = b(1:n) + c(1:n)
d(1:n) = a(1:n) + b(1:n)

which is a direct expression of the meaning.
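
To make the statement-by-statement semantics visible, here is a minimal sketch with a cross-iteration reference (`a(i+1)`), which the example above does not have. The array names `d_forall` and `d_do` and the data are assumptions added for the illustration.

```fortran
program forall_semantics_sketch
  implicit none
  integer, parameter :: n = 5
  integer :: i
  real, dimension(n) :: a, b, d_forall, d_do

  b = [(real(i), i = 1, n)]

  ! FORALL: the first assignment completes for every i before the second
  ! starts, so d_forall(i) sees the NEW value of a(i+1).
  a = 0.0
  d_forall = 0.0
  forall (i = 1:n-1)
    a(i) = b(i)
    d_forall(i) = a(i+1)
  end forall

  ! DO: both statements run per iteration, so d_do(i) sees the OLD a(i+1).
  a = 0.0
  d_do = 0.0
  do i = 1, n-1
    a(i) = b(i)
    d_do(i) = a(i+1)
  end do

  if (any(d_forall /= [2.0, 3.0, 4.0, 0.0, 0.0])) error stop "unexpected FORALL result"
  if (any(d_do /= 0.0)) error stop "unexpected DO result"
  print *, 'forall:', d_forall
  print *, 'do:    ', d_do
end program forall_semantics_sketch
```

With this data the FORALL version yields 2, 3, 4, 0, 0 while the DO version yields all zeros, which is why the compiler cannot simply treat a multi-statement FORALL as a fused loop.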

ifort has been improving the quality of code generation under where, but it's not entirely settled from version to version. where...elsewhere, when relevant, appears to improve readability but worsen compiler behavior. where implies generation of a temporary mask vector and its re-use in the elsewhere.

tropfen
New Contributor I

Hello,

I think the question was heading in another direction.

What is faster:

[fortran]integer :: i
real,dimension(100) :: f

do i=1,100
  f(i) = i*2
end do ! i

forall(i=1:100)
  f(i)=i*2
end forall[/fortran]

And which additional compiler options are needed to improve the performance of either variant?


Frank

TimP
Honored Contributor III

You need a more complicated example before the choice between forall and DO makes any difference to performance, or before any compiler options beyond basic vectorization matter. This loop isn't long enough for threaded parallelism to pay off on top of vectorization.
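
For anyone who wants to measure rather than guess, here is a minimal timing sketch. The array size `n`, the repeat count `reps`, and the `+ r` term are assumptions added so that the work is large enough to time. An optimizing compiler may still remove part of the repeated work, so the numbers are only a rough indication.

```fortran
program do_vs_forall_timing
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n = 1000000   ! hypothetical size, far larger than 100
  integer, parameter :: reps = 100    ! hypothetical repeat count
  real, allocatable :: f(:), g(:)
  integer :: i, r
  integer(int64) :: t0, t1, t2, rate

  allocate(f(n), g(n))

  call system_clock(t0, rate)
  do r = 1, reps
    do i = 1, n
      f(i) = i*2 + r
    end do ! i
  end do
  call system_clock(t1)
  do r = 1, reps
    forall (i = 1:n)
      g(i) = i*2 + r
    end forall
  end do
  call system_clock(t2)

  ! Both variants must produce identical data.
  if (any(f /= g)) error stop "results differ"
  print *, 'DO     seconds:', real(t1 - t0) / real(rate)
  print *, 'FORALL seconds:', real(t2 - t1) / real(rate)
end program do_vs_forall_timing
```

Building it once with optimization disabled and once with the default release optimization shows how much of any difference comes from the compiler options rather than from the construct.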