I have the following piece of code that I would like to use (which does work but generates a compiler warning):
FORALL (ix=1:nCells)
FORALL (n=1:nMax) phiA(ix) = phiA(ix) + w(n) * psi(n,ix)
END FORALL
The warning is
All active combinations of index-names are not used within the variable being defined
(i.e. leftside) of this assignment-stmt. [PHIA]
The inner FORALL breaks the "No element of an array can be assigned a value more than once" rule. From what I understand, I must use DO loops in order to do the accumulating sum properly.
In almost all cases, nCells > nMax (the only exceptions occur with fake debugging cases). The value of nMax is typically less than 100, and nCells can be in the thousands. What is the "best" way to code an accumulating sum that will enable the optimizer to maximize performance?
3 Replies
FORALL can be thought of, conceptually, as a "parallel DO". The rules are set up so that there is no interaction between executions of the FORALL body. The way you coded it, there is a dependency among all the invocations in the inner FORALL.
Your best bet is to code it simply in a DO loop. This will give the compiler the best opportunity to vectorize and otherwise optimize it.
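As a minimal sketch of the DO-loop form suggested above (the array sizes and test data here are made up for illustration, and double precision is an assumption, since the original post does not show declarations):

```fortran
program accumulate_do
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Small illustrative sizes (assumed); the post mentions nMax < 100
  ! and nCells in the thousands.
  integer, parameter :: nMax = 4, nCells = 6
  real(dp) :: w(nMax), psi(nMax, nCells), phiA(nCells)
  integer :: n, ix

  ! Fill the inputs with simple test data.
  do n = 1, nMax
     w(n) = real(n, dp)
  end do
  do ix = 1, nCells
     do n = 1, nMax
        psi(n, ix) = real(n + ix, dp)
     end do
  end do
  phiA = 0.0_dp

  ! Outer loop over cells, inner loop over n. With psi dimensioned
  ! (nMax, nCells), the inner loop reads psi with unit stride because
  ! Fortran stores arrays in column-major order.
  do ix = 1, nCells
     do n = 1, nMax
        phiA(ix) = phiA(ix) + w(n) * psi(n, ix)
     end do
  end do

  print *, phiA
end program accumulate_do
```

The inner loop is a plain sum reduction into phiA(ix), which is a pattern optimizing compilers generally recognize; whether it is actually vectorized depends on the compiler and its options.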
It looks as if you could replace the inner loop with DOT_PRODUCT(). Then, it may make little difference whether the outer loop is DO or FORALL, except that OpenMP parallel do would be available for use with DO. Why not use MATMUL, or ?GEMM from one of the optimized BLAS libraries such as MKL?
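A rough sketch of the two intrinsic-based variants mentioned above, again with made-up sizes and test data and an assumed double-precision kind. It computes the same accumulation once with DOT_PRODUCT inside a DO loop and once with MATMUL, then prints the largest difference between the two results:

```fortran
program accumulate_intrinsics
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nMax = 4, nCells = 6   ! illustrative sizes (assumed)
  real(dp) :: w(nMax), psi(nMax, nCells)
  real(dp) :: phiDot(nCells), phiMat(nCells)
  integer :: n, ix

  ! Fill the inputs with simple test data.
  do n = 1, nMax
     w(n) = real(n, dp)
  end do
  do ix = 1, nCells
     do n = 1, nMax
        psi(n, ix) = real(n + ix, dp)
     end do
  end do
  phiDot = 0.0_dp
  phiMat = 0.0_dp

  ! Variant 1: DOT_PRODUCT replaces the inner loop.
  ! psi(:, ix) is one contiguous column of psi.
  do ix = 1, nCells
     phiDot(ix) = phiDot(ix) + dot_product(w, psi(:, ix))
  end do

  ! Variant 2: MATMUL of a rank-1 array (nMax) with a rank-2 array
  ! (nMax, nCells) returns a rank-1 array of size nCells.
  phiMat = phiMat + matmul(w, psi)

  ! Largest absolute difference between the two variants.
  print *, maxval(abs(phiDot - phiMat))
end program accumulate_intrinsics
```

Because this is a matrix-vector product rather than a matrix-matrix product, the closest BLAS match is probably the level-2 routine ?GEMV applied to the transpose of psi. That call is left out here because it needs a BLAS library such as MKL to be linked.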
I didn't even think about the BLAS call. Thanks for the reminder.
