Intel® Fortran Compiler

OpenMP and !$OMP WORKSHARE

s8ngsu3
Beginner
Is the WORKSHARE construct implemented in the Intel Fortran Compiler for Linux, version 8.1? If not, how can one parallelize Fortran constructs that contain no explicit loops, e.g. FORALL, WHERE, MATMUL, DOT_PRODUCT, etc.?
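
For context, this is what the construct in question looks like once supported: a minimal self-contained sketch (names and sizes are invented) of the statement types the OpenMP specification permits inside WORKSHARE, built with e.g. ifort -openmp (later compilers spell the flag -qopenmp):

PROGRAM ws_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 500
  REAL :: a(n,n), b(n,n), c(n,n), d
  INTEGER :: i
  CALL RANDOM_NUMBER(b)
  CALL RANDOM_NUMBER(c)
!$OMP PARALLEL
!$OMP WORKSHARE
  a = b + c                          ! whole-array assignment
  WHERE (a > 1.0) a = 1.0            ! masked (WHERE) assignment
  FORALL (i = 1:n) a(i,i) = 0.0      ! FORALL statement
  a = MATMUL(b, c)                   ! transformational intrinsic
  d = DOT_PRODUCT(b(:,1), c(:,1))    ! scalar assignment, also permitted
!$OMP END WORKSHARE
!$OMP END PARALLEL
  PRINT *, d, a(1,1)
END PROGRAM ws_sketch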
6 Replies
Steven_L_Intel1
Employee
WORKSHARE is coming in version 9.0, to be released later this year.
s8ngsu3
Beginner
It's nice to hear WORKSHARE is coming in a later release, but for the time being, what is the recommended way to parallelize vector operations (i.e. the ones that operate on whole arrays or array slices)?
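
Until WORKSHARE arrived, the usual recommendation, and what the replies below converge on, was to spell the array operation as explicit loops under !$OMP PARALLEL DO. A minimal self-contained sketch (array names and sizes are invented):

PROGRAM vecadd
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 2000
  INTEGER :: i, j
  REAL :: a(n,n), b(n,n), c(n,n)
  CALL RANDOM_NUMBER(b)
  CALL RANDOM_NUMBER(c)
! Hand-written equivalent of the array statement  A = B + C.
! J is the outer (parallel) loop because Fortran arrays are column-major,
! so each thread sweeps whole contiguous columns.
!$OMP PARALLEL DO PRIVATE(i)
  DO j = 1, n
    DO i = 1, n
      a(i,j) = b(i,j) + c(i,j)
    END DO
  END DO
!$OMP END PARALLEL DO
  PRINT *, a(1,1), a(n,n)
END PROGRAM vecadd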
holland
Beginner

I am trying to parallelize some simple Fortran 90 array-syntax code, for example, adding two arrays:

A(:,:) = B(:,:) + C(:,:)

by placing OpenMP directives around this code:

!$OMP PARALLEL
!$OMP WORKSHARE
A(:,:) = B(:,:) + C(:,:)
!$OMP END WORKSHARE
!$OMP END PARALLEL

I compile and test my code 'test.f' on an SGI Altix platform using the most recent ifort compiler (~ version 9, release 13). My compile command is 'ifort -O3 -openmp test.f'.

When I run my test program I get no speed-up as I increase the number of threads. I change the thread count via the 'setenv OMP_NUM_THREADS ...' command.

However, if I write my code in DO-loop style I do get speed-up. For example:

!$OMP PARALLEL
!$OMP DO PRIVATE(J)
DO I = 1, N
  DO J = 1, N
    A(I,J) = B(I,J) + C(I,J)
  ENDDO
ENDDO
!$OMP END DO
!$OMP END PARALLEL

I would appreciate any help or insight you can offer, since all of my code is written in array-syntax form and not in DO-loop style.

Cheers, David

TimP
Honored Contributor III
If the compiler didn't report parallelization of this construct, my guess is that it relates to the documented preference for rank-1 arrays with WORKSHARE. It would be interesting to know what diagnostic you get with -openmp-report.

Did the compiler successfully interchange the loops to optimize your DO-loop version with load-pairs, as you would want it to even without OpenMP (assuming the I loop is fairly long)? Does it produce load-pair code for the rank-2 array version?

OpenMP parallelization may be pointless if the loop isn't optimized prior to parallelization, unless your point is to minimize serial performance in order to improve parallel scaling. If you don't nest these loops properly, then in addition to cutting inner-loop performance, your OpenMP parallelization could peak early once you reach the point of false sharing, with multiple threads operating frequently on the same cache line.

I'll agree that I'd like to see full optimization of array syntax, but OpenMP is a somewhat low-level programming scheme, and it doesn't do well when details are left to the intelligence of the compiler, scheduler, and run-time library.
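
A minimal sketch of the rank-1 workaround alluded to here (the routine and variable names are invented; it relies on Fortran sequence association to view the rank-2 actual arguments as vectors):

! Pass the arrays to a routine with explicit-shape rank-1 dummies,
! so WORKSHARE sees simple contiguous vectors.
SUBROUTINE add_rank1(a, b, c, n)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: n
  REAL, INTENT(OUT) :: a(n)
  REAL, INTENT(IN)  :: b(n), c(n)
!$OMP PARALLEL
!$OMP WORKSHARE
  a = b + c
!$OMP END WORKSHARE
!$OMP END PARALLEL
END SUBROUTINE add_rank1

Called as, e.g., CALL add_rank1(A, B, C, SIZE(A)) with the rank-2 arrays from the example above.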
pbkenned1
Employee

Certain OpenMP* WORKSHARE constructs now parallelize with Intel® Fortran Compiler 15.0. Our implementation is described here.

Patrick
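
For anyone retesting, here is the array-syntax example from earlier in the thread as a self-contained program (sizes invented). With 15.0 it builds with something like ifort -qopenmp (the older -openmp spelling also still works), and this simple whole-array assignment is the kind of construct the write-up describes as parallelizing:

PROGRAM ws15
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 4000
  REAL :: a(n,n), b(n,n), c(n,n)
  CALL RANDOM_NUMBER(b)
  CALL RANDOM_NUMBER(c)
!$OMP PARALLEL
!$OMP WORKSHARE
  a = b + c        ! simple whole-array assignment
!$OMP END WORKSHARE
!$OMP END PARALLEL
  PRINT *, a(1,1)
END PROGRAM ws15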

TimP
Honored Contributor III

It looks easier to just write f77-style code than to figure out how to work within these restrictions.
