Intel® Fortran Compiler

OpenMP and !$OMP WORKSHARE

s8ngsu3
Beginner
Is the WORKSHARE construct implemented in the Intel Fortran Compiler for Linux, version 8.1? If not, how can one parallelize Fortran constructs that contain no explicit loops, e.g. FORALL, WHERE, MATMUL, DOT_PRODUCT, etc.?
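
For context, this is what the construct in question looks like once supported: a minimal self-contained sketch (names and sizes are invented) of the statement types the OpenMP specification permits inside WORKSHARE, built with e.g. ifort -openmp (later compilers spell the flag -qopenmp):

PROGRAM ws_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 500
  REAL :: a(n,n), b(n,n), c(n,n), d
  INTEGER :: i
  CALL RANDOM_NUMBER(b)
  CALL RANDOM_NUMBER(c)
!$OMP PARALLEL
!$OMP WORKSHARE
  a = b + c                          ! whole-array assignment
  WHERE (a > 1.0) a = 1.0            ! masked (WHERE) assignment
  FORALL (i = 1:n) a(i,i) = 0.0      ! FORALL statement
  a = MATMUL(b, c)                   ! transformational intrinsic
  d = DOT_PRODUCT(b(:,1), c(:,1))    ! scalar assignment, also permitted
!$OMP END WORKSHARE
!$OMP END PARALLEL
  PRINT *, d, a(1,1)
END PROGRAM ws_sketch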
6 Replies
Steven_L_Intel1
Employee
WORKSHARE is coming in version 9.0, to be released later this year.
s8ngsu3
Beginner
It's nice to hear WORKSHARE is coming in a later release, but for the time being, what is the recommended way to parallelize vector operations (i.e. the ones that operate on whole arrays or array slices)?
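
Until WORKSHARE arrived, the usual recommendation, and what the replies below converge on, was to spell the array operation as explicit loops under !$OMP PARALLEL DO. A minimal self-contained sketch (array names and sizes are invented):

PROGRAM vecadd
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 2000
  INTEGER :: i, j
  REAL :: a(n,n), b(n,n), c(n,n)
  CALL RANDOM_NUMBER(b)
  CALL RANDOM_NUMBER(c)
! Hand-written equivalent of the array statement  A = B + C.
! J is the outer (parallel) loop because Fortran arrays are column-major,
! so each thread sweeps whole contiguous columns.
!$OMP PARALLEL DO PRIVATE(i)
  DO j = 1, n
    DO i = 1, n
      a(i,j) = b(i,j) + c(i,j)
    END DO
  END DO
!$OMP END PARALLEL DO
  PRINT *, a(1,1), a(n,n)
END PROGRAM vecadd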
holland
Beginner

I am trying to parallelize some simple Fortran 90 array-syntax code, for example, adding two arrays:

A(:,:) = B(:,:) + C(:,:)

by placing OpenMP directives around this code:

!$OMP PARALLEL
!$OMP WORKSHARE
A(:,:) = B(:,:) + C(:,:)
!$OMP END WORKSHARE
!$OMP END PARALLEL

I compile and test my code 'test.f' on an SGI Altix platform using the most recent ifort compiler (~ version 9, release 13). My compile command is 'ifort -O3 -openmp test.f'.

When I run my test program I get no speed-up as I increase the number of threads. I change the thread count via the 'setenv OMP_NUM_THREADS ...' command.

However, if I write my code in DO-loop style I do get speed-up. For example:

!$OMP PARALLEL
!$OMP DO PRIVATE(J)
DO I = 1, N
  DO J = 1, N
    A(I,J) = B(I,J) + C(I,J)
  ENDDO
ENDDO
!$OMP END DO
!$OMP END PARALLEL

I would appreciate any help or insight you can offer, since all of my code is written in array-syntax form and not in DO-loop style.

Cheers, David

TimP
Honored Contributor III
If the compiler didn't report parallelization of this construct, my guess is that it relates to the documented preference for rank-1 arrays with WORKSHARE. It would be interesting to know what diagnostic you get with -openmp-report.

Did the compiler successfully interchange the loops to optimize your DO-loop version with load-pairs, as you would want it to even without OpenMP (assuming the I loop is fairly long)? Does it produce load-pair code for the rank-2 array version?

OpenMP parallelization may be pointless if the loop isn't optimized prior to parallelization, unless your point is to minimize serial performance in order to improve parallel scaling. If you don't nest these loops properly, then in addition to cutting inner-loop performance, your OpenMP parallelization could peak early once you reach the point of false sharing, with multiple threads operating frequently on the same cache line.

I'll agree that I'd like to see full optimization of array syntax, but OpenMP is a somewhat low-level programming scheme, and it doesn't do well when details are left to the intelligence of the compiler, scheduler, and run-time library.
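
A minimal sketch of the rank-1 workaround alluded to here (the routine and variable names are invented; it relies on Fortran sequence association to view the rank-2 actual arguments as vectors):

! Pass the arrays to a routine with explicit-shape rank-1 dummies,
! so WORKSHARE sees simple contiguous vectors.
SUBROUTINE add_rank1(a, b, c, n)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: n
  REAL, INTENT(OUT) :: a(n)
  REAL, INTENT(IN)  :: b(n), c(n)
!$OMP PARALLEL
!$OMP WORKSHARE
  a = b + c
!$OMP END WORKSHARE
!$OMP END PARALLEL
END SUBROUTINE add_rank1

Called as, e.g., CALL add_rank1(A, B, C, SIZE(A)) with the rank-2 arrays from the example above.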
pbkenned1
Employee

Certain OpenMP* WORKSHARE constructs now parallelize with Intel® Fortran Compiler 15.0. Our implementation is described here.

Patrick
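
For anyone retesting, here is the array-syntax example from earlier in the thread as a self-contained program (sizes invented). With 15.0 it builds with something like ifort -qopenmp (the older -openmp spelling also still works), and this simple whole-array assignment is the kind of construct the write-up describes as parallelizing:

PROGRAM ws15
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 4000
  REAL :: a(n,n), b(n,n), c(n,n)
  CALL RANDOM_NUMBER(b)
  CALL RANDOM_NUMBER(c)
!$OMP PARALLEL
!$OMP WORKSHARE
  a = b + c        ! simple whole-array assignment
!$OMP END WORKSHARE
!$OMP END PARALLEL
  PRINT *, a(1,1)
END PROGRAM ws15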

TimP
Honored Contributor III

It looks easier to just write f77-style code than to figure out how to work within these restrictions.
