I am trying to parallelize some simple Fortran-90 array-syntax code, for example, adding two arrays:
A(:,:) = B(:,:) + C(:,:)
by placing OpenMP directives around this code as:
!$OMP PARALLEL
!$OMP WORKSHARE
A(:,:) = B(:,:) + C(:,:)
!$OMP END WORKSHARE
!$OMP END PARALLEL
I compile and test my code 'test.f' on an SGI Altix platform using the most recent ifort compiler (~ version 9, release 13). My compile command is 'ifort -O3 -openmp test.f'.
When I run my test program I see no speed-up as I increase the number of processors. I change the number of threads via the 'setenv OMP_NUM_THREADS ...' command.
However, if I write my code in do-loop style I do get speed up. For example,
!$OMP PARALLEL
!$OMP DO PRIVATE (J)
      DO I=1,N
        DO J=1,N
          A(I,J) = B(I,J) + C(I,J)
        ENDDO
      ENDDO
!$OMP END DO
!$OMP END PARALLEL
(Note: only the inner loop index J needs to be PRIVATE; the parallelized loop index I is private automatically, and making A, B, C private would give each thread its own uninitialized copies and wrong results.)
I would appreciate any help or insight you can offer, since all of my code is written in array-syntax form and not in do-loop style.
Cheers, David
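For anyone who wants to reproduce the comparison, here is a minimal, self-contained sketch (free-form Fortran, with a hypothetical problem size N) that times the WORKSHARE version against the explicit DO version. Compile with something like 'ifort -O3 -openmp workshare_test.f90' and vary OMP_NUM_THREADS between runs:

```fortran
program workshare_test
  use omp_lib
  implicit none
  integer, parameter :: n = 2000      ! hypothetical problem size
  integer :: i, j
  real :: a(n,n), b(n,n), c(n,n)
  double precision :: t0, t1

  call random_number(b)
  call random_number(c)

  ! Array-syntax version inside a WORKSHARE construct
  t0 = omp_get_wtime()
!$omp parallel workshare
  a(:,:) = b(:,:) + c(:,:)
!$omp end parallel workshare
  t1 = omp_get_wtime()
  print *, 'WORKSHARE time:', t1 - t0

  ! Explicit-loop version; J outer so the inner loop is
  ! stride-1 (Fortran arrays are column-major)
  t0 = omp_get_wtime()
!$omp parallel do private(i)
  do j = 1, n
     do i = 1, n
        a(i,j) = b(i,j) + c(i,j)
     end do
  end do
!$omp end parallel do
  t1 = omp_get_wtime()
  print *, 'DO loop time:  ', t1 - t0
end program workshare_test
```

The omp_get_wtime calls bracket only the parallel region, so thread-creation overhead is included in both measurements, which is what matters for judging whether WORKSHARE is actually distributing the work.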
Did the compiler successfully interchange the loops so as to optimize your DO-loop version with load-pair, as you would want it to even without OpenMP (assuming the I loop is fairly long)? Does it produce load-pair code for the rank-2 array version?
OpenMP parallelization may be pointless if the loop isn't optimized prior to parallelization (unless your goal is to minimize serial performance so that parallel scaling looks better). If you don't nest these loops properly, then in addition to cutting inner-loop performance, your OpenMP parallelization could peak early once you reach the point of false sharing, with multiple threads operating frequently on the same cache line.
I'll agree that I'd like to see full optimization of array syntax, but OpenMP is a fairly low-level programming scheme, and it doesn't do well when details are left to the intelligence of the compiler, scheduler, and run-time library.
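As a concrete illustration of the nesting point: Fortran stores arrays column-major (leftmost index varies fastest in memory), so the parallelized loop should be the outer J (column) loop. If I is the outer parallel loop instead, each thread writes strided elements, and adjacent threads write neighboring rows that share cache lines, which is exactly the false-sharing scenario described above. A hedged sketch of the favorable ordering (add_cols is a hypothetical name):

```fortran
subroutine add_cols(a, b, c, n)
  implicit none
  integer, intent(in) :: n
  real, intent(in)  :: b(n,n), c(n,n)
  real, intent(out) :: a(n,n)
  integer :: i, j
  ! Parallelize the outer J loop: each thread owns whole
  ! columns, the inner I loop walks memory with stride 1,
  ! and no two threads write into the same cache line.
!$omp parallel do private(i)
  do j = 1, n
     do i = 1, n
        a(i,j) = b(i,j) + c(i,j)
     end do
  end do
!$omp end parallel do
end subroutine add_cols
```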
Certain OpenMP* WORKSHARE constructs now parallelize with the Intel® Fortran Compiler 15.0. Our implementation is described here.
Patrick
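By way of illustration, these are the kinds of statements a WORKSHARE block may legally contain (whole-array assignment and masked WHERE assignment). Whether each form actually runs in parallel depends on the compiler version, so treat this as a sketch of the construct rather than a statement of exactly what the 15.0 compiler parallelizes:

```fortran
program workshare_forms
  implicit none
  integer, parameter :: n = 1000
  real :: a(n,n), b(n,n), c(n,n)

  call random_number(b)
  call random_number(c)

!$omp parallel workshare
  ! whole-array assignment
  a = b + c
  ! masked assignment via WHERE
  where (b > 0.5)
     a = b - c
  end where
!$omp end parallel workshare

  print *, 'checksum:', sum(a)
end program workshare_forms
```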
It looks easier to just keep writing f77-style loop code than to figure out how to work within these restrictions.
