Hello,
I have problems with OpenMP WORKSHARE directive on ifort v. 9.0
(ifort -V produces:
[hajek@dell8 gpr]$ ifort -V
Intel Fortran Itanium Compiler for Itanium-based applications
Version 9.0 Build 20050624 Package ID: l_fc_c_9.0.024
Copyright (C) 1985-2005 Intel Corporation. All rights reserved.
)
If I compile the attached test program with
ifort -O3 -ip -openmp -openmp-report=2 omptest.f90
the compiler produces:
omptest.f90(30) : (col. 6) remark: OpenMP multithreaded code generation for SINGLE was successful.
omptest.f90(33) : (col. 6) remark: OpenMP multithreaded code generation for SINGLE was successful.
omptest.f90(29) : (col. 6) remark: OpenMP DEFINED REGION WAS PARALLELIZED.
There is only one SINGLE section in the parallel region, so my suspicion is that the compiler simply replaces the WORKSHARE region with SINGLE, and this is confirmed by the fact that I get no speedup with multiple threads:
[hajek@dell8 scratch]$ OMP_NUM_THREADS=4 ; export OMP_NUM_THREADS; time ./a.out
running two-dimensional version
OpenMP: using 4 threads.
0.000000000000000E+000 1.54969673800415
real 0m10.607s
user 0m5.595s
sys 0m5.513s
[hajek@dell8 scratch]$ OMP_NUM_THREADS=1 ; export OMP_NUM_THREADS; time ./a.out
running two-dimensional version
OpenMP: using 1 threads.
0.000000000000000E+000 1.54969673800415
real 0m10.716s
user 0m4.891s
sys 0m5.724s
I've searched the forum and only found that WORKSHARE might not work well with rank-2 arrays. But if I compile the 1D version of the same program (-D ONED suffices), I get the same results.
I have a code with (relatively) large matrices assembled using FORALL statements, and I really do not want to rewrite them as DO loops. Is WORKSHARE better supported in some later build?
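For reference, the construct in question has roughly this shape (a sketch only; the actual omptest.f90 attachment is not reproduced in the thread, and the names n, R, X and theta are taken from the loop quoted later in the discussion):

```fortran
! Hedged sketch of the kind of WORKSHARE region being discussed;
! declarations and initialization as in the attached test program.
!$omp parallel
!$omp workshare
      forall (i = 1:n, j = 1:n, i >= j)
         R(i,j) = sqrt(dot_product(theta, (X(:,i) - X(:,j))**2))
      end forall
!$omp end workshare
!$omp end parallel
```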
4 Replies
With OpenMP, plus opportunities for software pipelining, you generally have to be specific about which loop should be parallelized. If you are at all interested in performance, your first interest should be in enabling successful pipelining.
tim18 wrote:
With OpenMP, plus opportunities for software pipelining, you generally have to be specific about which loop should be parallelized. If you are at all interested in performance, your first interest should be in enabling successful pipelining.
This is indeed true, but compiling with
ifort -O3 -openmp -openmp-report=2 -opt-report -opt-report-phase=ecg_swp -opt-report-file=report.txt omptest.f90
shows that the forall statement in question really gets pipelined.
(see attachment)
Moreover, I tried to rewrite the forall statement as a DO loop in the following way:
do j = 1, n
   do i = 1, n
      if (i .ge. j) R(i,j) = sqrt(dot_product(theta, (X(:,i)-X(:,j))**2))
   end do
end do
and it was parallelized, showing a speedup by a factor of 2
(with 4 threads, but the program also contains other work).
So what am I doing wrong?
In a Fortran 95 course at college we were encouraged to use FORALL and WHERE wherever possible, to inform the compiler about independence and help it optimize and parallelize the loop. Is that not true?
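The directive actually placed on the DO-loop rewrite is not shown above; a typical form (an assumption on my part, not quoted from the thread) would be:

```fortran
! Assumed form of the directive used on the rewritten loop nest.
! j is the parallelized loop index (private by default); the inner
! index i must be made private explicitly.
!$omp parallel do private(i)
      do j = 1, n
         do i = 1, n
            if (i .ge. j) R(i,j) = sqrt(dot_product(theta, (X(:,i)-X(:,j))**2))
         end do
      end do
!$omp end parallel do
```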
You may have enabled an optimization by writing it as:
do j = 1, n
   do i = j, n
      R(i,j) = sqrt(dot_product(theta, (X(:,i)-X(:,j))**2))
   end do
end do
You have certainly clarified the task for the parallelizing preprocessor and compiler: pipeline the i loop and restrict any parallelization to the j loop.
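One further note (my own suggestion, not something measured in this thread): with i running from j to n, the amount of work per j iteration is uneven, so a dynamic schedule may balance the threads better than the default static one:

```fortran
! Triangular iteration space: early j values do more inner-loop work,
! so schedule(dynamic) hands out j iterations as threads become free.
!$omp parallel do private(i) schedule(dynamic)
      do j = 1, n
         do i = j, n
            R(i,j) = sqrt(dot_product(theta, (X(:,i)-X(:,j))**2))
         end do
      end do
!$omp end parallel do
```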
In other words, you would discourage the use of FORALL statements even on such "totally independent" loops? I used to trust FORALL to be a great step towards portable parallel programming in Fortran... It seems that a good working implementation of WORKSHARE in Fortran 95 is a real challenge for a compiler. Well, I guess I'll return to good old DO loops and forget these F95 features that glitter but have no gold inside them.