Intel® Fortran Compiler

!$OMP PARALLEL - Progress at last

davidspurr
Follow up to "!DEC$ PARALLEL" thread (since method changed to OpenMP).

Thanks to help received in the above thread and to a report by Meloni et al (2003)*, I have now obtained a worthwhile improvement in execution time using OpenMP parallel coding (see "!DEC$ PARALLEL" thread).

With one inner loop parallelised, the analysis time is now ~25% lower when using three threads (I limit execution to 3 threads on 4 cores so that the PC remains responsive). The improvement came from hand coding an equivalent of the "Reduction" process for the accumulation array (illustrated below). Interestingly, Meloni et al found their hand-coded version more efficient than the standard !$OMP ... REDUCTION clause.

Given that the threaded loop probably accounted for only ~50% of the original CPU time, the 25% reduction in total analysis time equates to a speedup of around 2 for the loop itself (on 3 threads). If the loop was only 40% of the total workload, the loop speedup is about 2.7. The 12% improvement obtained previously equates to loop speedups of about 1.3 and 1.4 respectively.
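The arithmetic behind these figures can be written as a short Amdahl-style estimate. This is a sketch under the assumption that only the threaded loop gets faster and everything else is unchanged; the symbols f (fraction of the original run time spent in the loop), r (fractional reduction in total time) and s (speedup of the loop itself) are introduced here for illustration.

```latex
% New total time as a fraction of the original: serial part + sped-up loop
1 - r = (1 - f) + \frac{f}{s}
\quad\Longrightarrow\quad
s = \frac{f}{f - r}

% 25% reduction in total time
f = 0.5,\ r = 0.25:\quad s = \frac{0.5}{0.25} = 2
\qquad
f = 0.4,\ r = 0.25:\quad s = \frac{0.4}{0.15} \approx 2.7

% 12% reduction in total time (earlier result)
f = 0.5,\ r = 0.12:\quad s = \frac{0.5}{0.38} \approx 1.3
\qquad
f = 0.4,\ r = 0.12:\quad s = \frac{0.4}{0.28} \approx 1.4
```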

I will parallel code a couple of other inner loops, but the returns will diminish until I shift the parallel coding to the outermost loop. That will require restructuring parts of the program.

Thanks
David


######
The code that worked (illustration only):

Earlier in the program:

nTd = 1
!$ nTd = MIN(OMP_GET_MAX_THREADS( ), 3)
!$ CALL OMP_SET_NUM_THREADS(nTd)
ALLOCATE( Temp(15, nL, nTd) )   ! for agg: one 15 x nL slab per thread

In the subject subroutine:
Temp = 0.d0
!$OMP PARALLEL PRIVATE(iTd) SHARED(....)
   iTd = 1      ! fallback when compiled without OpenMP
!$ iTd = OMP_GET_THREAD_NUM() + 1

!$OMP DO PRIVATE(....)
DO i = 1,LargeNum
   k = kv(i)
   ...
   DO j = 1,15
      ....
      x = xFUNCTION(....)
      p = ... (depends on x, k & j)
      ! each thread accumulates into its own slab, so no two threads write the same element
      Temp(j,iloc(k),iTd) = Temp(j,iloc(k),iTd) + p
   END DO
END DO
!$OMP END DO
!$OMP END PARALLEL

Res = SUM( Temp, DIM=3 )
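
For anyone wanting to try the pattern in isolation, here is a minimal self-contained sketch that runs the hand-coded reduction next to the standard REDUCTION clause and checks both against a serial loop. It is not the real program: the sizes, the index map kv, and the contribution p (a made-up integer-valued formula standing in for xFUNCTION) are all hypothetical. Integer-valued contributions are used so the sums are exact whatever the summation order, which lets the results be compared with strict equality.

```fortran
! Sketch only: sizes, kv and the formula for p are hypothetical stand-ins.
PROGRAM array_reduction_demo
  !$ USE OMP_LIB
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp = KIND(1.0d0)
  INTEGER, PARAMETER :: nJ = 15, nL = 8, nBig = 20000
  INTEGER :: kv(nBig), i, j, k, nTd, iTd
  REAL(dp) :: ResSerial(nJ, nL), ResHand(nJ, nL), ResClause(nJ, nL), p
  REAL(dp), ALLOCATABLE :: Temp(:,:,:)

  ! Hypothetical index map: iteration i contributes to column kv(i)
  DO i = 1, nBig
    kv(i) = MOD(i, nL) + 1
  END DO

  ! Serial reference result
  ResSerial = 0.0_dp
  DO i = 1, nBig
    k = kv(i)
    DO j = 1, nJ
      p = REAL(j, dp) * REAL(MOD(i, 7), dp)
      ResSerial(j, k) = ResSerial(j, k) + p
    END DO
  END DO

  ! Hand-coded reduction: one private slab of Temp per thread
  nTd = 1
  !$ nTd = MIN(OMP_GET_MAX_THREADS(), 3)
  !$ CALL OMP_SET_NUM_THREADS(nTd)
  ALLOCATE(Temp(nJ, nL, nTd))
  Temp = 0.0_dp

  !$OMP PARALLEL PRIVATE(iTd, i, j, k, p) SHARED(Temp, kv)
  iTd = 1                               ! used when compiled without OpenMP
  !$ iTd = OMP_GET_THREAD_NUM() + 1
  !$OMP DO
  DO i = 1, nBig
    k = kv(i)
    DO j = 1, nJ
      p = REAL(j, dp) * REAL(MOD(i, 7), dp)
      Temp(j, k, iTd) = Temp(j, k, iTd) + p
    END DO
  END DO
  !$OMP END DO
  !$OMP END PARALLEL
  ResHand = SUM(Temp, DIM=3)            ! combine the per-thread slabs

  ! Same loop using the standard array REDUCTION clause
  ResClause = 0.0_dp
  !$OMP PARALLEL DO PRIVATE(j, k, p) REDUCTION(+:ResClause)
  DO i = 1, nBig
    k = kv(i)
    DO j = 1, nJ
      p = REAL(j, dp) * REAL(MOD(i, 7), dp)
      ResClause(j, k) = ResClause(j, k) + p
    END DO
  END DO
  !$OMP END PARALLEL DO

  IF (ANY(ResHand /= ResSerial) .OR. ANY(ResClause /= ResSerial)) THEN
    ERROR STOP 'parallel results differ from serial reference'
  END IF
  PRINT '(A)', 'all three results agree'
END PROGRAM array_reduction_demo
```

The sketch only checks that the two approaches give the same answer; it says nothing about which is faster. Timing each section (for example with OMP_GET_WTIME) on the real loop would be needed to see whether the hand-coded version wins, as Meloni et al report. Compiled without the OpenMP option, the directives are treated as comments and the program runs serially.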


* Meloni et al (2003), Reduction on arrays: comparison of performances among different algorithms, EWOMP03.
http://www.compunity.org/events/ewomp03/omptalks/Monday/Session3/T21p.pdf
