Parallel Fortran Implementation

Petros — Thu, 07 Apr 2011 11:39:39 GMT

Hi,

I need some help parallelizing a time-domain simulation program that we implemented in Fortran. The main structure is:

DO t=0,Nt !HAS TO BE IN SERIAL
SOME SERIAL CODE
DO I=1,N
CALL F(I)
END DO
SOME SERIAL CODE
DO I=1,N
CALL G(I)
END DO
SOME SERIAL CODE
END DO

The big DO has to be serial (it's the time incrementing). The inner DOs can be parallelized. N is usually big (8000-9000) and the F(), G(), ... functions are computationally intensive.

I tried parallelizing the inner DO loops with !$OMP PARALLEL DO directives. It runs OK (I solved the data dependencies, etc.), BUT it is a lot slower than running in serial!

-Is it because new threads are created and destroyed every time, which introduces overhead?

-Should I start with !$OMP PARALLEL in the beginning and use !$OMP DO-!$OMP END DO and protect the serial parts with !$OMP MASTER?

-If yes, where should I put !$OMP PARALLEL? Before or after the big DO loop (that needs to be serial)?

Any suggestions?

Thanks people...

jimdempseyatthecove — Thu, 07 Apr 2011 15:24:55 GMT

How you can best parallelize this code depends on some factors that cannot be determined from a simple sketch of your code like the one in your opening message.

1) How many sections of serial code do you have?
2) How many threads are available (i.e. what is the ratio of threads to serial sections)?
3) Of the subroutines being called in your serial loops, which are dependent on which others?
4) Of the subroutines that are dependent on earlier serial loops, which depend only on the same element and which depend on the entire earlier loop completing?
5) Other issues.

Parallelization options that depend on the answers to the questions above:

! each subroutine element-wise independent of other elements
! but may be dependent on sequence A-Z
do t=0,nT
!$omp parallel do
do i=1,N
call A(i)
call B(i)
...
call Z(i)
end do
!$omp end parallel do
end do

! each loop run in parallel but in sequence A, B, ... Z
do t=0,nT
!$omp parallel do
do i=1,N
call A(i)
end do
!$omp end parallel do
!$omp parallel do
do i=1,N
call B(i)
end do
!$omp end parallel do
...
!$omp parallel do
do i=1,N
call Z(i)
end do
!$omp end parallel do
end do

! each loop run by one thread
do t=0,nT
!$omp parallel sections private(i)
do i=1,N
call A(i)
end do
!$omp section
do i=1,N
call B(i)
end do
!$omp section
...
!$omp end parallel sections
end do

The first method establishes one team per time step and slices a single loop that makes all the calls. This reduces the number of team start/stops.
The second method (which I assume is what you are currently doing) increases the number of team start/stops.
The third method establishes one team and assigns one thread to each loop; its effectiveness will depend on the number of loops and the amount of processing in each loop.
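
If the team start/stops in the second method turn out to be the cost, one possible variation is to open a single parallel region around the time loop and use worksharing constructs inside it. The following is only a sketch under assumptions: the arrays x and y, the variable scale, and the bodies of f and g are placeholders invented for the example, and the sizes are arbitrary. Only the loop structure is taken from the thread.

```fortran
program persistent_team
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! illustrative sizes (assumed, not from the original program)
  integer, parameter :: n = 8000, nt = 10
  real(dp) :: x(n), y(n), scale
  integer :: t, i

  x = 1.0_dp
  y = 0.0_dp
  scale = 0.0_dp

  ! The team is created once; every thread then runs the time loop itself.
  !$omp parallel default(shared) private(t, i)
  do t = 1, nt
     ! Stand-in for SOME SERIAL CODE: one thread runs it, and the
     ! implicit barrier at "end single" holds the others until it is done.
     !$omp single
     scale = 1.0_dp / real(t, dp)
     !$omp end single

     ! Iterations are shared among the existing team; no new team is started.
     !$omp do schedule(static)
     do i = 1, n
        call f(i)
     end do
     !$omp end do

     !$omp do schedule(static)
     do i = 1, n
        call g(i)
     end do
     !$omp end do
  end do
  !$omp end parallel

  ! Each step multiplies x(i) by (1 + 1/t), which telescopes to nt + 1.
  if (maxval(abs(x - real(nt + 1, dp))) > 1.0e-9_dp) error stop "unexpected result"
  print *, "x(1) =", x(1)

contains

  ! Hypothetical placeholder for the real F(I)
  subroutine f(i)
    integer, intent(in) :: i
    y(i) = scale * x(i)
  end subroutine f

  ! Hypothetical placeholder for the real G(I)
  subroutine g(i)
    integer, intent(in) :: i
    x(i) = x(i) + y(i)
  end subroutine g

end program persistent_team
```

The sketch uses !$omp single rather than !$omp master for the serial part because single ends with an implicit barrier, while master does not; with master you would have to add an explicit !$omp barrier before the next loop reads what the serial code wrote.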

A fourth method could be a variation of method 3 where you reorder and do more than one loop within a section.
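
As a rough illustration of putting more than one loop in a section, the sketch below groups two dependent loops into each section, so the dependent pair runs in order on one thread while the two sections run concurrently. The arrays a, b, c, d and the loop bodies are made up for the example and stand in for the real subroutine calls.

```fortran
program grouped_sections
  implicit none
  ! illustrative size (assumed)
  integer, parameter :: n = 1000
  real :: a(n), b(n), c(n), d(n)
  integer :: i

  !$omp parallel sections private(i)
  !$omp section
  ! The second loop reads what the first wrote, so both stay in one section.
  do i = 1, n
     a(i) = real(i)
  end do
  do i = 1, n
     b(i) = 2.0 * a(i)
  end do
  !$omp section
  ! Independent of a and b, so this pair can run on another thread.
  do i = 1, n
     c(i) = real(n - i)
  end do
  do i = 1, n
     d(i) = c(i) + 1.0
  end do
  !$omp end parallel sections

  ! Self-check of the placeholder arithmetic
  if (b(n) /= 2.0 * real(n) .or. d(1) /= real(n)) error stop "unexpected result"
  print *, "b(n) =", b(n), " d(1) =", d(1)
end program grouped_sections
```

Grouping like this only helps if the sections carry a similar amount of work; otherwise the team waits on the slowest section.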

An additional optimization may be available if you note that, in your serial code, the early-on subroutines run completely independently of the later-on subroutines. An example might be performing the physics computations up to the point where positional data is updated and, after the positional information is updated, calling graphics routines to render the scene. In this situation you would code something like:

!$omp parallel private(t, i) ! create team outside t loop
do t=0,nT
!$omp sections ! note removal of parallel
do i=1,N
call A(i)
end do
!$omp section
...
!$omp end sections
!$omp barrier
!$omp do
do i=1,N
call AdvancePosition(i)
end do
!$omp end do
!$omp master
call Render()
!$omp end master
end do
!$omp end parallel

*** untested sketch above

What the above provides is for the additional threads to begin the next physics calculations during the render process (assuming rendering is not parallel).

Hopefully this will give you some hints as to where you can go.

Note, add this last step _after_ you get everything else working at top speed.

Jim Dempsey

Petros — Fri, 08 Apr 2011 11:46:01 GMT

Thanks! I'll try the variations and try to fine tune them! I'm currently doing the 2nd one!
