Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Multithreading Big loop containing several loops inside

yafayez
Beginner

Hi all,

Please find my program below. It has one big loop, and inside the big loop there are several groups of loops that have to be executed in a certain order (I added some remarks to show how they should be executed). Every group of loops has to be executed fully, i.e. all variables must be updated, before moving on to the next group. Please let me know the best approach and directives to use to ensure that the code is parallelized only in the sequence shown in the code. Thanks,





do kk=1,temp   ! the big do loop





do i=nx1+1,nx2-2
do j=ny1+2,ny2-2

stuff1
enddo
enddo




do i=nx1+2,nx2-2
do j=ny1+1,ny2-2

stuff2
enddo
enddo


! STUFF1 and STUFF2 can be done in parallel but have to be completed before the stuff below




es0=0.


do i=nx1,nx2
do j=ny1,ny2-1
do k=nz1,nz2-1
stuff3
enddo
enddo
enddo



do i=nx1,nx2-1
do j=ny1,ny2
do k=nz1,nz2-1
stuff4
enddo
enddo
enddo





do i=nx1,nx2-1
do j=ny1,ny2-1
do k=nz1,nz2
stuff5
enddo
enddo
enddo


! STUFF3, STUFF4 and STUFF5 can be done in parallel but have to be completed before the stuff below





call function1
call function2



c
c update the E_field
c
c Main
c





do j=ny1+1,ny2-1
do i=nx1,nx2-1
do k=nz1+1,nz2-1
STUFF6
enddo
enddo
enddo



do j=ny1,ny2-1
do i=nx1+1,nx2-1
do k=nz1+1,nz2-1
STUFF7
enddo
enddo
enddo



do j=ny1+1,ny2-1
do i=nx1+1,nx2-1
do k=nz1,nz2-1
STUFF8
enddo
enddo
enddo
! STUFF6, STUFF7 and STUFF8 can be done in parallel but have to be completed before executing the stuff below


call function3
call function4




enddo   ! end of the big do loop

jimdempseyatthecove
Honored Contributor III


Is kk used inside your STUFF routines to select independent data sets?

If so, then can STUFF(...,kk) be executed in random order? (i.e. kk not dependent on kk-1)

Jim Dempsey

yafayez
Beginner

Thanks for your reply. No, the stuff at kk cannot be executed in random order, because values from iteration kk-1 are passed on to iteration kk and so on, i.e. the iterations are dependent. Please let me know your thoughts on this. Thanks again,


jimdempseyatthecove
Honored Contributor III

You might start with something like the following:

!$omp parallel
!$omp do private(i,j)
do i=nx1+1,nx2-2
do j=ny1+2,ny2-2

stuff1
enddo
enddo
!$omp end do nowait

!$omp do private(i,j)
do i=nx1+2,nx2-2
do j=ny1+1,ny2-2

stuff2
enddo
enddo
!$omp end do nowait
!$omp end parallel

Note, depending on what is inside stuff1 and stuff2, you may need to use different temporaries for each (or make subroutines out of stuff1 and stuff2 and pass i and j into the routines).

Also, if the computation overhead varies per iteration then experiment with adding the schedule clause.
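As a rough illustration of the schedule clause, here is a minimal free-form sketch. The array s, the size n and the chunk size 16 are made up for the example and are not taken from the original program. The inner loop does more work as i grows, so an even static split would leave the threads holding the low values of i idle; schedule(dynamic, 16) hands out chunks of 16 iterations as threads become free.

```fortran
program schedule_demo
  implicit none
  integer, parameter :: n = 2000      ! hypothetical problem size
  real :: s(n)                        ! hypothetical result array
  integer :: i, j

  s = 0.0

  ! Iteration i costs roughly i inner steps, so the work per iteration is uneven.
  ! dynamic scheduling assigns chunks of 16 iterations to whichever thread is free.
  !$omp parallel do private(i, j) schedule(dynamic, 16)
  do i = 1, n
     do j = 1, i
        s(i) = s(i) + real(j)         ! each thread writes only its own s(i)
     end do
  end do
  !$omp end parallel do

  print *, 's(n) =', s(n)
end program schedule_demo
```

Whether dynamic, guided or the default static split is fastest depends on the loop body, so it is worth timing each variant.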

Once you get the above working for stuff1 and stuff2 apply what you learned to the remaining stuff sections.
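A possible shape for the whole time-step loop is sketched below in free-form Fortran. This is only an illustration under assumptions: the arrays a, b, c, the temporary t and the loop bodies are placeholders standing in for the stuff sections, not the original code. The kk loop stays serial because step kk depends on step kk-1. Loops within one group use nowait, and the end of each parallel region acts as the barrier that completes the group before the next one starts.

```fortran
program phased_loops
  implicit none
  integer, parameter :: n = 64, nsteps = 10   ! hypothetical sizes
  real :: a(n,n), b(n,n), c(n,n)              ! placeholder arrays
  real :: t                                   ! per-iteration temporary, must be private
  integer :: i, j, kk

  a = 1.0; b = 2.0; c = 0.0

  do kk = 1, nsteps            ! serial: step kk needs the results of step kk-1

     ! Group 1: two independent loops (stand-ins for stuff1 and stuff2)
     !$omp parallel private(i, j, t)
     !$omp do
     do j = 2, n
        do i = 1, n
           t = 0.5 * a(i,j)
           a(i,j) = t + 0.5
        end do
     end do
     !$omp end do nowait       ! threads may move on to the next loop right away

     !$omp do
     do j = 1, n
        do i = 2, n
           t = 0.5 * b(i,j)
           b(i,j) = t + 1.0
        end do
     end do
     !$omp end do nowait
     !$omp end parallel        ! implicit barrier: group 1 is complete here

     ! Group 2 reads everything group 1 wrote (stand-in for the later stuff)
     !$omp parallel do private(i, j)
     do j = 1, n
        do i = 1, n
           c(i,j) = a(i,j) + b(i,j)
        end do
     end do
     !$omp end parallel do

     ! serial calls such as function1 and function2 would go here
  end do

  print *, 'c(n,n) =', c(n,n)
end program phased_loops
```

The nowait is only safe here because the two loops in group 1 touch different arrays; if stuff1 and stuff2 shared data, the barrier between them would have to stay.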

Jim Dempsey

yafayez
Beginner

Hi,

I added the commands and I see that all processors are working, but the program is much slower. How can I detect the bottleneck? Is there a phone number I can call you at to discuss it further? Thanks again,

Yasser

jimdempseyatthecove
Honored Contributor III


email me at

j i m _ d e m p s e y @ a m e r i t e c h . n e t

(remove the spaces)

Jim Dempsey

jimdempseyatthecove
Honored Contributor III


>>I added the commands and I see that all processors are working, but the program is much slower. How can I detect the bottleneck?

This is symptomatic of the inner OpenMP do loop running serially. Place the "!$OMP DO ..." (or "C$OMP DO ...") directive at the left margin (in fixed-form source the directive sentinel must begin in column 1).

Should this not improve matters then try

!$omp parallel do private(i,j)
do i=nx1+1,nx2-2
do j=ny1+2,ny2-2

stuff1
enddo
enddo
!$omp end parallel do

!$omp parallel do private(i,j)
do i=nx1+2,nx2-2
do j=ny1+1,ny2-2

stuff2
enddo
enddo
!$omp end parallel do

Note, the above is not inside an !$OMP PARALLEL region.

The purpose of coding it the first way was to let the threads that finish their share of the STUFF1 loop first begin processing the STUFF2 loop before the remaining threads have finished the STUFF1 loop.

If this does not improve the performance either, then the code in STUFF1 and STUFF2 is likely dominated by memory copy statements as opposed to computational statements.
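One way to check this is to time a single loop nest with omp_get_wtime and rerun it with different thread counts. The sketch below is free-form Fortran with made-up arrays a and b and a copy-like body; it is not the original STUFF code. It needs to be built with OpenMP enabled (for example -fopenmp with gfortran, or -qopenmp with the Intel compiler on Linux).

```fortran
program time_loop
  use omp_lib                       ! provides omp_get_wtime and omp_get_max_threads
  implicit none
  integer, parameter :: n = 2000    ! hypothetical problem size
  real, allocatable :: a(:,:), b(:,:)
  double precision :: t0, t1
  integer :: i, j

  allocate(a(n,n), b(n,n))
  b = 1.0

  t0 = omp_get_wtime()              ! wall-clock time before the loop nest
  !$omp parallel do private(i, j)
  do j = 1, n
     do i = 1, n
        a(i,j) = b(i,j)             ! copy-like body: almost no arithmetic per byte moved
     end do
  end do
  !$omp end parallel do
  t1 = omp_get_wtime()              ! wall-clock time after the loop nest

  print *, 'threads:', omp_get_max_threads(), ' seconds:', t1 - t0
  print *, 'check:', a(n,n)         ! use the result so the loop is not optimized away
end program time_loop
```

Running it with OMP_NUM_THREADS set to 1, 2, 4 and so on shows how the time scales. If the time barely drops as threads are added, the loop is probably limited by memory bandwidth rather than by computation.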

Jim Dempsey
