Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Merging OMP TARGET regions

cu238
Novice

Code A below has 4 OMP TARGET regions.

!$OMP TARGET DEFAULTMAP(present: allocatable)
!$OMP TEAMS DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_W
      DO K=1,KBM1
          I=NIAGCW(II)
          IF (DUM(NIP1(I)).GT.0.0) THEN 
          XMFLUX(NIP1(I),K)=XMFLUX(AIJ(I,6),K)+XMFLUX(AIJ(I,7),K)
     *+XMFLUX(AIJ(I,8),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP END TARGET
      
!$OMP TARGET DEFAULTMAP(present: allocatable)
!$OMP TEAMS DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_E
          DO K=1,KBM1
          I=NIAGCE(II)
          IF (DUM(I).GT.0.0) THEN 
          XMFLUX(I,K)=XMFLUX(AIJ(I,9),K)+XMFLUX(AIJ(I,10),K)
     *+XMFLUX(AIJ(I,11),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP END TARGET 
      
!$OMP TARGET DEFAULTMAP(present: allocatable)
!$OMP TEAMS DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_N
      DO K=1,KBM1
          I=NIAGCN(II)
          IF (DVM(I).GT.0.0) THEN 
          YMFLUX(I,K)=YMFLUX(AIJ(I,9),K)+YMFLUX(AIJ(I,10),K)
     *+YMFLUX(AIJ(I,11),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP END TARGET   
      
!$OMP TARGET DEFAULTMAP(present: allocatable)
!$OMP TEAMS DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_S
      DO K=1,KBM1
          I=NIAGCS(II)
          IF (DVM(NJP1(I)).GT.0.0) THEN 
          YMFLUX(NJP1(I),K)=YMFLUX(AIJ(I,6),K)+YMFLUX(AIJ(I,7),K)
     *+YMFLUX(AIJ(I,8),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP END TARGET  

To make the computation faster, I merged the four regions into one in Code B below.

      NCOMB_1=N_S
      NCOMB_2=NCOMB_1+N_N
      NCOMB_3=NCOMB_2+N_E
      NCOMB_4=NCOMB_3+N_W
!$OMP TARGET DEFAULTMAP(present: allocatable)
!$OMP TEAMS DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,NCOMB_4
      DO K=1,KBM1
       IF (II>NCOMB_3) THEN
          I=NIAGCW(II-NCOMB_3)
          IF (DUM(NIP1(I)).GT.0.0) THEN 
          XMFLUX(NIP1(I),K)=XMFLUX(AIJ(I,6),K)+XMFLUX(AIJ(I,7),K)
     *+XMFLUX(AIJ(I,8),K)
          ENDIF
       ELSEIF (II>NCOMB_2) THEN
          I=NIAGCE(II-NCOMB_2)
          IF (DUM(I).GT.0.0) THEN 
          XMFLUX(I,K)=XMFLUX(AIJ(I,9),K)+XMFLUX(AIJ(I,10),K)
     *+XMFLUX(AIJ(I,11),K)
          ENDIF
       ELSEIF (II>NCOMB_1) THEN
          I=NIAGCN(II-NCOMB_1)
          IF (DVM(I).GT.0.0) THEN 
          YMFLUX(I,K)=YMFLUX(AIJ(I,9),K)+YMFLUX(AIJ(I,10),K)
     *+YMFLUX(AIJ(I,11),K)
          ENDIF
       ELSE
          I=NIAGCS(II)
          IF (DVM(NJP1(I)).GT.0.0) THEN 
          YMFLUX(NJP1(I),K)=YMFLUX(AIJ(I,6),K)+YMFLUX(AIJ(I,7),K)
     *+YMFLUX(AIJ(I,8),K)
          ENDIF
       ENDIF
      ENDDO
      ENDDO
!$OMP END TARGET

Is there a better way to merge them? I would prefer to change the original code as little as possible.

Thanks! 


8 Replies
jimdempseyatthecove
Honored Contributor III

Question: While II may differ in each of the enclosed loops, will the contents of NIP1(I), AIJ(I,6:11), or NJP1(I) in any one loop have the same value as any other loop's NIP1(I), AIJ(I,6:11), or NJP1(I)?

If yes, then you will have loop order dependencies.

 

Jim Dempsey

TobiasK
Moderator

@cu238 
If those loops can be executed in parallel, then just open a TARGET TEAMS region once and add a DISTRIBUTE PARALLEL DO COLLAPSE(2) construct to each of the loop nests. There is no synchronization after a DISTRIBUTE loop, unlike after a PARALLEL DO loop.

 

 

!$OMP TARGET TEAMS DEFAULTMAP(present: allocatable)
!$OMP DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_W
      DO K=1,KBM1
          I=NIAGCW(II)
          IF (DUM(NIP1(I)).GT.0.0) THEN 
          XMFLUX(NIP1(I),K)=XMFLUX(AIJ(I,6),K)+XMFLUX(AIJ(I,7),K)
     *+XMFLUX(AIJ(I,8),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_E
          DO K=1,KBM1
          I=NIAGCE(II)
          IF (DUM(I).GT.0.0) THEN 
          XMFLUX(I,K)=XMFLUX(AIJ(I,9),K)+XMFLUX(AIJ(I,10),K)
     *+XMFLUX(AIJ(I,11),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_N
      DO K=1,KBM1
          I=NIAGCN(II)
          IF (DVM(I).GT.0.0) THEN 
          YMFLUX(I,K)=YMFLUX(AIJ(I,9),K)+YMFLUX(AIJ(I,10),K)
     *+YMFLUX(AIJ(I,11),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_S
      DO K=1,KBM1
          I=NIAGCS(II)
          IF (DVM(NJP1(I)).GT.0.0) THEN 
          YMFLUX(NJP1(I),K)=YMFLUX(AIJ(I,6),K)+YMFLUX(AIJ(I,7),K)
     *+YMFLUX(AIJ(I,8),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP END TARGET TEAMS

 

 

 

cu238
Novice

Great! It's exactly what I wanted. I tested it, and the results are correct. Many thanks for your guidance.

 

jimdempseyatthecove
Honored Contributor III

@TobiasK 

If an array is allocated to a pointer, would you still use:

!$OMP TARGET TEAMS DEFAULTMAP(present: allocatable)

or something else?

 

Jim Dempsey

TobiasK
Moderator

@jimdempseyatthecove 
To be honest, I have never used that; however, DEFAULTMAP also lists pointer as a variable category, so I assume there is a distinction between pointer and allocatable:

Page 161:
https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf
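
For illustration, my reading of the 5.2 syntax is that the two variable categories could be given different implicit-mapping rules on one construct with separate DEFAULTMAP clauses. This is a sketch only, not compiler-tested:

```fortran
!     Sketch: one DEFAULTMAP clause per variable category
!     (OpenMP 5.2 syntax; not compiler-tested).
!$OMP TARGET TEAMS DEFAULTMAP(present: allocatable)
!$OMP& DEFAULTMAP(tofrom: pointer)
!$OMP DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_W
      DO K=1,KBM1
!         ... loop body as in Code A ...
      ENDDO
      ENDDO
!$OMP END TARGET TEAMS
```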

jimdempseyatthecove
Honored Contributor III

@TobiasK 

Pages 287-288 describe a pointer that is associated (I assume allocated counts as well), and page 162 has:

The pointer variable-category specifies variables of pointer type.

but the question is: if I use ... DEFAULTMAP(present: pointer) and the pointer is associated/allocated, is the pointer itself copied/mapped, or the data it points to?

 

Jim Dempsey

TobiasK
Moderator

@jimdempseyatthecove The present modifier should make the construct fail if the variable is not already present in the device data environment; it does not do any mapping itself.

For the mapping of Fortran pointers, I have to check again with the development team. (If you don't need the POINTER or ALLOCATABLE attribute, just hide it from the target region...)
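
To make that last suggestion concrete: one way to hide the POINTER/ALLOCATABLE attribute is to pass the data to a routine whose dummy argument is an ordinary explicit-shape array, and put the target region inside that routine. A minimal sketch; the names COMPUTE and A are invented for illustration, not taken from the code above:

```fortran
!     Sketch: inside COMPUTE the dummy argument A has neither the
!     POINTER nor the ALLOCATABLE attribute, so it is mapped as a
!     plain array.  (COMPUTE, A are illustrative names.)
      SUBROUTINE COMPUTE(A, N, KBM1)
      INTEGER N, KBM1
      REAL A(N, KBM1)
!$OMP TARGET TEAMS DISTRIBUTE PARALLEL DO COLLAPSE(2)
!$OMP& MAP(TOFROM: A)
      DO I = 1, N
      DO K = 1, KBM1
          A(I,K) = A(I,K) + 1.0
      ENDDO
      ENDDO
      END SUBROUTINE
```

The caller would then pass the pointer or allocatable array and its extents, e.g. CALL COMPUTE(P, SIZE(P,1), SIZE(P,2)).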
