Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Merging OMP TARGET regions

cu238
Novice

Code A below has 4 OMP TARGET regions.

!$OMP TARGET DEFAULTMAP(present: allocatable)
!$OMP TEAMS DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_W
      DO K=1,KBM1
          I=NIAGCW(II)
          IF (DUM(NIP1(I)).GT.0.0) THEN 
          XMFLUX(NIP1(I),K)=XMFLUX(AIJ(I,6),K)+XMFLUX(AIJ(I,7),K)
     *+XMFLUX(AIJ(I,8),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP END TARGET
      
!$OMP TARGET DEFAULTMAP(present: allocatable)
!$OMP TEAMS DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_E
          DO K=1,KBM1
          I=NIAGCE(II)
          IF (DUM(I).GT.0.0) THEN 
          XMFLUX(I,K)=XMFLUX(AIJ(I,9),K)+XMFLUX(AIJ(I,10),K)
     *+XMFLUX(AIJ(I,11),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP END TARGET 
      
!$OMP TARGET DEFAULTMAP(present: allocatable)
!$OMP TEAMS DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_N
      DO K=1,KBM1
          I=NIAGCN(II)
          IF (DVM(I).GT.0.0) THEN 
          YMFLUX(I,K)=YMFLUX(AIJ(I,9),K)+YMFLUX(AIJ(I,10),K)
     *+YMFLUX(AIJ(I,11),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP END TARGET   
      
!$OMP TARGET DEFAULTMAP(present: allocatable)
!$OMP TEAMS DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_S
      DO K=1,KBM1
          I=NIAGCS(II)
          IF (DVM(NJP1(I)).GT.0.0) THEN 
          YMFLUX(NJP1(I),K)=YMFLUX(AIJ(I,6),K)+YMFLUX(AIJ(I,7),K)
     *+YMFLUX(AIJ(I,8),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP END TARGET  

To make the computation faster, I merged the four regions into one in Code B below.

      NCOMB_1=N_S
      NCOMB_2=NCOMB_1+N_N
      NCOMB_3=NCOMB_2+N_E
      NCOMB_4=NCOMB_3+N_W
!$OMP TARGET DEFAULTMAP(present: allocatable)
!$OMP TEAMS DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,NCOMB_4
      DO K=1,KBM1
       IF (II>NCOMB_3) THEN
          I=NIAGCW(II-NCOMB_3)
          IF (DUM(NIP1(I)).GT.0.0) THEN 
          XMFLUX(NIP1(I),K)=XMFLUX(AIJ(I,6),K)+XMFLUX(AIJ(I,7),K)
     *+XMFLUX(AIJ(I,8),K)
          ENDIF
       ELSEIF (II>NCOMB_2) THEN
          I=NIAGCE(II-NCOMB_2)
          IF (DUM(I).GT.0.0) THEN 
          XMFLUX(I,K)=XMFLUX(AIJ(I,9),K)+XMFLUX(AIJ(I,10),K)
     *+XMFLUX(AIJ(I,11),K)
          ENDIF
       ELSEIF (II>NCOMB_1) THEN
          I=NIAGCN(II-NCOMB_1)
          IF (DVM(I).GT.0.0) THEN 
          YMFLUX(I,K)=YMFLUX(AIJ(I,9),K)+YMFLUX(AIJ(I,10),K)
     *+YMFLUX(AIJ(I,11),K)
          ENDIF
       ELSE
          I=NIAGCS(II)
          IF (DVM(NJP1(I)).GT.0.0) THEN 
          YMFLUX(NJP1(I),K)=YMFLUX(AIJ(I,6),K)+YMFLUX(AIJ(I,7),K)
     *+YMFLUX(AIJ(I,8),K)
          ENDIF
       ENDIF
      ENDDO
      ENDDO
!$OMP END TARGET

Is there a better way to merge them? I would prefer to change the original code as little as possible.

Thanks! 


8 Replies
jimdempseyatthecove
Honored Contributor III

Question: While II may differ in each of the enclosed loops, will the contents of NIP1(I), AIJ(I,6:11), or NJP1(I) in any one loop have the same value as any other loop's NIP1(I), AIJ(I,6:11), or NJP1(I)?

If yes, then you will have loop order dependencies.

 

Jim Dempsey

TobiasK
Moderator

@cu238 
If those loops can be executed in parallel, then just open a TARGET TEAMS region once and add a DISTRIBUTE PARALLEL DO COLLAPSE(2) construct to each of the loop nests. There is no synchronization after a DISTRIBUTE loop, unlike after a PARALLEL DO loop.

 

 

!$OMP TARGET TEAMS DEFAULTMAP(present: allocatable)
!$OMP DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_W
      DO K=1,KBM1
          I=NIAGCW(II)
          IF (DUM(NIP1(I)).GT.0.0) THEN 
          XMFLUX(NIP1(I),K)=XMFLUX(AIJ(I,6),K)+XMFLUX(AIJ(I,7),K)
     *+XMFLUX(AIJ(I,8),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_E
          DO K=1,KBM1
          I=NIAGCE(II)
          IF (DUM(I).GT.0.0) THEN 
          XMFLUX(I,K)=XMFLUX(AIJ(I,9),K)+XMFLUX(AIJ(I,10),K)
     *+XMFLUX(AIJ(I,11),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_N
      DO K=1,KBM1
          I=NIAGCN(II)
          IF (DVM(I).GT.0.0) THEN 
          YMFLUX(I,K)=YMFLUX(AIJ(I,9),K)+YMFLUX(AIJ(I,10),K)
     *+YMFLUX(AIJ(I,11),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_S
      DO K=1,KBM1
          I=NIAGCS(II)
          IF (DVM(NJP1(I)).GT.0.0) THEN 
          YMFLUX(NJP1(I),K)=YMFLUX(AIJ(I,6),K)+YMFLUX(AIJ(I,7),K)
     *+YMFLUX(AIJ(I,8),K)
          ENDIF
      ENDDO
      ENDDO
!$OMP END TARGET TEAMS

 

 

 

cu238
Novice

Great! It's exactly what I wanted. I tested it, and the results are correct. Many thanks for your guidance.

 

jimdempseyatthecove
Honored Contributor III

@TobiasK 

If an array is allocated to a pointer, would you still use:

!$OMP TARGET TEAMS DEFAULTMAP(present: allocatable)

or something else?

 

Jim Dempsey

TobiasK
Moderator

@jimdempseyatthecove 
To be honest, I have never used that; however, DEFAULTMAP also lists pointer as a variable category, so I assume there is a distinction between pointer and allocatable:

Page 161:
https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf
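
For illustration, my reading of the 5.2 syntax is that the two variable categories could be given different implicit-mapping rules on one construct with separate DEFAULTMAP clauses. This is a sketch only, not compiler-tested:

```fortran
!     Sketch: one DEFAULTMAP clause per variable category
!     (OpenMP 5.2 syntax; not compiler-tested).
!$OMP TARGET TEAMS DEFAULTMAP(present: allocatable)
!$OMP& DEFAULTMAP(tofrom: pointer)
!$OMP DISTRIBUTE PARALLEL DO COLLAPSE(2)
      DO II=1,N_W
      DO K=1,KBM1
!         ... loop body as in Code A ...
      ENDDO
      ENDDO
!$OMP END TARGET TEAMS
```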

jimdempseyatthecove
Honored Contributor III

@TobiasK 

Pages 287-288 describe a pointer that is associated (I assume allocated counts as well), and page 162 has:

The pointer variable-category specifies variables of pointer type.

but the question is: if I use ... DEFAULTMAP(present: pointer) and the pointer is associated/allocated, is the pointer itself copied/mapped, or the data it points to?

 

Jim Dempsey

TobiasK
Moderator

@jimdempseyatthecove The present modifier should make the construct fail if the variable is not already present in the device data environment; it does not do any mapping itself.

For the mapping of Fortran pointers, I have to check again with the development team. (If you don't need the POINTER or ALLOCATABLE attribute, just hide it from the target region...)
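
To make that last suggestion concrete: one way to hide the POINTER/ALLOCATABLE attribute is to pass the data to a routine whose dummy argument is an ordinary explicit-shape array, and put the target region inside that routine. A minimal sketch; the names COMPUTE and A are invented for illustration, not taken from the code above:

```fortran
!     Sketch: inside COMPUTE the dummy argument A has neither the
!     POINTER nor the ALLOCATABLE attribute, so it is mapped as a
!     plain array.  (COMPUTE, A are illustrative names.)
      SUBROUTINE COMPUTE(A, N, KBM1)
      INTEGER N, KBM1
      REAL A(N, KBM1)
!$OMP TARGET TEAMS DISTRIBUTE PARALLEL DO COLLAPSE(2)
!$OMP& MAP(TOFROM: A)
      DO I = 1, N
      DO K = 1, KBM1
          A(I,K) = A(I,K) + 1.0
      ENDDO
      ENDDO
      END SUBROUTINE
```

The caller would then pass the pointer or allocatable array and its extents, e.g. CALL COMPUTE(P, SIZE(P,1), SIZE(P,2)).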
