Hi,
I tried using the reduction clause in a parallel region of an OpenMP-enabled Fortran code.
I get the following error even though variable_x is globally defined and is shared at the beginning of the parallel region.
fortcom: Error: xxx.f, line 52: Variables that appear on the FIRSTPRIVATE, LASTPRIVATE, and REDUCTION clauses on a work-sharing directive must have shared scope in the enclosing region [variable_x].
Has anyone dealt with such problems in ifort 10?
Is it a compiler bug?
Thanks,
Amit
Quoting - amit
Hi,
I tried using the reduction clause in a parallel region of an OpenMP-enabled Fortran code.
I get the following error even though variable_x is globally defined and is shared at the beginning of the parallel region.
fortcom: Error: xxx.f, line 52: Variables that appear on the FIRSTPRIVATE, LASTPRIVATE, and REDUCTION clauses on a work-sharing directive must have shared scope in the enclosing region [variable_x].
Has anyone dealt with such problems in ifort 10?
Is it a compiler bug?
Thanks,
Amit
Amit,
Can you supply a code snippet, reduced to where/how variable_x is declared plus all the !$OMP statements (include the loop control statement on the parallel DO)? We won't need the computational part of your loop.
Jim Dempsey
Quoting - jimdempseyatthecove
Amit,
Can you supply a code snippet, reduced to where/how variable_x is declared plus all the !$OMP statements (include the loop control statement on the parallel DO)? We won't need the computational part of your loop.
Jim Dempsey
Jim,
Here is a dummy code. I call this dummy subroutine frequently from the main code. The variable_x is summ_temp. The variable summ_temp is declared in a module. This is the parallel-region implementation of the code. The code is written in fixed-form f77 and I use Fortran compiler 10.0.
The compiler flags are -c -r8 -openmp -O3.
module PARALLEL_REGION
real summ_temp
end module PARALLEL_REGION
program main_prog
USE PARALLEL_REGION
c$omp parallel private(m,i,j,k) shared(summ_temp)
call sum(v,total)
c$omp end parallel
end program main_prog
subroutine sum(phi,summ)
USE PARALLEL_REGION
real,dimension(:,:,:,:),intent(in)::phi
real summ
integer m
c$omp critical
summ_temp = 0
c$omp end critical
c$omp do private(m,i,j,k) reduction(+:summ_temp)
do m = 1, m_blk(id)
do k = k_b(m), k_e(m)
do j = j_b(m), j_e(m)
do i = i_b(m), i_e(m)
summ_temp = summ_temp + phi(i,j,k,m)
enddo
enddo
enddo
enddo
c$omp enddo
end subroutine sum
Thanks,
A~
Quoting - amit
Quoting - jimdempseyatthecove
Jim,
Here is a dummy code. I call this dummy subroutine frequently from the main code. The variable_x is summ_temp. The variable summ_temp is declared in a module. This is the parallel-region implementation of the code. The code is written in fixed-form f77 and I use Fortran compiler 10.0.
The compiler flags are -c -r8 -openmp -O3.
module PARALLEL_REGION
real summ_temp
end module PARALLEL_REGION
program main_prog
USE PARALLEL_REGION
c$omp parallel private(m,i,j,k) shared(summ_temp)
call sum(v,total)
c$omp end parallel
end program main_prog
subroutine sum(phi,summ)
USE PARALLEL_REGION
real,dimension(:,:,:,:),intent(in)::phi
real summ
integer m
c$omp critical
summ_temp = 0
c$omp end critical
c$omp do private(m,i,j,k) reduction(+:summ_temp)
do m = 1, m_blk(id)
do k = k_b(m), k_e(m)
do j = j_b(m), j_e(m)
do i = i_b(m), i_e(m)
summ_temp = summ_temp + phi(i,j,k,m)
enddo
enddo
enddo
enddo
c$omp enddo
end subroutine sum
Thanks,
A~
I question your use of the critical region which seems to duplicate functionality of sum reduction.
Did you mean to use nested parallel regions? It looks like you haven't carried the idea through, at least I'm not certain I see how you intended it to work.
Also, I suppose you could use a private sum variable for the inner 3 loops and add it into the sum reduction in the outer parallel loop.
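For what it's worth, a rough sketch of that private-partial-sum idea against the posted subroutine might look like this (untested, and it still relies on the reduction on the module variable summ_temp that ifort 10 is complaining about; the modules are assumed to supply m_blk, id and the loop bounds as in the original code):
[cpp]
c Sketch only: private partial sum for the inner three loops,
c added into the reduction variable once per outer iteration.
      subroutine sum(phi,summ)
      USE PARALLEL_REGION
      real,dimension(:,:,:,:),intent(in)::phi
      real summ
      real summ_local
      integer i,j,k,m
c$omp do private(m,i,j,k,summ_local) reduction(+:summ_temp)
      do m = 1, m_blk(id)
        summ_local = 0
        do k = k_b(m), k_e(m)
          do j = j_b(m), j_e(m)
            do i = i_b(m), i_e(m)
              summ_local = summ_local + phi(i,j,k,m)
            enddo
          enddo
        enddo
        summ_temp = summ_temp + summ_local
      enddo
c$omp enddo
      end subroutine sum
[/cpp]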
Quoting - tim18
I question your use of the critical region which seems to duplicate functionality of sum reduction.
Did you mean to use nested parallel regions? It looks like you haven't carried the idea through, at least I'm not certain I see how you intended it to work.
Also, I suppose you could use a private sum variable for the inner 3 loops and add it into the sum reduction in the outer parallel loop.
You are right. In this example case I don't need to use the critical region.
I do not intend to use nested parallel regions.
The parallel directive is used in main_prog, which puts the subroutine call, and hence the do loop, inside a parallel region.
I just intend to calculate the sum of all the values of the 4-D array 'phi' across all the threads, using the reduction clause on an orphaned work-sharing construct inside a parallel region, and ifort won't allow me to do that.
I did try a workaround of updating a variable locally on each thread and then updating a global variable inside a critical section, thus eliminating the reduction clause; this seems to work, but slowly, as it involves serial execution.
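For reference, that workaround looks roughly like this (a sketch only, not my exact code; it assumes the same modules as before and that summ_temp has been zeroed beforehand; summ_local is a thread-private local of the routine called inside the parallel region, and only the final per-thread update is serialized):
[cpp]
c Sketch of the critical-section workaround described above
c (illustrative only; no reduction clause is used).
      subroutine sum(phi,summ)
      USE PARALLEL_REGION
      real,dimension(:,:,:,:),intent(in)::phi
      real summ
      real summ_local
      integer i,j,k,m
      summ_local = 0
c$omp do private(m,i,j,k)
      do m = 1, m_blk(id)
        do k = k_b(m), k_e(m)
          do j = j_b(m), j_e(m)
            do i = i_b(m), i_e(m)
              summ_local = summ_local + phi(i,j,k,m)
            enddo
          enddo
        enddo
      enddo
c$omp enddo
c$omp critical
      summ_temp = summ_temp + summ_local
c$omp end critical
      end subroutine sum
[/cpp]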
A~
Try
[cpp]
module PARALLEL_REGION
real summ_temp
end module PARALLEL_REGION
program main_prog
USE PARALLEL_REGION
C *** drop shared as it is meaningless (variable in module)
C *** drop private as m,i,j,k not in scope
C *** if total in module no change, else add shared(total)
C *** if v in module no change, else add shared(v)
c$omp parallel
call sum(v,total)
c$omp end parallel
end program main_prog
subroutine sum(phi,summ)
USE PARALLEL_REGION
real,dimension(:,:,:,:),intent(in)::phi
real summ
C *** add other integers
integer i,j,k,m
C *** done as initializer of reduction $omp critical
C *** done as initializer of reduction summ_temp = 0
C *** done as initializer of reduction c$omp end critical
c$omp do private(m,i,j,k) reduction(+:summ_temp)
do m = 1, m_blk(id)
do k = k_b(m), k_e(m)
do j = j_b(m), j_e(m)
do i = i_b(m), i_e(m)
summ_temp = summ_temp + phi(i,j,k,m)
enddo
enddo
enddo
enddo
c$omp enddo
end subroutine sum
[/cpp]
Jim
Quoting - jimdempseyatthecove
Try
[cpp]module PARALLEL_REGION
real summ_temp
end module PARALLEL_REGION
program main_prog
USE PARALLEL_REGION
C *** drop shared as it is meaningless (variable in module)
C *** drop private as m,i,j,k not in scope
C *** if total in module no change, else add shared(total)
C *** if v in module no change, else add shared(v)
c$omp parallel
call sum(v,total)
c$omp end parallel
end program main_prog
subroutine sum(phi,summ)
USE PARALLEL_REGION
real,dimension(:,:,:,:),intent(in)::phi
real summ
C *** add other integers
integer i,j,k,m
C *** done as initializer of reduction $omp critical
C *** done as initializer of reduction summ_temp = 0
C *** done as initializer of reduction c$omp end critical
c$omp do private(m,i,j,k) reduction(+:summ_temp)
do m = 1, m_blk(id)
do k = k_b(m), k_e(m)
do j = j_b(m), j_e(m)
do i = i_b(m), i_e(m)
summ_temp = summ_temp + phi(i,j,k,m)
enddo
enddo
enddo
enddo
c$omp enddo
end subroutine sum
[/cpp]
Jim
Jim,
I have tried the modifications that you suggested, but I still can't get the code to compile and I get the same error...
I am not sure if I understand the OpenMP implementation correctly or if there is something wrong with the implementation itself in ifort.
Amit
Quoting - amit
I can't get the code to compile and get the same error...
Is this fragment the entire code? OpenMP can't compensate for undefined variables.
[cpp]
program main_prog
c$omp parallel
call sum(v,total)
c$omp end parallel
end program main_prog
subroutine sum(phi,summ)
real,dimension(:,:,:,:),intent(in)::phi
real summ
C *** add other integers
integer i,j,k,m
c$omp do private(m,i,j,k) reduction(+:summ)
do m = 1, m_blk(id)
do k = k_b(m), k_e(m)
do j = j_b(m), j_e(m)
do i = i_b(m), i_e(m)
summ = summ + phi(i,j,k,m)
enddo
enddo
enddo
enddo
c$omp enddo
end subroutine sum
[/cpp]
Jim
Quoting - tim18
Is this fragment the entire code? OpenMP can't compensate for undefined variables.
Tim,
No, this dummy fragment is just a very small part of the code, and I don't have any undefined variables. For that matter, the code runs fine when the reduction clause is eliminated and replaced by serial execution, but that slows the execution considerably.
Amit
Quoting - jimdempseyatthecove
[cpp]
program main_prog
c$omp parallel
call sum(v,total)
c$omp end parallel
end program main_prog
subroutine sum(phi,summ)
real,dimension(:,:,:,:),intent(in)::phi
real summ
C *** add other integers
integer i,j,k,m
c$omp do private(m,i,j,k) reduction(+:summ)
do m = 1, m_blk(id)
do k = k_b(m), k_e(m)
do j = j_b(m), j_e(m)
do i = i_b(m), i_e(m)
summ = summ + phi(i,j,k,m)
enddo
enddo
enddo
enddo
c$omp enddo
end subroutine sum
[/cpp]
Jim
Jim,
Unfortunately, ifort does allow such compilation when the variable in the reduction clause 'summ' is locally defined, and that seems like an incorrect implementation of the OpenMP clause, since in the parallel region the variable summ is local to each thread and there is no global memory location into which it can be accumulated across all the threads when the reduction clause is used.
Amit
Quoting - amit
Jim,
Unfortunately, ifort does allow such compilation when the variable in the reduction clause 'summ' is locally defined, and that seems like an incorrect implementation of the OpenMP clause, since in the parallel region the variable summ is local to each thread and there is no global memory location into which it can be accumulated across all the threads when the reduction clause is used.
Amit
summ is (was) the dummy argument in subroutine sum, which references total in the caller. total in the caller is either a local variable, outside the scope of c$omp parallel in the program, or in a module or common, also outside the scope of c$omp parallel.
Inside subroutine sum, the c$omp do should permit a reduction clause on summ, making a thread copy of summ on the stack inside the loop, then reducing into summ (total) on exit of the loop.
I will test compile here in a few minutes.
Jim
Quoting - jimdempseyatthecove
summ is (was) the dummy argument in subroutine sum, which references total in the caller. total in the caller is either a local variable, outside the scope of c$omp parallel in the program, or in a module or common, also outside the scope of c$omp parallel.
Inside subroutine sum, the c$omp do should permit a reduction clause on summ, making a thread copy of summ on the stack inside the loop, then reducing into summ (total) on exit of the loop.
I will test compile here in a few minutes.
Jim
This compiles on my system
[cpp]
module foo
integer :: m_blk(100), k_b(100), j_b(100), i_b(100)
integer :: k_e(100), j_e(100), i_e(100)
end module foo
program main_prog
interface
subroutine dosum(phi,summ)
real,dimension(:,:,:,:),intent(in)::phi
real summ
end subroutine
end interface
real, dimension(10,10,10,10) :: v
real :: total
v = 123.456
!$omp parallel
call dosum(v,total)
!$omp end parallel
end program main_prog
subroutine dosum(phi,summ)
use foo
real,dimension(:,:,:,:),intent(in)::phi
real summ
! *** add other integers
integer i,j,k,m
!$omp do private(m,i,j,k) reduction(+:summ)
do m = 1, m_blk(id)
do k = k_b(m), k_e(m)
do j = j_b(m), j_e(m)
do i = i_b(m), i_e(m)
summ = summ + phi(i,j,k,m)
enddo
enddo
enddo
enddo
!$omp enddo
end subroutine dosum
[/cpp]
Note, I had to rename the subroutine, since SUM is an intrinsic function.
Jim
Quoting - jimdempseyatthecove
summ is (was) the dummy argument in subroutine sum, which references total in the caller. total in the caller is either a local variable, outside the scope of c$omp parallel in the program, or in a module or common, also outside the scope of c$omp parallel.
Inside subroutine sum, the c$omp do should permit a reduction clause on summ, making a thread copy of summ on the stack inside the loop, then reducing into summ (total) on exit of the loop.
I will test compile here in a few minutes.
Jim
Sorry, I didn't mention this before as I just realized it: the problem with the implementation you showed is that, in the actual code, different variables are passed to the subroutine, so the scope of the variables passed to the subroutine inside the parallel region varies between shared and private and is not fixed.
This is why in my code I am trying to introduce a different global variable.
Amit
Quoting - amit
Sorry, I didn't mention this before as I just realized it: the problem with the implementation you showed is that, in the actual code, different variables are passed to the subroutine, so the scope of the variables passed to the subroutine inside the parallel region varies between shared and private and is not fixed.
This is why in my code I am trying to introduce a different global variable.
Amit
I am not quite sure I understand.
A variation on the prior post:
[cpp]
module foo
type phiType
real, dimension(10,10,10,10) :: v
integer :: m_blk(100), k_b(100), j_b(100), i_b(100)
integer :: k_e(100), j_e(100), i_e(100)
end type phiType
type(phiType) :: mod_phi
end module foo
program main_prog
use foo
interface
subroutine dosum(phi, summ)
use foo
type(phiType) :: phi
real summ
end subroutine
end interface
type(phiType) :: local_phi
real :: total
mod_phi%v = 123.456
local_phi%v = 987.654
!$omp parallel
call dosum(mod_phi,total)
!$omp end parallel
!$omp parallel
call dosum(local_phi,total)
!$omp end parallel
end program main_prog
subroutine dosum(phi,summ)
use foo
type(phiType) :: phi
real summ
! *** add other integers
integer i,j,k,m
!$omp do private(m,i,j,k) reduction(+:summ)
do m = 1, phi%m_blk(id)
do k = phi%k_b(m), phi%k_e(m)
do j = phi%j_b(m), phi%j_e(m)
do i = phi%i_b(m), phi%i_e(m)
summ = summ + phi%v(i,j,k,m)
enddo
enddo
enddo
enddo
!$omp enddo
end subroutine dosum
[/cpp]
Note: the dimensions in the sample code above are not workable; I will let you fix that.
Jim Dempsey
Also, you can separate the values array v from the bounds:
subroutine dosum(values, bounds, result)
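For example, that split interface could be sketched like this (untested; boundsType, bounds_mod and nblk are hypothetical names, not from the original code, and the caller still needs an explicit interface for the assumed-shape array, as in the earlier example):
[cpp]
! Sketch only: pass the values array and the loop bounds separately.
module bounds_mod
  type boundsType
    integer :: nblk
    integer :: k_b(100), k_e(100)
    integer :: j_b(100), j_e(100)
    integer :: i_b(100), i_e(100)
  end type boundsType
end module bounds_mod
subroutine dosum(values, bounds, result)
  use bounds_mod
  real, dimension(:,:,:,:), intent(in) :: values
  type(boundsType), intent(in) :: bounds
  real :: result
  integer :: i, j, k, m
!$omp do private(m,i,j,k) reduction(+:result)
  do m = 1, bounds%nblk
    do k = bounds%k_b(m), bounds%k_e(m)
      do j = bounds%j_b(m), bounds%j_e(m)
        do i = bounds%i_b(m), bounds%i_e(m)
          result = result + values(i,j,k,m)
        enddo
      enddo
    enddo
  enddo
!$omp enddo
end subroutine dosum
[/cpp]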
Jim Dempsey
Quoting - jimdempseyatthecove
summ is (was) the dummy argument in subroutine sum, which references total in the caller. total in the caller is either a local variable, outside the scope of c$omp parallel in the program, or in a module or common, also outside the scope of c$omp parallel.
Inside subroutine sum, the c$omp do should permit a reduction clause on summ, making a thread copy of summ on the stack inside the loop, then reducing into summ (total) on exit of the loop.
I will test compile here in a few minutes.
Jim
This example compiles for me too.
When I try to do the same in my code, the code compiles, but when I run it I get inconsistent results.
That is, for every repeated run with all the parameters remaining the same (nothing at all changes), the reduction clause produces different results.
I am not sure why this is happening. Is it because I have ifort 10.0?
Quoting - amit
Quoting - jimdempseyatthecove
summ is (was) the dummy argument in subroutine sum, which references total in the caller. total in the caller is either a local variable, outside the scope of c$omp parallel in the program, or in a module or common, also outside the scope of c$omp parallel.
Inside subroutine sum, the c$omp do should permit a reduction clause on summ, making a thread copy of summ on the stack inside the loop, then reducing into summ (total) on exit of the loop.
I will test compile here in a few minutes.
Jim
This example compiles for me too.
When I try to do the same in my code, the code compiles, but when I run it I get inconsistent results.
That is, for every repeated run with all the parameters remaining the same (nothing at all changes), the reduction clause produces different results.
I am not sure why this is happening. Is it because I have ifort 10.0?
Very possible that 10.0 is causing the non-reproducibility. There was a change in the ifort 11.0 compiler that fixes the global stack address, which affects alignment of data. Linux allows the global stack starting address to vary between processes. There are two possible fixes: rebuilding the Linux kernel after tweaking a kernel parameter, OR having the compiler fix the global stack at a fixed address. Ifort 11.0 chose the second option.
Thus I highly encourage moving to ifort 11.0. Keep in mind, THIS MAY NOT FIX WHAT YOU ARE SEEING. There may be something else going on. But by moving to 11.0, you remove one free variable.
ron
I think that Intel compiler 11.0.084 has a problem with regard to the reduction clause in an OpenMP parallel region.
Consider the sample code below. Logically this should not compile, but with the Intel compiler it does. This seems like a bug to me. (I do set OMP_NUM_THREADS greater than 1.)
[cpp]
module data
real,dimension(:,:,:,:),allocatable :: phi
end module data
program test
USE data
real counter
integer i,j,k,m
allocate(phi(10,10,10,10))
counter = 1.0
c$omp parallel
c$omp do private(i,j,k,m)
do m=1,10
do k=1,10
do j=1,10
do i=1,10
phi(i,j,k,m)=counter
enddo
enddo
enddo
enddo
c$omp enddo
call tester
c$omp end parallel
end program test
subroutine tester
real summ_temp
interface
subroutine summation(summ)
real summ
end subroutine summation
end interface
summ_temp = 1.0
call summation(summ_temp)
write(*,*)'sum is:',summ_temp
end subroutine tester
subroutine summation(summ)
USE data
real summ
c$omp do private(i,j,k,m)
c$omp+ reduction(+:summ)
do m=1,10
do k=1,10
do j=1,10
do i=1,10
summ = summ + phi(i,j,k,m)
enddo
enddo
enddo
enddo
c$omp enddo
end subroutine summation
[/cpp]
