passing variable to subroutine within parallel region

jordimp · ‎09-07-2009

Hi,

I am working on a Fortran90/95 code with OpenMP and I have a problem when I try to pass a variable into a subroutine within a parallel region. This may be a very typical question, but I have not been able to find an answer to it, at least not one that would suit my particular problem, so I hope someone here can illuminate me.

The essence of the code would look something like this:

[cpp]integer, parameter :: n=100
integer :: i
real*8 :: a(n),b(n),b2(n),x(n),zz(n)
real*8 :: c(2*n),d(2*n),d2(2*n),r(2*n),ss(2*n)
real*8 :: y,t

do i=1,n
    x(i)=i
enddo
do i=1,2*n
    r(i)=i*i
enddo

open(unit=1,file='input.dat')
read(1,*) (a(i),b(i),b2(i),i=1,n)
read(1,*) (c(i),d(i),d2(i),i=1,2n)
close(1)

!$OMP PARALLEL shared(a,b,n,x,zz), private(y)
do i=1,n
    call splint(a,b,b2,n,x(i),y)
    zz(i)=y
enddo
do i=1,2*n
    call splint(c,d,d2,2*n,r(i),t)
    ss(i)=t
enddo
!$OMP END PARALLEL

write(*,*) (zz(i),i=1,n)
write(*,*) (ss(i),i=1,2*n)

end

subroutine splint(a,b,b2,n,x,y)
! this is a standard subroutine to make splines
integer :: n
real*8 :: a(n),b(n),b2(n),x,y

      kl0=1
      khi=n
c     means of bisection
 1    IF(khi-kl0.gt.1) THEN
         k=(khi+kl0)/2
         IF(xa(k).gt.x) THEN
            khi=k
         ELSE
            kl0=k
         END IF
         GO TO 1
      END IF
c     KL0 and KHI now bracket the X
      h=xa(khi)-xa(kl0)

      IF(h.eq.0.0d0) STOP
      a=(xa(khi)-x)/h
c     evaluation of cubic spline polynomial
      b=(x-xa(kl0))/h
      y=a*ya(kl0)+b*ya(khi)+((a**3-a)*y2a(kl0)
     >     +(b**3-b)*y2a(khi))*(h**2)/6.d0

      RETURN
end subroutine splint[/cpp]

As one can imagine, I am trying to spline vector 'b' (that I know numerically on grid 'a') to the new grid values given by 'x'.

When I run the code, it seems to work fine some times, but at a certain point one of the calls to splint is done with something like n = -1539548 (in general, a large negative integer), so I would guess there has been an error accessing the memory where 'n' was expected.

My problem is that I don't know what attributes (SAVE, AUTOMATIC?) should I give or what clauses (PRIVATE, SHARED,...) I should use to avoid what I guess is a data racing condition in the arguments of splint.

I have seen some threads on similar problems, and a suggested solution was to put the dummy arguments of the subroutine in a COMMON block that was declared THREADPRIVATE. However, this would not seem to be practical in my case, as the subroutine splint is called many times in the actual code, and, as I tried to show in the code above, with different input arrays of different dimensions, so I don't know which ones I would have to put in the COMMON statement.

Thanks in advance for any help!

jimdempseyatthecove · ‎09-08-2009

In your subroutine, what do you expect of these statements?

a=(xa(khi)-x)/h
and
b=(x-xa(kl0))/h

Where the left side of = is a vector (array)and the right side is a scalar?

When the caller uses a parallel-ized loop to distribute work across n threads, wouldn't you expect (require) each thread to work with and/or set 1/n'th of each array?

Also, your final y= is producing a scalar from a blend of (total of) vectors and scalars

Jim Dempsey

jordimp · ‎09-08-2009

Hi and thanks for your help.

It seems I was not very successful at writing a 'reduced' code that showed my problem, so indeed it makes no sense as it's written in my post. The beginning of the subroutine should actually read:

[cpp]subroutine(xa,ya,y2a,n,x,y)
implicit real*8 (a-h,o-z)
integer :: n  
real*8 :: xa(n),ya(n),y2a(n)
[/cpp]

So that the input arrays are xa, ya and y2a, and then the scalars 'a' and 'b' within the subroutine are used to find the value of the interpolated point 'y' (again a scalar value). Sorry for the confusing names.

Jordi

jimdempseyatthecove · ‎09-08-2009

Try

[sectionBodyText]!$OMP PARALLEL shared(a,b,n,x,zz) , private(y,t)
!$OMP DO
do i=1,n 
    call splint(a,b,b2,n,x(i),y) 
    zz(i)=y 
enddo
!$OMP END DO
!$OMP DO
do i=1,2*n 
    call splint(c,d,d2,2*n,r(i),t) 
    ss(i)=t 
enddo 
!$OMP END DO
!$OMP END PARALLEL 

--------------- or ------------

!$OMP PARALLEL shared(a,b,n,x,zz)
!$OMP DO
do i=1,n 
    call splint(a,b,b2,n,x(i),zz(i)) 
enddo
!$OMP END DO
!$OMP DO
do i=1,2*n 
    call splint(c,d,d2,2*n,r(i),ss(i)) 
enddo 
!$OMP END DO
!$OMP END PARALLEL 

[/sectionBodyText]

Jim Dempsey

jordimp · ‎09-09-2009

Hi,

I have tried your suggestion but the results change from run to run, which would not be the best symptom.
I think this might be because I have different parallel regions within the code. Even though each region uses different data arrays, all of them call splint. Something like:

[cpp]n=100
a=...
b=...
x=...

!$OMP PARALLEL shared(a,b,n,x,zz)
!$OMP DO
do i=1,n
   call splint(a,b,n,x(i),zz(i))
enddo
!$OMP END DO
!$OMP END PARALLEL

ap=...
bp=...
xp=...

!$OMP PARALLEL shared(ap,bp,n,xp,zzp)
!$OMP DO
do j=1,n
   call splint(ap,bp,n,xp(j),zzp(j))
enddo
!$OMP END DO
!$OMP END PARALLEL[/cpp]

I guessed that, even if the different calls to splint receive different input data, they might conflict with each other if one thread was working on the upper parallel region when another thread is already in the second region. Could this be the case? That's why I was trying to find out how to 'transfer' the private attribute of the variables from the main code to the subroutine, so that different calls to splint are truly independent.

I have now implemented a code that seems to work, at least to the extent that each call to splint does give the expected sizes to its vectors. The way I do it is to give a firstprivate copy of the variables to each thread, so that I don't have to rely on different threads sharing the data:

!$OMP PARALLEL firstprivate(a,b,n,x), shared(zz)
!$OMP DO lastprivate(a,b,n,x)
do i=1,n
call splint(a,b,n,x(i),zz(i))
enddo
!$OMP END DO
!$OMP END PARALLEL

and analogously for the second loop.

I had to include the lastprivate clause in the DO loops, as otherwise the variables were undefined after the corresponding parallel region. This is probably not optimal, but seems to do the job (I am doing further checks on this).

As the data in the different parallel regions is (mostly) independent, maybe another option would be to have one big parallel region covering everything, and with shared(a,b,n,x,zz,ap,bp,np,xp,zzp,....). I initially discarded this option as I thought that the overhead of sharing all the data all the time would be larger than doing it in smaller bits in a dynamical way (in particular I have to share a large matrix with ~5 Gb of data which is also putting some stress on the memory). What would be the best option in this respect?

Thanks again!

jimdempseyatthecove · ‎09-10-2009

You do not need to use private, firstprivate, lastprivate on the read-only arrays passed to splint these need only be shared.

For the arrays that are written to (write-only or read-modify-write) .AND. are indexed by the !$OMP DO loop control variable (but .NOT. loop control variable +/- offset), these should be passed as shared. IOW multiple threads will not concurrently update same variable(s) within array.

Also, when compiling subroutines seperate from the OpenMP sections of your code, add the RECURSIVE attribute to the subroutine or function

recursive subroutine splint(...

This instructs that local arrays default to being on the stack.

Although, when compiling splint with -Qopenmp this too instructs that local arrays default to being on the stack.
However, when compiling splint seperately, say for a generic library, since it does not contain !$OMP directives you might mistakenly compile without -Qopenmp and thus end up with the default local arrays as SAVE.

Jim Dempsey

gilthe · ‎09-17-2009

Quoting - jimdempseyatthecove

You do not need to use private, firstprivate, lastprivate on the read-only arrays passed to splint these need only be shared.

For the arrays that are written to (write-only or read-modify-write) .AND. are indexed by the !$OMP DO loop control variable (but .NOT. loop control variable +/- offset), these should be passed as shared. IOW multiple threads will not concurrently update same variable(s) within array.

Also, when compiling subroutines seperate from the OpenMP sections of your code, add the RECURSIVE attribute to the subroutine or function

recursive subroutine splint(...

This instructs that local arrays default to being on the stack.

Although, when compiling splint with -Qopenmp this too instructs that local arrays default to being on the stack.
However, when compiling splint seperately, say for a generic library, since it does not contain !$OMP directives you might mistakenly compile without -Qopenmp and thus end up with the default local arrays as SAVE.

Jim Dempsey

On Linux, does specifying -openmp (using ifort 11.1.046 on Intel64) always imply to compile all subroutines with the RECURSIVE attribute?
Admittedly, I am just starting to openMP my code (biomolecular simulation software) but I found it confusing to find that adding the -recursive flag fixed a problem of the following type:

use somemodule
PARALLEL ...
call bla(a1,a2,a3)
if (a3 > something) somemodule_var = 1
END PARALLEL
process data (depends on somemodule_var of course)

where bla is external and defined:
subroutine bla(a1,a2,a3)
use someothermodules
local variables v1,v2,,...
a3 = someothermodule_var(a1)
a3 = a3 + somefunc(v1,v2,a3)
...

a1,a2,a3 are all explictily declared private (and initialized correctly). It seemed to me that race conditions occurred on local variables declared within subroutine bla but I struggle to understand why the openMP standard would define it that way? And what happens to variables declared in modules included from within the subroutine which is called by a (non-master) thread?

I am not a CS person so I struggle to find comprehensible answers on the web and apologies for the atrocious pseudo-code ...
Thanks!
Andreas

jimdempseyatthecove · ‎09-17-2009

Andreas,

I too share your sentiments.

In Fortran (barring switches or RECURSIVE attribute)...

Local scalar variables

integer ::I
real :: R

are stack local variables

**** HOWEVER ****

Locally declared arrays

integer :: IARRAY(n)
real :: RARRAY(n)

are subject to the whims of the compiler vendor as to if the array is (has equivilent to) SAVE or stack local (non-SAVE).

The -openmp makes these arrays stack local
RECURSIVE makes these arrays stack local
The Intel specific non-standard AUTOMATIC makes these arrays stack local

It would have be preferred (to me)if the Fortran standards committee adopted a formal "make this array stack local" where the declaration is made in the source code. An attribute of NOSAVE would have worked. Instead, they duck the issue by using RECURSIVE to imply that non-SAVE arrays are stack local. There really needs to be an explicit statement as to the disposition of arrays.

Jim Dempsey

gilthe · ‎09-18-2009

Quoting - jimdempseyatthecove

Andreas,

I too share your sentiments.

In Fortran (barring switches or RECURSIVE attribute)...

Local scalar variables

integer ::I
real :: R

are stack local variables

**** HOWEVER ****

Locally declared arrays

integer :: IARRAY(n)
real :: RARRAY(n)

are subject to the whims of the compiler vendor as to if the array is (has equivilent to) SAVE or stack local (non-SAVE).

The -openmp makes these arrays stack local
RECURSIVE makes these arrays stack local
The Intel specific non-standard AUTOMATIC makes these arrays stack local

It would have be preferred (to me)if the Fortran standards committee adopted a formal "make this array stack local" where the declaration is made in the source code. An attribute of NOSAVE would have worked. Instead, they duck the issue by using RECURSIVE to imply that non-SAVE arrays are stack local. There really needs to be an explicit statement as to the disposition of arrays.

Jim Dempsey

Thanks, Jim!

This will reveal my ignorance but can you (or anyone) clarify what happens memory-wise if within the scope of a larger program a subroutine is called that uses modules not included by the subroutine containing the call statement. generally, those modules may contain global variables which may or may not have the ALLOCATABLE attribute and may or may not be part of derived data types.

in serial application, the memory for those globals would have been allocated elsewhere and hence would be untouched by calling the subroutine and including the modules, right? in contrast, local variables would behave as jim describes it.
in parallel application, the only convention that makes sense would be that all globals included that way are silently assumed to be shared? presumably, there is not even a way to make any of them PRIVATE even if i use the same module also in the calling subroutine, correct?

Also, in my case -openmp did not seem to make declared arrays stack local implicitly. Adding -recursive did, however (again, this is only for ifort 11.1.046). Is this also related to whether the size of the local array in the subroutine is known at compiletime or only at runtime?

Thanks for any further hints,
andreas

gilthe · ‎09-18-2009

Quoting - gilthe

Also, in my case -openmp did not seem to make declared arrays stack local implicitly. Adding -recursive did, however (again, this is only for ifort 11.1.046). Is this also related to whether the size of the local array in the subroutine is known at compiletime or only at runtime?

i'll take this statement back. behavior from yesterday is not reproducible today.
sorry,
andreas

jimdempseyatthecove · ‎09-18-2009

Data Objects declared within a Module are static (equiv to SAV) to those routines that USE the module. IOW they are _used_ within the scope of the subroutine but _declared_ outside the scope of the subroutine. However, a module may also declare THREADPRIVATE data objects, and these become static variables within each thread context. The non-THREADPRIVATE module declared data can be considered shared static and the THREADPRIVATE module declared can be considered private static (meaning all subroutine levels withina thread has one copy while each thread has its own copy).

>>presumably, there is not even a way to make any of them PRIVATE even if i use the same module also in the calling subroutine, correct?

By using THREADPRIVATE data within the module you make them PRIVATE

>>Also, in my case -openmp did not seem to make declared arrays stack local implicitly. Adding -recursive did,

**** Steve, are you reading this? Is this correct behavior???

In the code I use/develop, I add the AUTOMATIC attribute. This is an explicit declaration to the compiler as to my requirements. This is non-standard Fortran. RECURSIVE is the recommended way. To me, RECURSIVE works, but it is not proper since these routines are NOT recursive (not used recursively). NOSAVE might be a better attribute. But this is a standards issue out of my control.

>>Also, in my case -openmp did not seem to make declared arrays stack local implicitly. Adding -recursive did, however (again, this is only for ifort 11.1.046). Is this also related to whether the size of the local array in the subroutine is known at compiletime or only at runtime?

No, this is related to whos code the compiler vendor did not want to break.

Consider IMPLICIT NONE, where each variable (scalar and vector) must be declared, consider SAVE as one of those declarations to explicitly state the variable (scalar and vector) is to be static. Now consider there is no way (no standards way)to explicitly state a variable (scalar and vector) is to be stack local.

Jim Dempsey