Nested thread parallelism in Python and Fortran

mriedman · ‎02-10-2017

The purpose of this example is to find out how a program behaves when both the upper layer (Python) and the lower layer (Fortran) use their threading capabilities.

The Python3 caller script launches 2 concurrent instances of the same Fortran routine using the Thread class.

The Fortran callee is OpenMP parallel with OMP_NUM_THREADS set to 4.

Looking at the printout it looks like the 2 concurrent OpenMP instances don't interfere with each other, both have their threads numbered from 0 to 3. This is nice as it allows to put a level of parallelism on top of OpenMP. But the question is: Is such nested parallelism a supported feature or is this just an accidental observation ?

-------------------- Output printed --------------------

>python3 pydot.py
 Thread           2
 Thread           3
 Thread           0
 Thread           1
 Thread           0
 Thread           1
1000.0
 Thread           2
 Thread           3
500.0

-------------------- Fortran code ----------------------

! Compile to shared library doing
!    ifort -g -fPIC -shared -openmp -o ddot.so ddot.f90

real(8) function ddot(n,a,b) bind(c)

  use, intrinsic :: iso_c_binding
  use omp_lib

  integer(4),       value, intent(in) :: n
  real(8), dimension(1:n), intent(in) :: a 
  real(8), dimension(1:n), intent(in) :: b 

  integer :: i

  ddot = 0.0 

!$OMP PARALLEL
  print *,'Thread', omp_get_thread_num()

!$OMP DO REDUCTION(+:ddot) 
  do i = 1, n
    ddot = ddot + a(i) * b(i)
  end do

!$OMP END PARALLEL

end function

------------------ Python code ----------------------

from ctypes    import *
from threading import Thread

# Load shared library

flib = cdll.LoadLibrary('./ddot.so')

# Prototype for external function

veclen = 1000

flib.ddot.argtypes = [c_int, ARRAY(c_double, veclen), ARRAY(c_double, veclen)]
flib.ddot.restype  = c_double

# Array construction

rvect  = c_double * veclen

a = rvect()
b = rvect()
c = rvect()

for i in range(0,veclen):
  a = 0.5
  b = 2.0
  c = 1.0

# Python wrapper

def pydot(a,b):
  pydot.argtypes = [ARRAY(c_double, veclen), ARRAY(c_double, veclen)]
  res = flib.ddot(c_int(veclen), a, b)
  print(res)

# Do it using 2 threads

# Create
t1 = Thread(target=pydot, args=(a,b,))
t2 = Thread(target=pydot, args=(a,c,))

# Launch
t1.start()
t2.start()

# Wait for completion
t1.join()
t2.join()

# End

jimdempseyatthecove · ‎02-10-2017

A major misconception of OpenMP programers is:

omp_get_thread_num()

returns an application-wide thread number. This is incorrect.

Instead, it returns the team member number. These numbers are the same when nested parallelism is NOT used.

In your example, each Python thread instantiates its own base-level OpenMP thread pool (no different than an OpenMP thread instantiating a nested thread pool). This is by design. Note, there may be a means whereby you can specify the behavior of having a single OpenMP thread pool (may be more useful for OpenMP Task constructs).

Jim Dempsey