- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The purpose of this example is to find out how a program behaves when both the upper layer (Python) and the lower layer (Fortran) use their threading capabilities.
The Python3 caller script launches 2 concurrent instances of the same Fortran routine using the Thread class.
The Fortran callee is OpenMP parallel with OMP_NUM_THREADS set to 4.
Looking at the printout it looks like the 2 concurrent OpenMP instances don't interfere with each other, both have their threads numbered from 0 to 3. This is nice as it allows to put a level of parallelism on top of OpenMP. But the question is: Is such nested parallelism a supported feature or is this just an accidental observation ?
-------------------- Output printed --------------------
>python3 pydot.py Thread 2 Thread 3 Thread 0 Thread 1 Thread 0 Thread 1 1000.0 Thread 2 Thread 3 500.0
-------------------- Fortran code ----------------------
! Compile to shared library doing ! ifort -g -fPIC -shared -openmp -o ddot.so ddot.f90 real(8) function ddot(n,a,b) bind(c) use, intrinsic :: iso_c_binding use omp_lib integer(4), value, intent(in) :: n real(8), dimension(1:n), intent(in) :: a real(8), dimension(1:n), intent(in) :: b integer :: i ddot = 0.0 !$OMP PARALLEL print *,'Thread', omp_get_thread_num() !$OMP DO REDUCTION(+:ddot) do i = 1, n ddot = ddot + a(i) * b(i) end do !$OMP END PARALLEL end function
------------------ Python code ----------------------
from ctypes import * from threading import Thread # Load shared library flib = cdll.LoadLibrary('./ddot.so') # Prototype for external function veclen = 1000 flib.ddot.argtypes = [c_int, ARRAY(c_double, veclen), ARRAY(c_double, veclen)] flib.ddot.restype = c_double # Array construction rvect = c_double * veclen a = rvect() b = rvect() c = rvect() for i in range(0,veclen): a = 0.5 b = 2.0 c = 1.0 # Python wrapper def pydot(a,b): pydot.argtypes = [ARRAY(c_double, veclen), ARRAY(c_double, veclen)] res = flib.ddot(c_int(veclen), a, b) print(res) # Do it using 2 threads # Create t1 = Thread(target=pydot, args=(a,b,)) t2 = Thread(target=pydot, args=(a,c,)) # Launch t1.start() t2.start() # Wait for completion t1.join() t2.join() # End
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
A major misconception of OpenMP programers is:
omp_get_thread_num()
returns an application-wide thread number. This is incorrect.
Instead, it returns the team member number. These numbers are the same when nested parallelism is NOT used.
In your example, each Python thread instantiates its own base-level OpenMP thread pool (no different than an OpenMP thread instantiating a nested thread pool). This is by design. Note, there may be a means whereby you can specify the behavior of having a single OpenMP thread pool (may be more useful for OpenMP Task constructs).
Jim Dempsey
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page