I am seeking advice about some OpenMP code I have written. (This is my first foray into OpenMP programming.) I would like to know (a) whether it is legal code (it currently compiles and gives the expected answers) and (b) whether my handling of the 4 threads is optimal.
I have a data array defined as
real, allocatable :: d(:,:)
in a module; it is later allocated with
allocate(d(n,n))
and actually used as an array of size len x len.
As it happens, the most time-consuming part of the calculation can be run as 4 parallel threads, which take identical times (to within a few ms) and use the same amount of memory. Each thread produces just 3 integers as its final result, and these are stored in an array howmany(15). (The first 3 elements of howmany() hold other data.) I have written the following code to handle this situation for computers with 4 or more, 3, 2 and 1 core(s).
[fortran]
! Allow program to modify the number of threads
call omp_set_dynamic(.true.)
! How many cores to use?
i = omp_get_max_threads()
NumThreads = min0(4, i)
if(len <= 50) NumThreads = 1 ! <= 50 not worth the overheads
call omp_set_num_threads(NumThreads)
! Now the parallel code
if(NumThreads == 1) then
! No multicore processor available, so 1 thread at a time.
call count_clusters_OMP(1,d,n,len,lower,upper, .true.,NumThreads,ierr)
call count_clusters_OMP(2,d,n,len,lower,upper, .true.,NumThreads,ierr)
call count_clusters_OMP(3,d,n,len,lower,upper, .true.,NumThreads,ierr)
call count_clusters_OMP(4,d,n,len,lower,upper, .true.,NumThreads,ierr)
! 2 or 3 cores
else if(NumThreads == 2 .or. NumThreads == 3) then
! 1st 2 calculations
!$OMP PARALLEL SECTIONS COPYIN(d, howmany, maxval, minval, n, len)
!$OMP SECTION
call count_clusters_OMP(1,d,n,len,lower,upper,.true.,NumThreads,ierr)
!$OMP SECTION
call count_clusters_OMP(2,d,n,len,lower,upper,.false.,NumThreads,ierr)
!$OMP END PARALLEL SECTIONS
! Second set
!$OMP PARALLEL SECTIONS COPYIN(d, howmany, maxval, minval, n, len)
!$OMP SECTION
call count_clusters_OMP(3,d,n,len,lower,upper,.true.,NumThreads,ierr)
!$OMP SECTION
call count_clusters_OMP(4,d,n,len,lower,upper,.false.,NumThreads,ierr)
!$OMP END PARALLEL SECTIONS
! 4 cores (or more)
else if(NumThreads == 4) then
!$OMP PARALLEL SECTIONS COPYIN(d, howmany, maxval, minval, n, len)
!$OMP SECTION
call count_clusters_OMP(1,d,n,len,lower,upper,.true.,NumThreads,ierr)
!$OMP SECTION
call count_clusters_OMP(2,d,n,len,lower,upper,.false.,NumThreads,ierr)
!$OMP SECTION
call count_clusters_OMP(3,d,n,len,lower,upper,.false.,NumThreads,ierr)
!$OMP SECTION
call count_clusters_OMP(4,d,n,len,lower,upper,.false.,NumThreads,ierr)
!$OMP END PARALLEL SECTIONS
endif
[/fortran]
(lower, upper, minval and maxval are scalars.)
A colleague of mine has criticised this code on 2 grounds:
- It is not legal Fortran.
- I should not manage the threads as I do but let the software do this.
Can I solicit an expert opinion please?
With thanks
Chris G
If d is a read-only (input) array .OR. if the first argument of count_clusters_OMP partitions d into independent sections, then d need not be copied in (and need not be private). Please indicate which variables get modified. Are any of them arguments, and are any of them global variables?
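You can vary the thread count without enabling dynamic adjustment as well. Note that omp_set_dynamic(.true.) only lets the runtime use fewer threads than you request. It is distinct from dynamic scheduling (SCHEDULE(DYNAMIC)), which alters how loop partitions are generated and is typically used when equal partitions do not yield equal work.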
Jim Dempsey
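To illustrate the point about not copying d, here is a minimal sketch, not the original poster's code. It assumes each section touches only its own part of d, and scale_column is a hypothetical stand-in for count_clusters_OMP. A single PARALLEL SECTIONS construct is used because the OpenMP runtime hands the four sections to however many threads the team has, so no per-core-count branches are needed.

[fortran]
! Sketch only: scale_column is a hypothetical stand-in for count_clusters_OMP.
program sections_demo
  implicit none
  integer, parameter :: n = 4
  real :: d(n, n)

  d = 1.0

  ! d is shared by default, so no COPYIN clause is needed.
  ! Each section modifies a different column, so the writes never overlap.
  ! With fewer than 4 threads, some threads simply run more than one section.
  !$omp parallel sections default(shared)
  !$omp section
  call scale_column(1, d, n)
  !$omp section
  call scale_column(2, d, n)
  !$omp section
  call scale_column(3, d, n)
  !$omp section
  call scale_column(4, d, n)
  !$omp end parallel sections

  ! Column sums after the parallel region
  print '(4f6.1)', sum(d, dim=1)

contains

  subroutine scale_column(j, a, m)
    integer, intent(in) :: j, m
    real, intent(inout) :: a(m, m)
    a(:, j) = a(:, j) * real(j)   ! modify column j only
  end subroutine scale_column

end program sections_demo
[/fortran]

Built with or without OpenMP enabled, this should print the column sums 4.0, 8.0, 12.0 and 16.0, since the directives are ordinary comments to a compiler that ignores them.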
Thank you Jim.
d is modified by each thread. None of the other inputs are modified, e.g. the variables maxval, minval, n and len are not changed.
The array howmany(15) has elements changed by each thread. If j is the thread number (j = 1 to 4), then
howmany((j-1)*3 + 4), howmany((j-1)*3 + 5) and howmany((j-1)*3 + 6) are written by thread j.
The equal partitions I have created do yield almost exactly equal work.
ChrisG
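The index mapping described above can be checked with a small sketch. The values stored are placeholders (each slot is simply tagged with j), and the -1 entries stand in for the unspecified "other data" in the first three elements; both are assumptions made for illustration.

[fortran]
! Sketch only: shows that the index ranges for j = 1..4 are disjoint.
program howmany_slots
  implicit none
  integer :: howmany(15), j, base

  howmany = 0
  howmany(1:3) = -1            ! placeholder for the "other data"

  ! base is private so each thread computes its own offset;
  ! howmany is shared and each j writes only elements base..base+2.
  !$omp parallel do private(base)
  do j = 1, 4
    base = (j - 1) * 3 + 4     ! 4, 7, 10, 13
    howmany(base:base + 2) = j ! hypothetical result: tag the slot with j
  end do
  !$omp end parallel do

  print '(15i3)', howmany
end program howmany_slots
[/fortran]

Because the four ranges 4..6, 7..9, 10..12 and 13..15 do not overlap, the threads never write the same element, and howmany can stay shared.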
Does any thread use as input a howmany element that resides in the write domain of a different thread? For example,
X(j) = fn(X(j-1)) or X(j) = fn(X(j+1)), where the read reference falls within the write domain of a different thread.
If NOT, then d need not be copied. If SO, then a copy of d may or may not resolve the issue, as there may be temporal issues with respect to order of execution. A closer examination of the code would be required to construct a working parallel solution.
Jim Dempsey
Is my use of copyin legal? The Intel documentation says:
"Parallel Directive Clause: Specifies that the data in the master thread of the team is to be copied to the thread private copies of the common block at the beginning of the parallel region.
COPYIN (list)
list: Is the name of one or more variables or common blocks that are accessible to the scoping unit. Subobjects cannot be specified. Each name must be separated by a comma, and a named common block must appear between slashes (/ /).
The COPYIN clause applies only to common blocks declared as THREADPRIVATE.
You do not need to specify the whole THREADPRIVATE common block, you can specify named variables within the common block"
I have not used common blocks or threadprivate, yet the code works!
Would it be better to make my own copies of the d array (say d2(), d3()... ) and use these explicitly in the calls to count_clusters_OMP() as needed?
ChrisG
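For contrast with the quoted documentation, here is a minimal sketch of a COPYIN usage that does follow the THREADPRIVATE requirement. It uses a module variable rather than a common block, which the OpenMP Fortran rules also allow for THREADPRIVATE. The names tp_data and factor are hypothetical and do not come from the original code.

[fortran]
! Sketch only: tp_data and factor are hypothetical names.
module tp_data
  implicit none
  integer, save :: factor = 0
  ! Each thread gets its own persistent copy of factor.
  !$omp threadprivate(factor)
end module tp_data

program copyin_demo
  use tp_data
  implicit none
  integer :: results(4), j

  factor = 7        ! set on the master thread only
  results = 0

  ! COPYIN broadcasts the master thread's value of factor to every
  ! thread's private copy when the parallel region starts.
  !$omp parallel do copyin(factor)
  do j = 1, 4
    results(j) = factor * j
  end do
  !$omp end parallel do

  print '(4i4)', results
end program copyin_demo
[/fortran]

An ordinary shared variable such as d in the original post is not THREADPRIVATE, so listing it in COPYIN falls outside what the quoted text permits, even if a particular compiler accepts it.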
You stated that independent sections of d are each modified by only one thread .AND. you also indicated that sections of d do not have time-based dependencies with respect to other sections of d. Therefore d can be, and should be, modified directly by each thread.
You can get in trouble by copying data when it should not be copied. Pseudocode:
Thread 0                     Thread 1
copy all of d                copy all of d
modify half of d             modify other half of d
restore copy of d to d       restore copy of d to d
In the above case, you might only see the last thread's update (of copy of d).
Jim Dempsey
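The lost update in the pseudocode above can be reproduced deterministically with a serial simulation. This is only a sketch: copy0 and copy1 are hypothetical stand-ins for the two threads' private copies, and the restore order (thread 0 first, then thread 1) is an assumption, since in a real run it depends on timing.

[fortran]
! Sketch only: a serial simulation of two threads, not real parallel code.
program lost_update_demo
  implicit none
  integer :: d(4), copy0(4), copy1(4)

  ! Copy, modify and restore: each "thread" copies all of d
  d = 0
  copy0 = d
  copy1 = d
  copy0(1:2) = 1    ! "thread 0" modifies its half of its copy
  copy1(3:4) = 2    ! "thread 1" modifies the other half of its copy
  d = copy0         ! "thread 0" restores its whole copy
  d = copy1         ! "thread 1" restores last and overwrites thread 0's half
  print '(a,4i2)', 'copy and restore:', d

  ! Direct modification of disjoint halves of the shared array
  d = 0
  d(1:2) = 1
  d(3:4) = 2
  print '(a,4i2)', 'direct update:   ', d
end program lost_update_demo
[/fortran]

In the first case d ends up as 0 0 2 2, so the first half's update is lost. In the second case d is 1 1 2 2, which is the intended result.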