I have been experimenting with OpenMP, trying to pass a module variable into a subroutine called within the dynamic extent of a parallel block.
So I wrote a short program to experiment.
If I understand correctly, when a subroutine is called from within a parallel block, each thread gets its own copy of the subroutine, and corresponding copies of any "private" variables are created, one for each thread.
The program and output given below do just that and suggest that my understanding may be correct. I realize that the value of "jj" inside "subtest" isn't always going to be the same as the thread number obtained from inside that subroutine (it depends on timing), but as long as "jj.ne.0", the point has been made. The compile command is:
ifort /Qopenmp omptest.f90
But when I incorporate the same thing in my real program, the same structure doesn't work.
The only difference is that the "real program" is split into many files, one file for each subroutine and module, rather than one single file for the entire program, as in the example program.
If you split the demonstration program given below into 3 files (main program, subtest.f90 and jvar.f90), it no longer works: the variable "jj" always equals zero, the value associated with the master thread, rather than a unique value for each thread. See the second output below.
The compile command (with the 3 files in the directory) is:
ifort /Qopenmp *.f90
So, is there a compiler/link problem here, or am I missing something?
Yert
PS. I also tried to use "threadprivate" rather than "private". It does the same thing. BUT the documentation for "threadprivate" in the Intel Fortran "help" file says that "threadprivate" is only supposed to be used for "named common blocks inside a module". So why does the example below not give a compilation error when "threadprivate" is used with the variable name "jj"?
The OpenMP documentation doesn't limit "threadprivate" to common blocks only.
EXAMPLE PROGRAM
module jvar
  integer :: jj
end module jvar

program omptest
  use jvar, only : jj
  use omp_lib
  integer :: j
  !!$OMP threadprivate (jj)
  !$omp parallel private(jj)
  !$omp critical (test)
  jj = OMP_Get_Thread_num()
  write (10,*) ' calling with jj= ', jj
  !$omp end critical (test)
  call subtest()
  !$OMP end parallel
  stop
end

subroutine subtest()
  use jvar, only : jj
  use omp_lib
  write (10,*) ' inside with jj= ', jj, 'threadnumber=',OMP_get_thread_num()
  return
end subroutine subtest
CORRECT OUTPUT
_____________________________
calling with jj= 1
inside with jj= 1 threadnumber= 1
calling with jj= 4
inside with jj= 4 threadnumber= 4
calling with jj= 2
inside with jj= 2 threadnumber= 2
calling with jj= 5
inside with jj= 5 threadnumber= 5
calling with jj= 0
inside with jj= 0 threadnumber= 0
calling with jj= 6
inside with jj= 6 threadnumber= 6
calling with jj= 7
inside with jj= 7 threadnumber= 7
calling with jj= 3
inside with jj= 3 threadnumber= 3
INCORRECT OUTPUT - CREATED BY SPLITTING THE ABOVE PROGRAM INTO THREE SEPARATE FILES
___________________________________________________________________
calling with jj= 2
inside with jj= 0 threadnumber= 2
calling with jj= 0
inside with jj= 0 threadnumber= 0
calling with jj= 1
inside with jj= 0 threadnumber= 1
calling with jj= 5
inside with jj= 0 threadnumber= 5
calling with jj= 3
inside with jj= 0 threadnumber= 3
calling with jj= 6
inside with jj= 0 threadnumber= 6
calling with jj= 4
inside with jj= 0 threadnumber= 4
calling with jj= 7
inside with jj= 0 threadnumber= 7
With everything in one file, the compiler can probably see that subroutine subtest is used in a threaded context and then take care of it.
However, if you split off subtest, the compiler cannot see that. You will probably have to
declare jj to be threadprivate in the module jvar, rather than in the main program.
It is this sort of subtlety that makes using OpenMP less than trivial.
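The suggestion above can be sketched as follows. This is only a minimal sketch, assuming the three-file layout from the question; the output goes to the terminal instead of unit 10, and the critical section in subtest is an addition to keep the lines from interleaving.

```fortran
! jvar.f90: the threadprivate directive sits next to the declaration,
! so every file that uses the module sees the same attribute.
module jvar
  implicit none
  integer :: jj
  !$omp threadprivate(jj)
end module jvar

! subtest.f90: no data-sharing directive is needed here.
subroutine subtest()
  use jvar, only : jj
  use omp_lib
  implicit none
  !$omp critical (test)
  write (*,*) ' inside with jj= ', jj, ' threadnumber=', omp_get_thread_num()
  !$omp end critical (test)
end subroutine subtest

! omptest.f90: jj is NOT listed in a private clause any more.
program omptest
  use jvar, only : jj
  use omp_lib
  implicit none
  !$omp parallel
  jj = omp_get_thread_num()   ! each thread writes its own copy of jj
  call subtest()
  !$omp end parallel
end program omptest
```

Because the attribute travels with the module, it should not matter whether the three units are compiled from one file or from three.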
Regards,
Arjen
Better diagnosis of threading problems is promised for the next release, but it still won't happen unless you ask for it, and will continue to require data sets which exercise the necessary execution paths.
2.9.1.2 Data-sharing Attribute Rules for Variables Referenced in a Region but not in a Construct
...
Variables belonging to common blocks, or declared in modules, and referenced in
called routines in the region are shared unless they appear in a threadprivate
directive.
...
So the "correct" output in the post is actually incorrect, and the "incorrect" output is what the rule above requires.
If you disable the single-file IP optimization with the "/Qip-" option, the results for the single file and for multiple files will be the same. This looks like an ifort bug. I will open a bug report for it and post its status here.
The reason there is no error for the 'threadprivate' in the code is that "!!$OMP threadprivate (jj)" is treated as a comment.
1) pass jj to subtest, by reference or by value
2) make jj "threadprivate" in the module
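Option 1 might look like the sketch below. The names subtest_arg, omptest_arg and jlocal are hypothetical and chosen only for illustration; the idea is that a private variable in the caller is handed to the subroutine as an ordinary argument, so no module variable is involved.

```fortran
! Hypothetical variant of subtest that receives the value as an argument.
subroutine subtest_arg(jlocal)
  use omp_lib
  implicit none
  integer, intent(in) :: jlocal
  !$omp critical (test)
  write (*,*) ' inside with jlocal= ', jlocal, &
              ' threadnumber=', omp_get_thread_num()
  !$omp end critical (test)
end subroutine subtest_arg

program omptest_arg
  use omp_lib
  implicit none
  integer :: j
  !$omp parallel private(j)
  j = omp_get_thread_num()   ! j is private, one copy per thread
  call subtest_arg(j)        ! each thread passes its own copy
  !$omp end parallel
end program omptest_arg
```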
Also note an additional potential error
OMP_get_thread_num() does NOT return a global thread number. Rather, it returns the team member number within the current parallel region. Only when nested parallel regions are off, or if you never use nested parallel regions, will the assumption that OMP_get_thread_num() is an application-wide thread number be true. Therefore do not ASSUME that 0 means the main-level thread. While this assumption may hold in your simplified tests, it may well be false within your actual program.
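A small sketch of the point about team member numbers, assuming a runtime that supports nested parallelism (OpenMP 3.0 or later). The program name nested_ids and the team sizes are arbitrary; some older runtimes may additionally need OMP_NESTED=true in the environment.

```fortran
program nested_ids
  use omp_lib
  implicit none
  ! Allow two active levels of parallelism (assumption: the runtime honors this).
  call omp_set_max_active_levels(2)
  !$omp parallel num_threads(2)
  !$omp parallel num_threads(2)
  !$omp critical (report)
  ! omp_get_thread_num() is the number within the innermost team only;
  ! omp_get_ancestor_thread_num(1) is this thread's ancestor in the outer team.
  write (*,*) 'level=', omp_get_level(), &
              ' outer id=', omp_get_ancestor_thread_num(1), &
              ' inner id=', omp_get_thread_num()
  !$omp end critical (report)
  !$omp end parallel
  !$omp end parallel
end program nested_ids
```

When nesting is active, the inner id repeats across the inner teams, so a value of 0 identifies the master of an inner team, not necessarily the main thread of the program.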
Jim Dempsey
Making "jj" threadprivate either inside the module (obviously preferred) or in each of the subroutines where it is used solves the immediate problem... BUT
what if the variable is an allocatable array? See the example below.
In that case, both the main program and the called subroutine "subtest" get the lower bound of the array correctly, but in every thread other than the master, the main program and the subroutine get the upper bound incorrectly (see the first few lines of output below).
However, when the "allocate" statement is placed inside the "parallel" block, the program works as expected.
Is this a compiler issue or a programming issue?
Comments on other replies: yes, module variables are "shared" by default, but what the program needs is "private" variables from a module. The big program has about 30 or so such variables, and passing them as subroutine arguments is inefficient, although it would work, as was suggested. I tried /Qip, to no effect.
Thanks again.
Yert
________________Sample program with allocatable threadprivate array_____________
module jvar
  integer :: jj
  integer, allocatable, dimension(:) :: itest
  !$OMP threadprivate (jj, itest)
end module jvar

program omptest
  use jvar, only : jj, itest
  use omp_lib
  integer :: j
  allocate (itest(1:5))
  !$omp parallel ! private(jj)
  !$omp critical (test)
  jj = OMP_Get_Thread_num()
  write (10,*) ' calling with jj= ', jj,' u=',ubound(itest),' l=',lbound(itest)
  !$omp end critical (test)
  call subtest()
  !$OMP end parallel
  stop
end

subroutine subtest()
  use jvar, only : jj, itest
  use omp_lib
  write (10,*) ' inside with jj= ', jj, 'threadnumber=',OMP_get_thread_num(), &
               ' u=',ubound(itest),' l=',lbound(itest)
  return
end subroutine subtest
__________FIRST FEW LINES OF OUTPUT______________
calling with jj= 0 u= 5 l= 1
inside with jj= 0 threadnumber= 0 u= 5 l=1
calling with jj= 5 u= 0 l= 1
inside with jj= 5 threadnumber= 5 u= 0 l=1
Only the main thread allocated its private copy of itest.
When each thread is to have a different itest array, as programmed in your module with the threadprivate attribute, then each thread must allocate its own copy of the array.
When all threads are to share the same itest, do not make it threadprivate, and allocate it once (by any thread) prior to use.
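The per-thread case might be sketched like this. The names jvar_alloc and alloc_per_thread are hypothetical; the explicit deallocate at the end of the region is a tidy-up habit shown for illustration, not a statement about what the runtime would do otherwise.

```fortran
module jvar_alloc
  implicit none
  integer, allocatable, dimension(:) :: itest
  !$omp threadprivate(itest)
end module jvar_alloc

program alloc_per_thread
  use jvar_alloc, only : itest
  use omp_lib
  implicit none
  !$omp parallel
  ! Every thread allocates and fills its own copy of itest.
  allocate (itest(1:5))
  itest = omp_get_thread_num()
  !$omp critical (report)
  write (*,*) 'thread', omp_get_thread_num(), &
              ' l=', lbound(itest), ' u=', ubound(itest)
  !$omp end critical (report)
  ! Each thread releases the copy it allocated.
  deallocate (itest)
  !$omp end parallel
end program alloc_per_thread
```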
Jim Dempsey
Where is such information written down?
Can you recommend a book or report?
The suggestions I made are (to me) quite obvious.
Prior to using an array, you must allocate it.
When you have one instance of the array, you need to allocate it at least once before use.
When you have N instances of the array, you need to allocate it at least once for each instance that is being used.
I caution against looking for code samples you can cut and paste. Without a fundamental understanding of what you are doing, you are in for a frustrating experience. Parallel programming is not all that hard; it just takes some common sense to become familiar with the basic principles and requirements.
Jim Dempsey
In the first example, a module variable "jj" is "cloned" (if I may use the word) N times, once for each thread, by the OpenMp compiler. The programmer tells the compiler to do this by declaring "jj" to have the "threadprivate" attribute.
In the second case, once the array "itest" is allocated by the program and declared "threadprivate", you'd expect that the compiler would "clone" it N times in exactly the same manner. But it doesn't. The allocation operation itself needs to be cloned N times, and the programmer has to know this. So the underlying question is: what is the difference between cloning an allocated array and cloning a declared variable, and why do they have to be cloned differently? Better yet, where are the rules written down, so that one doesn't have to guess about such subtleties? Your interpretation makes sense, but so does mine.
I have found discrepancies between the "help" file summaries for "OpenMp" routines that come with the Intel compiler and OpenMp documentation available elsewhere (some of which applies to Fortran). For example, the description of "threadprivate" in the "help" file is inconsistent with OpenMp tutorials (see my first post in this thread). Anyway, I just ordered an OpenMp book (sight unseen), so maybe some of these questions will be answered. If the book turns out to be any good, I'll post a comment here in a few weeks.
Thanks again to all who answered.
jj is a symbolic name for a memory location known to the program. For threadprivate variables, the symbolic name is "cloned" N times to N different locations. The contents are not "cloned" unless you request it on your OpenMP directives: COPYIN does this for threadprivate variables, and FIRSTPRIVATE does it for variables that would otherwise be listed as PRIVATE. A local variable declared PRIVATE effectively becomes a hidden by-value dummy argument (on the stack) in a hidden call to the parallel construct, so each thread gets its own copy, and that copy is initialized only when FIRSTPRIVATE is applied to the variable on the !$OMP directive.
>>In the second case, once the array "itest" is allocated by the program and declared "threadprivate", you'd expect that the compiler would "clone" it N times in exactly the same manner. But it doesn't
itest is an array descriptor (not the contents of the array). The array descriptor is declared threadprivate at compile time (not after allocation). The array descriptor is "cloned" N times to N different locations (as for any threadprivate variable), and this cloning of the descriptor occurs at thread creation time, before the allocation by the main thread. The contents are not "cloned" unless you ask for that with a clause such as COPYIN on your OpenMP directive.
Note that a threadprivate variable is not meant to be listed in a PRIVATE clause on the !$OMP directive as well. If you declared the threadprivate variable PRIVATE, for example in reaction to an error message about COPYIN, you would be asking for an additional private copy of a variable (or array) that already has one copy per thread.
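For a threadprivate scalar, the OpenMP COPYIN clause on the parallel directive copies the master thread's value into each thread's copy on entry to the region. The sketch below uses a scalar only and makes no claim about allocatable arrays; the names jvar_copyin and copyin_demo and the value 42 are hypothetical.

```fortran
module jvar_copyin
  implicit none
  integer :: jj = 0
  !$omp threadprivate(jj)
end module jvar_copyin

program copyin_demo
  use jvar_copyin, only : jj
  use omp_lib
  implicit none
  jj = 42                       ! set in the master thread's copy only
  !$omp parallel copyin(jj)     ! broadcast the master's value to every copy
  !$omp critical (report)
  write (*,*) 'thread', omp_get_thread_num(), ' starts with jj=', jj
  !$omp end critical (report)
  !$omp end parallel
end program copyin_demo
```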
In your example program (itest is threadprivate), each thread, that is, the programmer on behalf of each thread, has the responsibility to allocate itest .AND. to fill it with whatever you want (it may be the same data, but ordinarily it is different data).
If your intention is for all threads to share the same itest array data, THEN do not make itest threadprivate, and allocate and populate it only once, prior to entering the parallel region.
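The shared alternative might be sketched as follows, with hypothetical names jvar_shared and shared_array and arbitrary fill values. Without a threadprivate directive the module array is shared, so one allocation before the region is visible to every thread.

```fortran
module jvar_shared
  implicit none
  ! No threadprivate directive: the array is shared by all threads.
  integer, allocatable, dimension(:) :: itest
end module jvar_shared

program shared_array
  use jvar_shared, only : itest
  use omp_lib
  implicit none
  integer :: i
  ! Allocate and populate once, before entering the parallel region.
  allocate (itest(1:5))
  itest = [(10*i, i = 1, 5)]
  !$omp parallel
  !$omp critical (report)
  write (*,*) 'thread', omp_get_thread_num(), &
              ' u=', ubound(itest), ' sum=', sum(itest)
  !$omp end critical (report)
  !$omp end parallel
  deallocate (itest)
end program shared_array
```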
Jim Dempsey
Supplemental question:
Since each thread must ALLOCATE an allocatable, threadprivate "array descriptor", as you call it, is it necessary for each thread to DEALLOCATE the same array prior to thread termination?
If it is not deallocated, will a memory leak result, or does the compiler automatically take care of this?
Note that in OpenMP the application does not have control over thread termination. I should rephrase this: in OpenMP, the application cannot explicitly terminate OpenMP threads and still maintain an orderly execution and shutdown. An orderly shutdown has the main thread return to the main level, then return from the main program. While the main thread could issue STOP at any level of parallel regions, doing so at any level other than the main level would not be orderly. Should such a disorderly STOP (or call to EXIT, etc.) terminate the main thread, the other threads will have a disorderly termination. Memory will get returned, but file handles (files) may not necessarily be closed in an orderly fashion.
Rephrase of above:
Should you begin a parallel region, and should a thread within that region detect an abnormal condition (or execute an instruction producing a fault) and then terminate, then OpenMP, and thus the application in general, is crippled and will soon die. Do not issue STOP, CALL EXIT(n), etc. from a thread in an application unless you wish to stop the application. STOP and the like are not to be used as a means of early exit from a parallel region.
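One way to follow this advice is to record the abnormal condition in a shared flag and act on it only after the parallel region has ended. This is a sketch under assumptions: the program name orderly_exit, the flag name failed, and the test on i == 37 are all hypothetical stand-ins.

```fortran
program orderly_exit
  use omp_lib
  implicit none
  logical :: failed
  integer :: i
  failed = .false.
  !$omp parallel do shared(failed)
  do i = 1, 100
    if (i == 37) then            ! stand-in for detecting an abnormal condition
      !$omp critical (flag)
      failed = .true.            ! record the problem instead of calling STOP here
      !$omp end critical (flag)
    end if
  end do
  !$omp end parallel do
  ! Back at the main level: stopping here is orderly.
  if (failed) then
    write (*,*) 'abnormal condition detected; stopping from the main level'
    stop 1
  end if
end program orderly_exit
```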
Jim Dempsey