Hi,
I have a problem using OpenMP with the ifort compiler version 11.0.
Consider the following test program:
[plain]program test
  use mod_test, only: check
  implicit none
  integer,pointer,save :: p=>null()
  !$omp threadprivate(p)
  !$omp parallel
  allocate(p)
  call check(p)
  !$omp end parallel
end program test[/plain]
The program uses this module:
[plain]module mod_test
  implicit none
contains
  subroutine check(num)
    integer,pointer :: num
    if (associated(num)) write (*,*) 'associated'
  end subroutine check
end module mod_test[/plain]
On a 4-cpu machine this program prints the word 'associated' 4 times, as predicted.
If, however, the pointer p is given the 'dimension' attribute, only the first thread that executes the parallel region works as anticipated. In the last three threads the condition is evaluated as false. Here is the modified program that produces the error:
[plain]program test
  use mod_test, only: check
  implicit none
  integer,dimension(:),pointer,save :: p=>null()
  !$omp threadprivate(p)
  !$omp parallel
  allocate(p(1))
  call check(p)
  !$omp end parallel
end program test[/plain]
The program uses this module:
[plain]module mod_test
  implicit none
contains
  subroutine check(num)
    integer,dimension(:),pointer :: num
    if (associated(num)) write (*,*) 'associated'
  end subroutine check
end module mod_test[/plain]
Is this a compiler bug or what?
Thanks
1 Solution
Quoting - Kevin Davis (Intel)
I submitted this to the compiler team earlier; they are investigating now. Our internal tracking id is: CQ# DPD200118527. I will keep the thread updated as I learn more.
Development indicates that for case 2 (pointer declared with the dimension(:) attribute) the compiler generates incorrect accesses to the thread-private variable p. This only manifests itself when the legacy implementation of thread-private variables is used; specifying compatibility mode (the -openmp-threadprivate compat option) produces correct code.
They still plan to fix the issue in a future release, but offer -openmp-threadprivate compat as a workaround in the meantime.
17 Replies
In my post I didn't use the 'insert code' feature, so the code was not very readable. I apologize, and I have fixed the original post.
I would appreciate any help with my problem,
Thanks
As no one else has bitten, I'll say that I don't know how the result of this code could be predicted. Do you mean, is it a compiler bug that it doesn't give more diagnostics? OpenMP is notorious for not checking at compile time. That's one of the reasons for the Intel Thread Checker.
[cpp]|1 |I/O data-race |Error |1 |omp parallel region |I/O operation at "ym2.f90":4 conflicts with a prior I/O operation at "ym2.f90":4 |"ym2.f90":4 |"ym2.f90":4 |[/cpp]
[cpp]|1 |I/O data-race |Error |1 |omp parallel region |I/O operation at "ym1.f90":4 conflicts with a prior I/O operation at "ym1.f90":4 |"ym1.f90":4 |"ym1.f90":4 |[/cpp]
I have been looking at this too. The OpenMP spec specifies special handling related to POINTER and ALLOCATABLE only when certain conditions do not hold, but I believe that in the second case all threads should see the array pointer as associated, just as all threads see the pointer as associated in the first case. I will inquire with the developers and update the thread when I know more.
Try something like this:
[cpp]program test
  use mod_test, only: check
  implicit none
  integer,dimension(:),pointer,save :: p=>null()
  !$omp threadprivate(p)
  ! Above is in variable declarations of program test
  ! ...
  ! initialization portion of program test
  !$omp parallel
  nullify(p) ! shouldn't be required, used for work-around
  !$omp end parallel
  ! ...
  ! compute section of program test
  !$omp parallel
  allocate(p(1))
  call check(p)
  !$omp end parallel
end program test[/cpp]
Jim Dempsey
Quoting - tim18:
"...I don't know how the result of this code could be predicted..."
To me it seems clear: there are 4 copies of the pointer p, one for each thread. Each one is being allocated and then tested for association, and it should always be associated.
Quoting - Kevin Davis (Intel):
"...I will inquire with the developers and update the thread when I know more"
Thanks!
Quoting - jimdempseyatthecove:
"Try something like this:
...
!$omp parallel
nullify(p) ! shouldn't be required, used for work-around
!$omp end parallel
..."
Thank you for your suggestion. It is true that on many occasions nullifying a pointer before using it can solve problems later on. Unfortunately, in this case it does not make any difference.
OK - Plan B
Create a user-defined type for holding thread private data.
Inside this type, place the pointer to the allocatable array.
Create an instance of the defined type for holding thread private data as thread private
OR
Create a pointer to an instance of the defined type for holding thread private data as thread private (and allocate in each thread).
I have a Windows-based Fortran program that does the latter, so I know this works.
Jim Dempsey
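The first option above (a threadprivate instance of a derived type that holds the pointer) might look roughly like the sketch below. The names thread_data and tls are hypothetical and chosen only for illustration, and the code has not been checked against ifort 11.0, so treat it as a starting point and not as a confirmed fix.

```fortran
module mod_test
  implicit none
  ! Hypothetical container type: the array pointer lives inside it,
  ! so the threadprivate variable itself is a plain derived-type object.
  type :: thread_data
     integer, dimension(:), pointer :: p => null()
  end type thread_data
contains
  subroutine check(num)
    integer, dimension(:), pointer :: num
    if (associated(num)) write (*,*) 'associated'
  end subroutine check
end module mod_test

program test
  use mod_test, only: check, thread_data
  implicit none
  ! Hypothetical name; each thread gets its own copy of tls.
  type(thread_data), save :: tls
  !$omp threadprivate(tls)

  !$omp parallel
  allocate(tls%p(1))      ! allocate this thread's copy of the pointer
  call check(tls%p)       ! association is tested on the thread's own copy
  deallocate(tls%p)
  !$omp end parallel
end program test
```

The only change from the failing program is that the pointer is reached through a component of a threadprivate object, so the pointer itself no longer appears in the threadprivate directive.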
Quoting - ymost
Quoting - tim18:
"...I don't know how the result of this code could be predicted..."
To me it seems clear: there are 4 copies of the pointer p, one for each thread. Each one is being allocated and then tested for association, and it should always be associated.
Quoting - tim18
I've not found any reference to tell what a threadprivate directive outside a parallel region would do. It's not clear to me that it would take effect in the next, or all following, parallel regions. Your intention may be clearer to you than to me or to the compiler. I thought maybe your idea was that the compiler should tell you what (if anything) is wrong with the source code. That's why I suggested attention to the race condition diagnosed by Thread Checker.
??
When you call a subroutine, scalar variables are on the stack (automatic vectors are on the stack, or the descriptor is on the stack and the data comes from the heap).
If this subroutine is called from a parallel region, each thread, having its own stack, will thus have a private copy of those stack variables while all use the same symbolic name.
That I think you understand.
Now wouldn't it be nice if each thread in a multi-threaded program could have thread private data items that share the same name in all threads but in fact reference different data (same as the stack model)? You may want to place temp arrays in the thread private area, or some sort of context information (e.g. a pointer to some object owned by the thread).
This thread private area is independent of entering or exiting !$OMP PARALLEL regions, except when the !$OMP PARALLEL region creates additional thread(s). When a thread is created, it gets a copy of the current state of the master thread's thread private data.
Care must be taken, as a copy of the thread private data from the master thread may contain allocated arrays. It may not be polite for the 2nd or a later thread to deallocate or disturb this array if the array was intended to be a private copy for the master thread. To help get around that, consider using a pointer to the array, which you can NULLIFY and/or allocate.
Thread initialization of the thread private data area can be done once early in the program.
Caution: should you use nested parallel regions, care must be taken with the initialization of those threads' private data as well.
ThreadPrivate is a compiler directive, not a runtime directive.
Symbols marked with threadprivate have a little more overhead in access. The runtime system maintains a pointer to the thread private data area. The compiler automagically inserts an additional dereference via this pointer for thread private data.
Experiment with thread private data, as it can really help improve performance in areas where you want large thread scratch data arrays (too large for the stack).
Jim Dempsey
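As a rough illustration of the scratch-array use described above, here is a minimal sketch. It assumes a compiler that accepts ALLOCATABLE arrays in a threadprivate directive and that the thread count stays the same between the two parallel regions. The module name scratch_mod, the variable scratch and the array size are all hypothetical.

```fortran
module scratch_mod
  implicit none
  ! Hypothetical per-thread scratch buffer, too large to want on the stack.
  real, dimension(:), allocatable, save :: scratch
  !$omp threadprivate(scratch)
end module scratch_mod

program demo
  use omp_lib, only: omp_get_thread_num, omp_set_dynamic
  use scratch_mod, only: scratch
  implicit none

  ! Keep the team size fixed so threadprivate data persists across regions.
  call omp_set_dynamic(.false.)

  ! One-time initialization: every thread allocates its own copy.
  !$omp parallel
  allocate(scratch(1000000))
  scratch = real(omp_get_thread_num())
  !$omp end parallel

  ! A later region reuses the per-thread buffers without reallocating.
  !$omp parallel
  !$omp critical
  write (*,*) 'thread', omp_get_thread_num(), 'first element', scratch(1)
  !$omp end critical
  deallocate(scratch)
  !$omp end parallel
end program demo
```

The allocation is done inside a parallel region so that each thread sets up its own buffer explicitly, instead of relying on whatever state a new thread inherits.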
Quoting - jimdempseyatthecove
OK - Plan B
Create a user-defined type for holding thread private data.
Inside this type, place the pointer to the allocatable array.
Create an instance of the defined type for holding thread private data as thread private
OR
Create a pointer to an instance of the defined type for holding thread private data as thread private (and allocate in each thread).
I have a Windows-based Fortran program that does the latter, so I know this works.
Jim Dempsey
Great idea, and it worked! I used the first option you offered, since I didn't want to risk using a threadprivate pointer again. This way I get a de-facto threadprivate pointer without the risk, and that's great.
I actually conjured up my own workaround too: I added a threadprivate integer to hold the address of the pointer after it is allocated (I extracted the address using the 'loc' function), and referred to the correct address in every thread by using the 'pointer(a,b)' mechanism. I think I'll switch to your method, though, since it's more elegant than mine, and more importantly, my method is not portable since the loc function and pointer mechanism are specific to the Intel compiler.
Thanks a lot!
Quoting - tim18
I've not found any reference to tell what a threadprivate directive outside a parallel region would do. It's not clear to me that it would take effect in the next, or all following, parallel regions.
A threadprivate directive can only be used outside a parallel region. Quoting section 2.9.2 from the OpenMP specification, page 84, line 34:
[plain]The threadprivate directive must appear in the declaration section of a scoping unit in which the common block or variable is declared.[/plain]
The declaration section can never be in a parallel region. The threadprivate directive changes the status of the variable for all consecutive parallel regions that have the same number of threads. Notice also that the first version of the program I attached works perfectly, it is only the addition of the 'dimension' attribute that causes the error, and this is probably a compiler bug.
Quoting - ymost
A threadprivate directive can only be used outside a parallel region. Quoting section 2.9.2 from the OpenMP specification, page 84, line 34:
[plain]The threadprivate directive must appear in the declaration section of a scoping unit in which the common block or variable is declared.[/plain]
The declaration section can never be in a parallel region. The threadprivate directive changes the status of the variable for all consecutive parallel regions that have the same number of threads. Notice also that the first version of the program I attached works perfectly, it is only the addition of the 'dimension' attribute that causes the error, and this is probably a compiler bug.
Your first version didn't work perfectly for me, as I showed.
The standard does prescribe persistence of threadprivate values between compatible parallel regions, but I don't see that it calls for threadprivate directives from outside a parallel region to apply.
If you would file a problem report on premier.intel.com, you should be able to get an answer from the Intel experts on OpenMP.
I submitted this to the compiler team earlier; they are investigating now. Our internal tracking id is: CQ# DPD200118527. I will keep the thread updated as I learn more.
tim18,
>>However, I can't agree with you about no declarations in parallel regions.
Thread private data is static data owned within the thread context,
separate from static data owned by the process.
>>How would you have any subroutine calls in parallel regions under your proposed restriction?
A subroutine may be called from within a parallel region or from sequential section.
A subroutine cannot be declared from within a parallel region.
For most thread private data use the stack
Make sure you include automatic on any arrays (vectors) as you may forget an option switch and get burned with a long debug session.
For SAVE variables, yes, you need to declare these with threadprivate attribute.
Note: A subroutine declared array such as
"real :: temp(10)" may or may not be stored on the stack.
You CANNOT tell by looking at the source code.
"real, automatic :: temp(10)" will be on the stack
or at least the array descriptor will be on the stack
(when all automatic arrays are heap arrays)
"real, allocatable :: temp(:)" the array descriptor may or may not be stored on the stack.
You CANNOT tell by looking at the source code.
"real, automatic, allocatable :: temp(:)" the array descriptor will be stored on the stack.
Jim Dempsey
Quoting - tim18
...However, I can't agree with you about no declarations in parallel regions. How would you have any subroutine calls in parallel regions under your proposed restriction? How would you implement the requirement of the specification that the threadprivate appear in each compilation unit, where applicable? Maybe you didn't mean that.
Your first version didn't work perfectly for me, as I showed.
The standard does prescribe persistence of threadprivate values between compatible parallel regions, but I don't see that it calls for threadprivate directives from outside a parallel region to apply.
Indeed my statement was inaccurate. I should have said a threadprivate directive cannot appear inside the lexical extent of a parallel region, i.e. it can appear in a subroutine called from a parallel region. Still, the threadprivate directive must always appear in the declaration section of any scoping unit, and it can (and often does) appear outside any parallel region at all. It is used exactly in this way in the examples given in the OpenMP specification itself. Also, a quick example can show that it cannot be used inside the lexical extent of a parallel region:
[plain]program test
implicit none
integer,save :: num
!$omp threadprivate(num)
!$omp parallel
! --- code ---
!$omp end parallel
end program test[/plain]
This code works fine, with the expected behaviour of the variable num having a separate copy for each thread. However:
[plain]program test
implicit none
integer,save :: num
!$omp parallel
!$omp threadprivate(num)
! --- code ---
!$omp end parallel
end program test[/plain]
This code produces a compilation error: "error #6236: A specification statement cannot appear in the executable section."
Consider the !$omp threadprivate(num) as a data declaration statement that is a band-aid for a missing Fortran keyword. i.e. Fortran should have "integer, save, threadprivate :: num" (or threadprivate could implicitly include save and "integer,threadprivate :: num" would suffice).
Jim Dempsey
Quoting - Kevin Davis (Intel)
Development indicates that for case 2 (pointer declared with the dimension(:) attribute) the compiler generates incorrect accesses to the thread-private variable p. This only manifests itself when the legacy implementation of thread-private variables is used; specifying compatibility mode (the -openmp-threadprivate compat option) produces correct code.
They still plan to fix the issue in a future release, but offer -openmp-threadprivate compat as a workaround in the meantime.
Thank you, the '-openmp-threadprivate compat' flag solves the problem!
However, it reveals another bug which is manifested in a different section of my code. I have posted it in a new discussion.