I have a piece of parallelized code from which I call a subroutine. The code looks like this:
[cpp]!dec$ if defined (_PARALLELIZATION_)
!$omp parallel if ( enableOpenMP .AND. omp_energy ) num_threads ( threads ) default ( shared )
!$omp& firstprivate ( use_h, dtau, rxsurf )
!$omp& private ( i, j, k, l, im1, ip1, jm1, jp1, km1, kp1 )
!$omp do schedule(dynamic,3)
!dec$ end if
      do i=2,nx-1
        im1 = i-1
        ip1 = i+1
        do k=2,nz-1
          km1 = k-1
          kp1 = k+1
          do j=2,ny-1
            jm1 = j-1
            jp1 = j+1
            l = lk(k) + li(i) + j
            if ( type(i,j,k).le.-6 ) then
              if ( use_h ) then
                call energy_SIP_coef (
     +            u, v, w, tbx, tby, tbz, h, h0, t, cp, lam, Source, c, spm, type,
     +            dtau, rxsurf, i, j, k, l, im1, jm1, km1, ip1, jp1, kp1,
     +            AB, AW, AS, AP, AN, AE, AT, Q )
              else
                call ...
              endif
            endif
          end do
        end do
      end do
!dec$ if defined (_PARALLELIZATION_)
!$omp end do
!$omp end parallel
!dec$ end if[/cpp]
I made sure that dtau and rxsurf are initialized to 1e10 and 1 before the parallel block; however, checking their values inside the subroutine energy_SIP_coef shows that both of them are 0.
I checked several times that the number of arguments and their types are correct in both the call statement and in the subroutine declaration. The first and the third row of arguments in the call are arrays declared as allocatable.
When I added a statement writing dtau and rxsurf just before calling the subroutine, I could see correct values in the called subroutine!
Running the code using one CPU (enableOpenMP = .false.) works correctly, so I suspect the problem is related to the parallelization, but I have no idea what I could try.
The application is compiled using
/nologo /debug:full /Od /D_PARALLELIZATION_ /gen-interfaces /fixed /extend_source:132 /Qopenmp /fpscomp:general /debug-parameters:used /warn:declarations /warn:truncated_source /warn:interfaces /assume:byterecl /module:"Debug\" /object:"Debug\" /traceback /check:pointer /check:bounds /check:uninit /check:format /check:arg_temp_created /libs:static /threads /dbglibs /c /align:all /heap-arrays
and linked using
/OUT:"..." /INCREMENTAL:NO /NOLOGO /DELAYLOAD:"EventLog.dll" /MANIFEST /MANIFESTFILE:"..." /DEBUG /PDB:"..." /SUBSYSTEM:CONSOLE /STACK:100000000 /IMPLIB:"..." delayimp.lib libguide.lib EventLog.lib
16 Replies
I have just realized that the above described problem is basically the same as the one discussed in this thread. A workaround suggested by jimdempseyatthecove, see here, helped to resolve my problem at that time. I used the same workaround now and it helped again.
I wonder whether I have the same bug in two completely independent parts of the code, or whether there could be something that the compiler does not handle correctly.
Quoting - jirina
I have just realized that the above described problem is basically the same as the one discussed in this thread. A workaround suggested by jimdempseyatthecove, see here, helped to resolve my problem at that time. I used the same workaround now and it helped again.
I wonder whether I have the same bug in two completely independent parts of the code, or whether there could be something that the compiler does not handle correctly.
Try using
!$omp& private ( use_h, dtau, rxsurf )
!$omp& copyin ( use_h, dtau, rxsurf )
in place of firstprivate.
What may be happening is that you have nested parallel regions. FIRSTPRIVATE copies from the global context (the value in the context _prior_ to entering the parallel region(s)), whereas COPYIN copies from the current thread, which creates the next nest level (and then becomes that level's master thread). FIRSTPRIVATE and COPYIN are equivalent ONLY in the situation where the current master thread is the thread spawning the next nest level:
main thread,
thread 0 of 1st level,
thread 0 of 2nd level created by thread 0 of 1st level,
thread 0 of 3rd level created by thread 0 of 2nd level created by thread 0 of 1st level,
...
Jim Dempsey
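For reference, a minimal free-form sketch of the FIRSTPRIVATE behaviour under discussion. The program name, the counter `bad`, and the thread count of 4 are illustrative assumptions; only `dtau` and its initial value 1e10 come from the original post. As far as the OpenMP specification goes, FIRSTPRIVATE gives each thread a private copy initialized from the value the variable has in the thread that encounters the parallel construct, while COPYIN is defined only for THREADPRIVATE variables, so the PRIVATE plus COPYIN combination above may be rejected or ignored by a conforming compiler.

```fortran
program firstprivate_demo
  use omp_lib
  implicit none
  real(8) :: dtau        ! name taken from the original post
  integer :: bad         ! hypothetical counter of threads that saw a wrong value

  dtau = 1.0d10
  bad  = 0

  ! Each thread receives its own copy of dtau, initialized to 1e10.
  !$omp parallel default(none) firstprivate(dtau) shared(bad) num_threads(4)
  if (dtau /= 1.0d10) then
    !$omp atomic
    bad = bad + 1
  end if
  ! Writing to the private copy does not affect the other threads' copies.
  dtau = real(omp_get_thread_num(), 8)
  !$omp end parallel

  write(*,*) 'threads with a wrong initial dtau:', bad
end program firstprivate_demo
```

Built with OpenMP enabled (for example with /Qopenmp, as in the compile line of the original post), a conforming run should report zero threads with a wrong initial value; a non-zero count in a reduced test like this would be a useful starting point for a bug report.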
I tried your suggestion, but the problem is still occurring. I am sure there are no nested parallel regions in this case, so the cause of the problem might be somewhere else.
Have you verified that this is not a case of the debugger displaying the incorrect variable? If you write(*,*) yourVariableHere inside the parallel region does it print out OK?
Jim
Yes, I tried using write inside the parallel region and the value was incorrect. As I mentioned in my original post, the value becomes correct when I use the variable (e.g. printing it out) before calling the subroutine.
Actually, I prefer using write when debugging a parallel code, because I am having difficulties with the debugger. It sometimes does not stop at breakpoints placed inside a parallel region. Should I expect any limitations when debugging a parallel code? Could it be there are some code "optimizations"? Anyway, this might be discussed in a different thread.
Update: I have just found out that I might have been doing something wrong. When I do not list use_h, dtau, and rxsurf in FIRSTPRIVATE and do not use the workaround, their values are correct inside the subroutine called from the parallel region. So, could it be that variables whose values are not changed in the parallel region do not need to be listed in FIRSTPRIVATE?
FIRSTPRIVATE (from my understanding of the documentation) is
(variable list) variables are PRIVATE and COPYIN data comes from MAIN level scope.
PRIVATE with COPYIN
(of variables in both clauses) variables are PRIVATE and COPYIN data comes from the thread scope that instantiates the next level. Depending on the genealogy of the thread which instantiates the parallel region, this may or may not be the same as the MAIN level scope.
Should the code that creates the parallel region use PRIVATE only, and then call the subroutine without the variables being modified, then for all thread team member numbers except 0, the value of the variable (array?) is undefined. For thread team member number 0, the context is that of the thread that instantiated the parallel region (i.e. the data is as it was at the time the parallel region was created, which may have been defined or undefined).
It sounds like you may need SHARED for these variables.
NOTE: The specification (from my understanding of it) could also be interpreted as follows: COPYIN data comes from the thread scope that instantiates the next level, WITH THE PROVISION that the copy operation is performed AS IF in a SINGLE section inserted at the front of the parallel region.
The IVF documentation needs to be improved in this area, especially with respect to nested levels. This improvement should contain diagrams of the data placement and associations with respect to the copy operation.
Jim Dempsey
I checked the specification too, but I am far from being as experienced as you, so your last explanation is getting quite complicated for me. :-[
Anyway, to keep it simple, let me emphasise that I do not have any nested parallel regions in this case. If you have a look at the declaration of the parallel region, it reads
[cpp]!$omp parallel if ( enableOpenMP .AND. omp_energy ) num_threads ( threads ) default ( shared )
!$omp& firstprivate ( use_h, dtau, rxsurf )
!$omp& private ( i, j, k, l, im1, ip1, jm1, jp1, km1, kp1 )[/cpp]
which means that all variables should be SHARED (as stated in the specification).
If I remove the line with FIRSTPRIVATE from PARALLEL, everything is working correctly. Could a combination of DEFAULT(SHARED) and FIRSTPRIVATE cause any problems? I have already mentioned that variables used with FIRSTPRIVATE are not changed, just used for calculation of other variables, so I wonder whether it makes sense to include such variables in FIRSTPRIVATE.
Quoting - jirina
[cpp]!$omp parallel if ( enableOpenMP .AND. omp_energy ) num_threads ( threads ) default ( shared )
!$omp& firstprivate ( use_h, dtau, rxsurf )
!$omp& private ( i, j, k, l, im1, ip1, jm1, jp1, km1, kp1 )[/cpp]
which means that all variables should be SHARED (as stated in the specification).
If I remove the line with FIRSTPRIVATE from PARALLEL, everything is working correctly. Could a combination of DEFAULT(SHARED) and FIRSTPRIVATE cause any problems? I have already mentioned that variables used with FIRSTPRIVATE are not changed, just used for calculation of other variables, so I wonder whether it makes sense to include such variables in FIRSTPRIVATE.
You haven't shown anything here to indicate why your firstprivate would affect correctness.
For code with any degree of complication, default(none) is important to help catch mistakes. Unfortunately, Intel Thread Checker seems to dislike default(none).
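As a minimal free-form sketch of what default(none) provides: every variable referenced inside the region must be given an explicit data-sharing attribute, so a forgotten variable becomes a compile-time error rather than a silent race. The program name, the array `a`, and the scalar `alpha` are hypothetical and not taken from the code discussed above.

```fortran
program default_none_demo
  implicit none
  integer, parameter :: n = 8
  integer :: i
  real(8) :: alpha, a(n)

  alpha = 0.5d0
  a     = 1.0d0

  ! With default(none), removing a or alpha from the shared clause
  ! makes the compiler reject the loop instead of guessing an attribute.
  !$omp parallel do default(none) shared(a, alpha) private(i)
  do i = 1, n
    a(i) = alpha * a(i)   ! alpha is only read, so sharing it is safe
  end do
  !$omp end parallel do

  write(*,*) 'sum of a:', sum(a)
end program default_none_demo
```

The named constant `n` needs no clause because it is not a variable. Listing the loop index `i` as private is redundant for a parallel do loop but harmless, and it keeps the clause list complete for readers.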
I checked other locations in my code and realized that I am always using DEFAULT(SHARED) (even though SHARED is the default when DEFAULT is not specified) and I am sometimes using FIRSTPRIVATE for variables which are not modified inside the corresponding parallel region. I have come across this problem only twice, but I was not able to find the cause in either of those cases.
I will consider using default(none) to see if it helps me to find the real cause of the problem.
Thank you and Jim for your suggestions and ideas.
Jirina,
The code sample you submitted was a subroutine containing the !$omp parallel... There is no way to determine if the subroutine were called from within an outer OpenMP nested layer thread.
!$omp parallel...
...
call yourSub
...
!$omp end parallel...
subroutine yourSub(...
...
!$omp parallel...
(in nested layer here)
I could make no assumptions as to the circumstances of the call. You can determine this by inserting a test
subroutine yourSub(...
...
! _prior_ to parallel region insert
IF(OMP_IN_PARALLEL()) THEN
WRITE(*,*) "FIRSTPRIVATE MAY HAVE PROBLEMS"
ENDIF
!$omp parallel...
(in nested layer here)
(you will require USE OMP_LIB in your subroutine.)
Should that IF clause trigger, then FIRSTPRIVATE is acting as it should and, in the process, producing the effect that you do not want. That is, the subroutine was called from an OpenMP thread team member whose genealogy was not always thread team member number 0, and therefore the caller's context was not that of the MAIN level thread. The MAIN level instance of the variable(s) was therefore inconsistent with the caller's instance of the variable(s). COPYIN would copy from the caller's instance (which is not necessarily the MAIN instance).
Jim Dempsey
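The check described above can be sketched as a small self-contained program (free-form Fortran). The module `nest_check`, the subroutine `your_sub`, and its `label` argument are hypothetical names; only OMP_IN_PARALLEL and USE OMP_LIB come from the post. Note that OMP_IN_PARALLEL reports true only inside an active parallel region, that is, one executed by more than one thread, so the second call is expected to report a parallel context only if the runtime actually provides two threads.

```fortran
module nest_check
  use omp_lib
  implicit none
contains
  subroutine your_sub(label)
    character(*), intent(in) :: label
    ! Test placed _prior_ to where the subroutine would open its own region.
    if (omp_in_parallel()) then
      write(*,*) label, ': called from inside an active parallel region'
    else
      write(*,*) label, ': called from serial code'
    end if
  end subroutine your_sub
end module nest_check

program in_parallel_demo
  use nest_check
  implicit none

  ! Call from the main-level thread, outside any parallel region.
  call your_sub('first call')

  ! Call from within a parallel region; single keeps the output to one line.
  !$omp parallel num_threads(2)
  !$omp single
  call your_sub('second call')
  !$omp end single
  !$omp end parallel
end program in_parallel_demo
```

If the message for serial code appears at the point where the real subroutine is entered, nesting can be ruled out as the cause, which narrows the search to the region's own clauses.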
(the test does not assert)
"the genealogy was not always thread team member 0"
rather it asserts
"the genealogy may not always have been thread team member 0"
And in the case when it was not, then FIRSTPRIVATE may not be acting as you expect.
Jim
I tried your suggestion with OMP_IN_PARALLEL and the test confirmed that the parallel region I have problems with is not run from within another parallel region. The code before the parallel region is executed by the main level thread.
Anyway, I encountered another interesting problem with FIRSTPRIVATE. I tried to compile my program in Linux using version 11.0.083 and I got a catastrophic error: Internal compiler error. I am going to report it in the corresponding forum, but I am mentioning it here because removing the line with FIRSTPRIVATE from the following code helped:
[cpp]!dec$ if defined (_PARALLELIZATION_)
!$omp parallel if ( enableOpenMP .AND. omp_solvers ) num_threads ( threads ) default ( shared )
!$omp& private ( i, j, k, l, P1, P2, P3 )
!$omp& firstprivate ( alpha_SIP )
!$omp do schedule(dynamic,3)
!dec$ end if[/cpp]
Once again, alpha_SIP is real*8, initialized before the parallel region, and its value is not changed inside the region.
This starts to look suspicious, because it is the third time the problem is related to FIRSTPRIVATE. I will try to create an example which could be submitted to Intel Support; I hope it is not that I am making a mistake.
When you can produce a small example, if it is small enough, can you post it here? The forum members might be able to provide you with a workaround while you are waiting for a fixed version (assuming there is something to fix).
Jim
The main problem is that any attempt to simplify the code so that I would be allowed to submit it resulted in a code which is working well and the problem does not occur. So I am not sure what to do when I am not allowed to submit the original code. I will keep trying, but it might take some time.
Ok then,
If alpha_SIP is not changed inside the parallel region (and will not be later on) then remove firstprivate(alpha_SIP) and let the default(shared) provide access to alpha_SIP from within the parallel region.
If alpha_SIP is not changed inside the parallel region (but will/maybe later on) then change firstprivate(alpha_SIP) to private(alpha_SIP) and add COPYIN(alpha_SIP) assuming that you want the copy of alpha_SIP from the context of the thread creating the parallel region.
Jim Dempsey
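A minimal free-form sketch of the first option, assuming a read-only scalar: the same loop is run once with alpha_SIP left shared and once with it listed in FIRSTPRIVATE, and the results are compared. Only the name alpha_SIP comes from the thread; the program name, the arrays `p_shared` and `p_first`, the size `n`, and the value 0.9 are illustrative assumptions.

```fortran
program shared_vs_firstprivate
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real(8) :: alpha_SIP, p_shared(n), p_first(n)

  alpha_SIP = 0.9d0   ! initialized before the parallel regions, never modified

  ! Variant 1: alpha_SIP is only read, so it can stay shared.
  !$omp parallel do default(shared) private(i)
  do i = 1, n
    p_shared(i) = alpha_SIP * real(i, 8)
  end do
  !$omp end parallel do

  ! Variant 2: each thread works on its own initialized copy.
  !$omp parallel do default(shared) private(i) firstprivate(alpha_SIP)
  do i = 1, n
    p_first(i) = alpha_SIP * real(i, 8)
  end do
  !$omp end parallel do

  write(*,*) 'results identical:', all(p_shared == p_first)
end program shared_vs_firstprivate
```

For a variable that is never written inside the region, the two variants should be interchangeable, so dropping FIRSTPRIVATE costs nothing and sidesteps whatever is going wrong with it in the full code.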
Removing alpha_SIP from FIRSTPRIVATE helped. I will try applying your solution of combining PRIVATE and COPYIN when a variable is going to be changed inside the parallel region.
Thank you for your ideas and help.