Intel® Fortran Compiler

Strange OpenMP (?) problem

jirina
New Contributor I
I have a piece of parallelized code from which I call a subroutine. The code looks like this:
[cpp]!dec$ if defined (_PARALLELIZATION_)
!$omp parallel if ( enableOpenMP .AND. omp_energy ) num_threads ( threads ) default ( shared )
!$omp& firstprivate ( use_h, dtau, rxsurf )
!$omp& private ( i, j, k, l, im1, ip1, jm1, jp1, km1, kp1 )

!$omp do schedule(dynamic,3)
!dec$ end if
      do i=2,nx-1
        im1 = i-1
        ip1 = i+1

        do k=2,nz-1
          km1 = k-1
          kp1 = k+1
          do j=2,ny-1
            jm1 = j-1
            jp1 = j+1

            l = lk(k) + li(i) + j
            if ( type(i,j,k).le.-6 ) then
              if ( use_h ) then
                call energy_SIP_coef (
     +            u, v, w, tbx, tby, tbz, h, h0, t, cp, lam, Source, c, spm, type,
     +            dtau, rxsurf, i, j, k, l, im1, jm1, km1, ip1, jp1, kp1,
     +            AB, AW, AS, AP, AN, AE, AT, Q )
              else
                call ...
              endif
            endif

          end do
        end do

      end do
!dec$ if defined (_PARALLELIZATION_)
!$omp end do
!$omp end parallel
!dec$ end if[/cpp]
I made sure that dtau and rxsurf are initialized to 1e10 and 1 before the parallel block; however, checking their values inside the subroutine energy_SIP_coef shows that both of them are 0.

I checked several times that the number of arguments and their types are correct in both the call statement and in the subroutine declaration. The first and the third row of arguments in the call are arrays declared as allocatable.

When I added a statement writing dtau and rxsurf just before the call, the values seen inside the called subroutine were correct!

Running the code using one CPU (enableOpenMP = .false.) works correctly, so I suspect the problem is related to the parallelization, but I have no idea what I could try.

The application is compiled using

/nologo /debug:full /Od /D_PARALLELIZATION_ /gen-interfaces /fixed /extend_source:132 /Qopenmp /fpscomp:general /debug-parameters:used /warn:declarations /warn:truncated_source /warn:interfaces /assume:byterecl /module:"Debug\" /object:"Debug\" /traceback /check:pointer /check:bounds /check:uninit /check:format /check:arg_temp_created /libs:static /threads /dbglibs /c /align:all /heap-arrays

and linked using

/OUT:"..." /INCREMENTAL:NO /NOLOGO /DELAYLOAD:"EventLog.dll" /MANIFEST /MANIFESTFILE:"..." /DEBUG /PDB:"..." /SUBSYSTEM:CONSOLE /STACK:100000000 /IMPLIB:"..." delayimp.lib libguide.lib EventLog.lib
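
A stripped-down check of the situation described in the question could look like the sketch below. It is only an illustration under stated assumptions: `check_args` is a hypothetical stand-in for `energy_SIP_coef`, the loop bounds are made up, and the code is written in free-form Fortran rather than the fixed form used in the original project. It counts how many calls receive values of `dtau` or `rxsurf` that differ from the ones set before the parallel region.

```fortran
! Minimal sketch, not the original code. Only dtau, rxsurf and use_h are
! names taken from the question; everything else is hypothetical.
module check_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Stand-in for energy_SIP_coef: records calls where the scalars
  ! do not match the values assigned before the parallel region.
  subroutine check_args(dtau, rxsurf, bad)
    real(dp), intent(in)    :: dtau, rxsurf
    integer,  intent(inout) :: bad
    if (dtau /= 1.0e10_dp .or. rxsurf /= 1.0_dp) then
      !$omp atomic
      bad = bad + 1
    end if
  end subroutine check_args
end module check_mod

program firstprivate_check
  use check_mod
  implicit none
  integer, parameter :: nx = 100   ! assumed loop bound
  real(dp) :: dtau, rxsurf
  logical  :: use_h
  integer  :: i, bad

  dtau   = 1.0e10_dp
  rxsurf = 1.0_dp
  use_h  = .true.
  bad    = 0

  !$omp parallel default(shared) firstprivate(use_h, dtau, rxsurf) private(i)
  !$omp do schedule(dynamic,3)
  do i = 2, nx - 1
    if (use_h) call check_args(dtau, rxsurf, bad)
  end do
  !$omp end do
  !$omp end parallel

  print *, 'calls with wrong dtau/rxsurf: ', bad
end program firstprivate_check
```

With a conforming OpenMP implementation the counter is expected to stay at zero, since FIRSTPRIVATE initializes each thread's copy from the value the variable has when the parallel construct is encountered. A non-zero count in a test this small would be worth reporting.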
16 Replies
jirina
New Contributor I
I have just realized that the problem described above is basically the same as the one discussed in this thread. A workaround suggested by jimdempseyatthecove, see here, helped to resolve my problem at that time. I used the same workaround now and it helped again.

I wonder whether I have the same bug in two completely independent parts of the code, or whether there could be something that the compiler does not do correctly.
jimdempseyatthecove
Honored Contributor III
Try using
!$omp& private ( use_h, dtau, rxsurf )
!$omp& copyin ( use_h, dtau, rxsurf )

in place of firstprivate.

What may be happening is that you have nested parallel regions. FIRSTPRIVATE copies from the global context (the value in the context _prior_ to entering the parallel region(s)), whereas COPYIN copies from the current thread, which creates the next nest level (and then becomes that level's master thread). FIRSTPRIVATE and COPYIN are equivalent ONLY in the situation where the current master thread is the thread spawning the next nest level:

main thread,
thread 0 of 1st level,
thread 0 of 2nd level created by thread 0 of 1st level,
thread 0 of 3rd level created by thread 0 of 2nd level created by thread 0 of 1st level,
...

Jim Dempsey
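
For readers comparing the two clauses: as defined in the OpenMP specification, COPYIN applies to THREADPRIVATE variables and copies the value from the master thread's copy into the other threads' copies, while FIRSTPRIVATE initializes each private copy from the value the original variable has when the construct is encountered. A threadprivate variable may not also be listed in a PRIVATE clause, so a combination of PRIVATE and COPYIN on the same ordinary variable may rely on compiler-specific behaviour. The sketch below shows both clauses side by side; `scale_tp` and the module `tp_mod` are hypothetical names introduced only for this illustration.

```fortran
! Minimal sketch, assuming an OpenMP-enabled compiler (for example /Qopenmp).
module tp_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: scale_tp = 0.0_dp        ! hypothetical module variable
  !$omp threadprivate(scale_tp)        ! each thread keeps its own copy
end module tp_mod

program copyin_sketch
  use omp_lib
  use tp_mod
  implicit none
  real(dp) :: dtau

  dtau     = 1.0e10_dp   ! ordinary local variable
  scale_tp = 2.5_dp      ! sets the copy owned by the initial thread

  ! firstprivate: every thread gets its own dtau, initialized to 1.0e10
  ! copyin:       every thread's scale_tp is set from the master thread's copy
  !$omp parallel default(shared) firstprivate(dtau) copyin(scale_tp)
  !$omp critical
  print *, 'thread', omp_get_thread_num(), ' dtau =', dtau, ' scale_tp =', scale_tp
  !$omp end critical
  !$omp end parallel
end program copyin_sketch
```

Every thread is expected to report the same pair of values; the order of the printed lines depends on thread scheduling.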

jirina
New Contributor I
I tried your suggestion, but the problem still occurs. I am sure there are no nested parallel regions in this case, so the cause of the problem might be somewhere else.
jimdempseyatthecove
Honored Contributor III
Have you verified that this is not a case of the debugger displaying the incorrect variable? If you write(*,*) yourVariableHere inside the parallel region does it print out OK?

Jim
jirina
New Contributor I
Yes, I tried using write inside the parallel region and the value was incorrect. As I mentioned in my original post, the value becomes correct when I use the variable (e.g. printing it out) before calling the subroutine.

Actually, I prefer using write when debugging a parallel code, because I am having difficulties with the debugger. It sometimes does not stop at breakpoints placed inside a parallel region. Should I expect any limitations when debugging a parallel code? Could it be there are some code "optimizations"? Anyway, this might be discussed in a different thread.

Update: I have just found out that I might have been doing something wrong. I did not include use_h, dtau, rxsurf in FIRSTPRIVATE, I did not use the workaround, however, their values are correct inside the subroutine called from the parallel region. So, could it be that there is no need to include in FIRSTPRIVATE those variables whose values are not changed in the parallel region?
jimdempseyatthecove
Honored Contributor III
FIRSTPRIVATE (from my understanding of the documentation) is

(variable list) variables are PRIVATE and COPYIN data comes from MAIN level scope.

PRIVATE with COPYIN

(of variables in both clauses) variables are PRIVATE and COPYIN data comes from the thread scope that instantiates the next level. Depending on the genealogy of the thread which instantiates the parallel region, this may or may not be the same as the MAIN level scope.

Should the code that creates the parallel region use PRIVATE only, and then call the subroutine without the variables being modified, then for all thread team member numbers except 0 the value of the variable (array?) is undefined. For thread team member number 0, the context is that of the thread that instantiated the parallel region (i.e. the data is as it was at the time of creation of the parallel region, which may have been defined or undefined).

It sounds like you may need SHARED for these variables.

NOTE: The specification (from my understanding of the specification) could also be interpreted as: COPYIN data comes from the thread scope that instantiates the next level WITH THE PROVISION that the copy operation is performed AS IF in a SINGLE section inserted at the front of the parallel region.

The IVF documentation needs to be improved in this area, especially with respect to nested levels. This improvement should contain diagrams of the data placement and associations with respect to the copy operation.


Jim Dempsey
jirina
New Contributor I
I checked the specification too, but I am far from being as experienced as you, so your last explanation is getting quite complicated for me. :-[

Anyway, to keep it simple, let me emphasise that I do not have any nested parallel regions in this case. If you have a look at the declaration of the parallel region, it reads
[cpp]!$omp parallel if ( enableOpenMP .AND. omp_energy ) num_threads ( threads ) default ( shared ) 
!$omp&  firstprivate ( use_h, dtau, rxsurf ) 
!$omp&  private ( i, j, k, l, im1, ip1, jm1, jp1, km1, kp1 ) [/cpp]
which means that all variables should be SHARED (as stated in the specification).

If I remove the line with FIRSTPRIVATE from PARALLEL, everything is working correctly. Could a combination of DEFAULT(SHARED) and FIRSTPRIVATE cause any problems? I have already mentioned that variables used with FIRSTPRIVATE are not changed, just used for calculation of other variables, so I wonder whether it makes sense to include such variables in FIRSTPRIVATE.
TimP
Honored Contributor III
default(shared) is in fact the default (when you don't specify this), except for the DO index. If you don't modify those variables you marked firstprivate anywhere in the parallel region, shared should work without firstprivate. For myself, I like to avoid firstprivate/lastprivate when possible, particularly when the avoidance doesn't complicate the source code, but I think the jury is still out.
You haven't shown anything here to indicate why your firstprivate would affect correctness.
For code with any degree of complication, default(none) is important to help catch mistakes. Unfortunately, Intel Thread Checker seems to dislike default(none).
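
The advice above can be sketched as follows: a scalar that is only read inside the parallel region is listed as SHARED, and DEFAULT(NONE) forces every variable used in the region to be scoped explicitly, so a forgotten variable becomes a compile-time error. The names `alpha`, `a` and `n` are hypothetical and chosen only for this illustration.

```fortran
! Minimal sketch: read-only scalar kept SHARED, with DEFAULT(NONE)
! so that the compiler rejects any variable left without a scope.
program shared_readonly
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000       ! named constant, needs no data-sharing clause
  real(dp) :: alpha, a(n)
  integer  :: i

  alpha = 0.5_dp     ! set once before the region, never modified inside it
  a     = 1.0_dp

  !$omp parallel default(none) shared(a, alpha) private(i)
  !$omp do schedule(dynamic,3)
  do i = 1, n
    a(i) = alpha * a(i)   ! alpha is only read, so sharing it is safe
  end do
  !$omp end do
  !$omp end parallel

  print *, 'sum(a) =', sum(a)
end program shared_readonly
```

Each element ends up as 0.5, so the printed sum should come out as 500 regardless of the number of threads. Removing `alpha` from the SHARED list should make the compiler complain, which is the kind of mistake DEFAULT(NONE) is meant to catch.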
jirina
New Contributor I
I checked other locations in my code and realized that I am always using DEFAULT(SHARED) (even though SHARED is the default when DEFAULT is not specified) and I am sometimes using FIRSTPRIVATE for variables which are not modified inside the corresponding parallel region. I have come across this problem only twice, but I was not able to find the cause in either of those cases.

I will consider using default(none) to see if it helps me to find the real cause of the problem.

Thank you and Jim for your suggestions and ideas.
jimdempseyatthecove
Honored Contributor III
Jirina,

The code sample you submitted was a subroutine containing the !$omp parallel... There is no way to determine if the subroutine were called from within an outer OpenMP nested layer thread.

!$omp parallel...
...
call yourSub
...
!$omp end parallel...

subroutine yourSub(...
...
!$omp parallel...
(in nested layer here)

I could make no assumptions as to the circumstances of the call. You can determine this by inserting a test

subroutine yourSub(...
...
! _prior_ to parallel region insert
IF(OMP_IN_PARALLEL()) THEN
WRITE(*,*) "FIRSTPRIVATE MAY HAVE PROBLEMS"
ENDIF
!$omp parallel...
(in nested layer here)


(you will require USE OMP_LIB in your subroutine.)

Should that IF clause trigger, then FIRSTPRIVATE is acting as it should and in the process producing the effect that you do not want. That is, the subroutine was called from an OpenMP thread team member whose genealogy was not always thread team member number 0, and therefore the caller's context was not that of the MAIN level thread. Therefore the MAIN level instance of the variable(s) was inconsistent with the caller's instance of the variable(s). COPYIN would copy from the caller's instance (which is not necessarily the MAIN instance).

Jim Dempsey
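
The test described above can be tried out in isolation with a sketch like the one below. `your_sub` follows the placeholder name used in the post; the module name, the messages and the driver program are assumptions made for this illustration. Note that `OMP_IN_PARALLEL()` reports .TRUE. only inside an active parallel region, so if the runtime grants just one thread to the outer region, the second call may also report serial execution.

```fortran
! Minimal sketch, assuming an OpenMP-enabled compiler.
module nest_mod
  use omp_lib
  implicit none
contains
  subroutine your_sub()
    ! Test placed _prior_ to the subroutine's own parallel region.
    if (omp_in_parallel()) then
      write(*,*) 'your_sub: called from inside an active parallel region'
    else
      write(*,*) 'your_sub: called from serial code'
    end if

    !$omp parallel
    continue            ! placeholder for the real work of the subroutine
    !$omp end parallel
  end subroutine your_sub
end module nest_mod

program nest_check
  use nest_mod
  implicit none

  call your_sub()                  ! call from the main-level thread

  !$omp parallel num_threads(2)
  !$omp single
  call your_sub()                  ! call from within an outer parallel region
  !$omp end single
  !$omp end parallel
end program nest_check
```

Placing the same check in the real subroutine, right before its parallel directive, shows for each call whether the region would be nested.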
jimdempseyatthecove
Honored Contributor III
(the test does not assert)

"the genealogy was not always thread team member 0"

rather it asserts

"the genealogy may not always have been thread team member 0"

And in the case when it was not, then FIRSTPRIVATE may not be acting as you expect.

Jim


jirina
New Contributor I
I tried your suggestion with OMP_IN_PARALLEL and the test confirmed that the parallel region I have problems with is not run from within another parallel region. The code before the parallel region is executed by the main level thread.

Anyway, I encountered another interesting problem with FIRSTPRIVATE. I tried to compile my program in Linux using the version 11.0.083 and I got a catastrophic error: Internal compiler error. I am going to report it in the corresponding forum, but I am mentioning it here, because it helped to remove the line with FIRSTPRIVATE from the following code:
[cpp]!dec$ if defined (_PARALLELIZATION_)
!$omp parallel if ( enableOpenMP .AND. omp_solvers ) num_threads ( threads ) default ( shared )
!$omp&  private ( i, j, k, l, P1, P2, P3 )
!$omp&  firstprivate ( alpha_SIP )
         
!$omp do schedule(dynamic,3)
!dec$ end if[/cpp]
Once again, alpha_SIP is real*8, initialized before the parallel region, and its value is not changed inside the region.

This starts to look suspicious, because it is the third time the problem is related to FIRSTPRIVATE. I will try to create an example which could be submitted to Intel Support; I hope it is not that I am making a mistake.
jimdempseyatthecove
Honored Contributor III
When you can produce a small example, and if it is small enough, can you post it here? The forum members might be able to provide you with a workaround while you are waiting for a fixed version (assuming there is something to fix).

Jim
jirina
New Contributor I
The main problem is that any attempt to simplify the code so that I would be allowed to submit it resulted in a code which is working well and the problem does not occur. So I am not sure what to do when I am not allowed to submit the original code. I will keep trying, but it might take some time.
jimdempseyatthecove
Honored Contributor III
Ok then,

If alpha_SIP is not changed inside the parallel region (and will not be later on) then remove firstprivate(alpha_SIP) and let the default(shared) provide access to alpha_SIP from within the parallel region.

If alpha_SIP is not changed inside the parallel region now (but will or may be later on), then change firstprivate(alpha_SIP) to private(alpha_SIP) and add COPYIN(alpha_SIP), assuming that you want the copy of alpha_SIP from the context of the thread creating the parallel region.

Jim Dempsey

jirina
New Contributor I
Removing alpha_SIP from FIRSTPRIVATE helped. I will try applying your solution of combining PRIVATE and COPYIN when a variable is going to be changed inside the parallel region.

Thank you for your ideas and help.