Solved: Re: OpenMP problem with local variables

jirina · 02-11-2009

I have a code which looks like this (simplified):

[cpp]              write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax

*dec$ if defined (_OPENMP_)
*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
*$omp&  firstprivate ( ..., Tmin, Tmax, ... )

*$omp do schedule(dynamic,3)
*dec$ end if
            do ij = 1,(i2-i1+1)*(j2-j1+1)
            
!                write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax

                 call apply_BC ( ..., Tmin, Tmax, ... )[/cpp]

Tmin and Tmax (both real*8) contain nonzero values before the parallel region is entered. If the commented line inside the parallel region is uncommented, Tmin and Tmax print the same values as before the parallel region. However, if I leave that line commented out and call the write command from inside the subroutine apply_BC instead, both Tmin and Tmax are suddenly 0 (or, more generally, some different value). When the program reaches the first line of my example code again, Tmin and Tmax are correct.

So, it seems entering the parallel region is causing some problems. It might be a bug in my code (quite big - 2300 lines), but I have no idea what I should focus on when trying to find the cause of the problem.

The parallel debug version and a version with enableOpenMP = .false. (no parallelization) both work correctly.

jimdempseyatthecove · 02-15-2009

jirina,

You can run the debugger on release code. Compile the release code with debugger symbol generation enabled as the only difference in your options (remember to turn this off before you ship code to users).

Debugging a release code version is particularly difficult as the optimization process moves, changes and/or eliminates code.

Fortunately, you are not using the debugger in a manner that matters with respect to this debugging difficulty. You will merely be using the debugger at one break point to look at the disassembly code around the break point.

[cpp]-------------- source -------------
! ASDF.F90
module inModule
real :: inModVar
end module inModule

program ASDF
use inModule

real :: inCommonVar
common /mycom/ inCommonVar



inModVar = 123.
inCommonVar =456.
onStackVar = 789.

call foo(1122.33)
end program ASDF

subroutine foo(ReferenceVar)
use inModule
real :: inCommonVar
common /mycom/ inCommonVar
real, automatic :: onStackVar
onStackVar = inModVar + inCommonVar
write(*,*) inModVar, inCommonVar, onStackVar, ReferenceVar
onStackVar = onStackVar + ReferenceVar
end subroutine foo
--------- end source --------------
---- disassembly for write ----
---- Note, this is 32-bit code, 64-bit code will look different ----
---- My Annotations are  lines starting with >

> source write statement
write(*,*) inModVar, inCommonVar, onStackVar, ReferenceVar
0040101D  mov         dword ptr [ebp-34h],0 
> inModVar is "_INMODULE_mp_INMODVAR"
00401024  fld         dword ptr [_INMODULE_mp_INMODVAR (4DEC60h)] 
0040102A  fstp        dword ptr [ebp-10h] 
0040102D  lea         eax,[ebp-34h] 
00401030  mov         dword ptr [esp],eax 
00401033  mov         dword ptr [esp+4],0FFFFFFFFh 
0040103B  mov         dword ptr [esp+8],384FF00h 
00401043  mov         dword ptr [esp+0Ch],offset ___xt_z+20h (4B525Ch) 
0040104B  lea         eax,[ebp-10h] 
0040104E  mov         dword ptr [esp+10h],eax 
00401052  call        _for_write_seq_lis (401140h) 
00401057  add         esp,14h 
> inCommonVar is "_MYCOM"
> but when not 1st common variable you would see "_MYCOM+someNumberHere"
0040105A  fld         dword ptr [_MYCOM (4DEC50h)] 
00401060  fstp        dword ptr [ebp-0Ch] 
00401063  add         esp,0FFFFFFF4h 
00401066  lea         eax,[ebp-34h] 
00401069  mov         dword ptr [esp],eax 
0040106C  mov         dword ptr [esp+4],offset ___xt_z+28h (4B5264h) 
00401074  lea         eax,[ebp-0Ch] 
00401077  mov         dword ptr [esp+8],eax 
0040107B  call        _for_write_seq_lis_xmit (402950h) 
00401080  add         esp,0Ch 
> onStackVar is as-is "ONSTACKVAR"
00401083  fld         dword ptr [ONSTACKVAR] 
00401086  fstp        dword ptr [ebp-8] 
00401089  add         esp,0FFFFFFF4h 
0040108C  lea         eax,[ebp-34h] 
0040108F  mov         dword ptr [esp],eax 
00401092  mov         dword ptr [esp+4],offset ___xt_z+30h (4B526Ch) 
0040109A  lea         eax,[ebp-8] 
0040109D  mov         dword ptr [esp+8],eax 
004010A1  call        _for_write_seq_lis_xmit (402950h) 
004010A6  add         esp,0Ch 
> dummy arguments are as-is "REFERENCEVAR"
004010A9  mov         eax,dword ptr [REFERENCEVAR] 
004010AC  fld         dword ptr [eax] 
004010AE  fstp        dword ptr [ebp-4] 
004010B1  add         esp,0FFFFFFF4h 
004010B4  lea         eax,[ebp-34h] 
004010B7  mov         dword ptr [esp],eax 
004010BA  mov         dword ptr [esp+4],offset ___xt_z+38h (4B5274h) 
004010C2  lea         eax,[ebp-4] 
004010C5  mov         dword ptr [esp+8],eax 
004010C9  call        _for_write_seq_lis_xmit (402950h) 
004010CE  add         esp,0Ch 

[/cpp]

The 64-bit code will look quite different, but the symbolic information for your variables will have discernible patterns.

There are three things for you to check:

1) Are the symbolic names for Tmin and Tmax the same for your write statement as for your call statement?

2) Before the code for the write statement begins, record the value of ESP (or RSP). This is your stack pointer. As you can see from the code above, the single write statement was broken up into multiple calls to _for_write_seq_lis_xmit, each followed by a stack fix-up "add esp,0Ch". For 64-bit the code will be a bit different, but I am sure you will have no problem figuring it out. After the stack fix-up following the call to write, recheck ESP or RSP; it should be the value you recorded prior to the call.

3) Step 2) was more of an exercise, as I am sure that the WRITE will not mess up the stack. Perform the step 2) technique on the subroutine call causing the error.

Things to note

a) On return from the call, and after the stack fix-up, is ESP (or RSP) correct?
b) Does the text in the disassembly look the same? The re-disassembled code may show different variables, or expressions relative to registers such as "[ebp-34h]" in place of what used to be symbolic names. If the names have changed, this might indicate that the frame pointer was not restored properly. The frame pointer is EBP in 32-bit and RBP in 64-bit.

Also note: if the problem can occur at the code you are looking at, it may also have occurred _prior_ to the code you are looking at. An indication of that is that the disassembled symbolic information does not align with the expression of the source statement (WRITE or CALL, as the case may be).

This is to say, the "Things to note", b) following step 3) should be noted prior to step 1) above.

I couldn't tell you what to look for at step 1) before walking you through to step 3) b)

Good luck hunting for the bug.

Jim Dempsey

jirina · 02-16-2009

Jim,

I am grateful for everything you are doing to help me. Your detailed explanation should be OK for me to understand what to do; however, I am not able to do the first step. I have my Release version, I enabled debug information (/debug:full) and rebuilt the project. When I try to start debugging, I get the error message "Debugging information cannot be found or does not match. Binary was not built with debug information". When I continue, execution does not stop at the breakpoint, which is obviously an expected consequence of the error message.

I checked the documentation and it seems that it should be possible to compile an application with both /debug:full and /O2 enabled. Or is there another compiler option which I don't see and which needs to be changed?

I am sorry to keep asking (probably) stupid questions.

jimdempseyatthecove · 02-16-2009

The linker must be told to keep the debug information - sorry I didn't mention this.

You can also select your Debug configuration and then, in the project(s), set the optimization to maximum speed (or whatever). Toggling the debug symbols on/off in the Release configuration will be better, as it will keep all the optimization switches the same; e.g. you may be using /O3 in most files but /O2, /O1, /O0, etc. in other files. Keeping optimization switches identical is paramount when trying to track down intermittent problems.

The code you have shown should not be exhibiting the problems described, _provided_ you haven't made a programming error. The preponderance of problems are, embarrassingly, programmer errors as opposed to compiler errors, but there are the occasional compiler errors.

Jim Dempsey

jirina · 02-17-2009

Jim,

I followed your advice and have the following conclusions from my tests:

1. The symbolic names for Tmin and Tmax are correct in the write statement before the parallel region. However, they are wrong in case of the subroutine call. Instead of the symbolic names, I see this in the disassembly (RXBC is another real*8 local variable which seems to be OK):

[cpp]007B7BC4  lea         eax,[RXBC] 
007B7BCA  mov         dword ptr [esp+94h],eax 
007B7BD1  lea         eax,[ebp-0DFCh] 
007B7BD7  mov         dword ptr [esp+98h],eax 
007B7BDE  lea         eax,[ebp-0DF4h] 
007B7BE4  mov         dword ptr [esp+9Ch],eax[/cpp]

Also, when I am debugging and place the mouse pointer over a local variable I can see its value. This does not work with Tmin and Tmax.

2. The write statement before the parallel region does not mess up the stack - the value of ESP is the same before and after the write statement.

3. However, the ESP value is different before and after the subroutine call.

You mentioned that the problem might have occurred prior to the location I am looking at. I am going to check this, but I am afraid this might be difficult because of the changes introduced into the optimized code. It looks strange to see the subroutine call in the disassembly "interrupted" by code preceding the call (e.g. parts of the parallel region definition or some lines from between the parallel region definition and the subroutine call). Is this normal or an indication of a bug in my code?

In addition, I manually and several times checked whether the number and type of subroutine arguments is the same in the declaration and in the call - this seems to be OK.

Thank you for your patient help.

jimdempseyatthecove · 02-17-2009

jirina,

Try the following changes (note the comments):

[cpp]              write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax   
  
! use "!" instead of "*"
!dec$ if defined (_OPENMP_) 
! place "&" at end of next line for continuation, remove "&" from line following next line  
!$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )   &
!$omp  firstprivate ( ..., Tmin, Tmax, ... )   
  
!$omp do schedule(dynamic,3)   
!dec$ end if  
            do ij = 1,(i2-i1+1)*(j2-j1+1)   
               
!                write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax   
  
                 call apply_BC ( ..., Tmin, Tmax, ... )  
[/cpp]

Jim

jirina · 02-17-2009

I am using the fixed form and OpenMP documentation says:

# !$OMP C$OMP *$OMP are accepted sentinels and must start in column 1
# All Fortran fixed form rules for line length, white space, continuation and comment columns apply for the entire directive line
# Initial directive lines must have a space/zero in column 6.
# Continuation lines must have a non-space/zero in column 6.

I replaced *dec$ by !dec$ and *$omp by !$omp, but I can't do more - putting & at the end of the first line and removing it from column 6 of the next line results in the compiler error message
"error #5082: Syntax error, found '&' when expecting one of: PRIVATE FIRSTPRIVATE REDUCTION COPYIN NUM_THREADS SHARED IF DEFAULT , ..."

Replacing * by ! did not help, the problem/bug is still there.

jimdempseyatthecove · 02-17-2009

[cpp]*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
123456789111111111122222222223333333333444444444455555555556666666666777777777788888888889
         012345678901234567890123456789012345678901234567890123456789012345678901234567890
[/cpp]

Fixed form ends at column 72 unless you have extended source selected (then at col 132)

Using conditional compilation (so as to not muck up anything)

Try a quick test by removing "if ( enableOpenMP .AND. omp_bc ) num_threads ( threads )" and concatenating the omp clause from the following line. Check to ensure the resultant line is less than 73 chars.

If this fixes the problem then there is a syntax problem with the omp statements.

Jim Dempsey

gib · 02-17-2009

Quoting - jimdempseyatthecove

[cpp]*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
123456789111111111122222222223333333333444444444455555555556666666666777777777788888888889
         012345678901234567890123456789012345678901234567890123456789012345678901234567890
[/cpp]

Fixed form ends at column 72 unless you have extended source selected (then at col 132)

If the OP is using the fixed form width of 72 then he is very unlucky that column 72 happens to contain a space, implying that the 'default ( shared )' clause was chopped off without generating a compile error. Another reason to use free form ...
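
To see where a 72-column limit would cut that directive, a minimal free-form Fortran sketch can split the quoted line at column 72 and print both parts. The program name col72 and the constant directive are made up for illustration; only the string itself comes from the post above.

```fortran
! Sketch: show which part of the directive lies beyond column 72.
! col72 and directive are hypothetical names, not from the original code.
program col72
  implicit none
  character(len=*), parameter :: directive = &
    '*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )'

  ! Columns 1-72 are what a standard fixed-form compiler reads.
  write(*,'(3a)') 'kept   : [', directive(1:72), ']'
  ! Everything from column 73 on is ignored unless extended source is enabled.
  write(*,'(3a)') 'ignored: [', directive(73:), ']'
end program col72
```

The "ignored" part should contain only the default ( shared ) clause, with the cut falling on the blank in column 72, so the truncated directive would still be syntactically valid.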

jirina · 02-18-2009

Quoting - jimdempseyatthecove

[cpp]*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
123456789111111111122222222223333333333444444444455555555556666666666777777777788888888889
         012345678901234567890123456789012345678901234567890123456789012345678901234567890
[/cpp]

Fixed form ends at column 72 unless you have extended source selected (then at col 132)

Using conditional compilation (so as to not muck up anything)

Try a quick test by removing "if ( enableOpenMP .AND. omp_bc ) num_threads ( threads )" and concatenating the omp clause from the following line. Check to ensure the resultant line is less than 73 chars.

If this fixes the problem then there is a syntax problem with the omp statements.

Jim Dempsey

I am using fixed form, but extended to 132 characters per line (/fixed /extend_source:132).

I cannot have just one line with !$OMP, because there is a big list of variables in FIRSTPRIVATE (about 55 variables).

Anyway, I changed this particular .for file from the fixed form to the free form, but unfortunately, the problem with Tmin and Tmax is still the same. This might mean that continuation lines of the omp clause are not causing the problem.

jimdempseyatthecove · 02-18-2009

Here is a potential workaround

Save the FIRSTPRIVATE clauses as comments in the program. Place them above the !$OMP PARALLEL...
Add LOGICAL :: InitOnce as a subroutine local variable

In front of !$OMP PARALLEL...

Add InitOnce = .true.

Replace what used to be the FIRSTPRIVATE clause with

FIRSTPRIVATE(InitOnce)

Inside the body of the loop, at the top of the loop, add

if(InitOnce) then
InitOnce = .false.
localTmin = Tmin
localTmax = Tmax
...
endif
call foo(..., localTmin, localTmax, ...)

You can conditionalize this code if you wish.

Not clean but it should work

Jim Dempsey

jirina · 02-18-2009

I understand your point, but I think it won't work - call foo is in a parallel region in my case and it uses many variables which (I believe) must be defined as private or firstprivate.

Anyway, I updated my code from the post starting this discussion to read:

[cpp]  write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax

  initOnce = .true. ! NEW

!$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
!$omp&  firstprivate ( ..., Tmin, Tmax, ..., initOnce )
!$omp&  private ( localTmin, localTmax ) ! NEW

!$omp do schedule(dynamic,3)
  do ij = 1,(i2-i1+1)*(j2-j1+1)
    if ( initOnce ) then ! NEW
      initOnce = .false.
      localTmin = Tmin
      localTmax = Tmax
    endif
    write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax  
    call apply_BC ( ..., localTmin, localTmax, ... ) ! NEW instead of ( ..., Tmin, Tmax, ... )  [/cpp]

And this change helped! I can see correct values of Tmin and Tmax everywhere, even inside the parallel region and inside the subroutine apply_BC.

I know that this solution is not so clean, but I am happy it helped. Thank you very much, Jim, for your effort.

Anyway, I went through my code looking for a possible bug, but I did not find anything. There are 3 parts of the source code which are basically the same (each part corresponds to one coordinate direction X, Y, Z) - I checked it several times today. Still, two of them worked well in the parallel model and just one became a nightmare for me and probably also for you and other people trying to help me.

I am considering submitting this to Intel as a possible compiler bug, but I am afraid to do so, because the source code is not very nice (originally written in Fortran 77 by a non-programmer) and because there might be a bug which I don't see even though I have been trying to find it for more than one week.

jimdempseyatthecove · 02-18-2009

Jirina,

I consider myself an experienced programmer.

Most of the time (~99%), when I offhandedly think "this has got to be a compiler bug", I will look at the code again and again without seeing the error. However, with luck my error is found before I give up and submit it to Premier Support. In almost all cases, the error found is the result of lax programming or a stupid mistake on my part. This goes with the business, so I am used to it by now (40 years of programming).

Also,

The problem with Tmin and Tmax may also be a problem with (some of) the remaining FIRSTPRIVATE variables. Until you pin down what is causing the error with Tmin and Tmax, I suggest you consider making localXXX's out of all the FIRSTPRIVATEs. Additionally, make it so you can conditionally compile either way, and insert some ASSERT sanity checks. Doing this will permit you to catch additional errors now, as well as try out the next release(s) of the compiler later.
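
One way this could be set up, as a rough sketch under assumptions: the macro USE_LOCAL_COPIES, the reference copies refTmin/refTmax, the counter nbad and the test values are all made up for illustration, and the check inside the loop only stands in for the real call to apply_BC. The file has to go through the preprocessor (for example /fpp with Intel Fortran or -cpp with gfortran).

```fortran
! Sketch only: USE_LOCAL_COPIES, refTmin, refTmax, nbad and the values
! are hypothetical; the loop body stands in for the call to apply_BC.
program firstprivate_sanity
  implicit none
  real(8) :: Tmin, Tmax            ! set before the parallel region
  real(8) :: refTmin, refTmax      ! shared reference copies for the check
  real(8) :: localTmin, localTmax  ! per-thread working copies
  logical :: initOnce
  integer :: ij, nbad

  Tmin = 12.5d0
  Tmax = 87.5d0
  refTmin = Tmin
  refTmax = Tmax
  initOnce = .true.
  nbad = 0

#ifdef USE_LOCAL_COPIES
  ! Workaround variant: only initOnce is FIRSTPRIVATE, Tmin/Tmax stay shared.
!$omp parallel default(shared) firstprivate(initOnce) &
!$omp&  private(localTmin, localTmax) reduction(+:nbad)
#else
  ! Original variant: Tmin/Tmax are FIRSTPRIVATE.
!$omp parallel default(shared) firstprivate(Tmin, Tmax) &
!$omp&  private(localTmin, localTmax) reduction(+:nbad)
#endif
!$omp do schedule(dynamic,3)
  do ij = 1, 100
#ifdef USE_LOCAL_COPIES
    if (initOnce) then
      initOnce = .false.
      localTmin = Tmin
      localTmax = Tmax
    end if
#else
    localTmin = Tmin
    localTmax = Tmax
#endif
    ! Sanity check: each thread must see the values set before the region.
    if (localTmin /= refTmin .or. localTmax /= refTmax) nbad = nbad + 1
  end do
!$omp end do
!$omp end parallel

  if (nbad /= 0) then
    write(*,'(a,i0)') ' sanity check failed, bad iterations: ', nbad
    stop 1
  end if
  write(*,'(a)') ' sanity check passed'
end program firstprivate_sanity
```

Building once with and once without USE_LOCAL_COPIES defined (plus the OpenMP option) exercises both variants, so the same check can be rerun against a later compiler release.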

Glad this gave you a workaround so you can put this behind you and get on about your business.

Jim Dempsey

jirina · 02-18-2009

I completely agree with you when it comes to bugs. I am not a real (and experienced) programmer, so it is 99% sure that I have a bug in the code. I just cannot find it.

I will try to use local versions of all FIRSTPRIVATE variables and make the code conditionally compiled to see what happens when the new compiler version is out.

I need to say thank you once more for your kind help.