I have code that looks like this (simplified):
[cpp]
      write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax
*dec$ if defined (_OPENMP_)
*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
*$omp& firstprivate ( ..., Tmin, Tmax, ... )
*$omp do schedule(dynamic,3)
*dec$ end if
      do ij = 1,(i2-i1+1)*(j2-j1+1)
!       write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax
        call apply_BC ( ..., Tmin, Tmax, ... )
[/cpp]
Tmin and Tmax (both real*8) contain nonzero values before the parallel region is entered. If the commented-out line inside the parallel region is uncommented, Tmin and Tmax have the same values as before the parallel region. However, if that line is left commented out and the write statement is instead executed inside the subroutine apply_BC, both Tmin and Tmax are suddenly 0 (or, more generally, some different value). When the program reaches the first line of my example code again, Tmin and Tmax are correct.
So, it seems that entering the parallel region is causing some problem. It might be a bug in my code (quite big, 2300 lines), but I have no idea what to focus on when trying to find the cause.
The parallel debug version and a version with enableOpenMP = .false. (no parallelization) both work correctly.
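A reduced test case can help separate a compiler problem from a bug elsewhere in the 2300 lines. The following is a minimal free-form sketch, not the original code: `record_bc` is a hypothetical stand-in for apply_BC, the module and function names and the values 12.5 and 87.25 are made up for illustration, and only Tmin, Tmax, the firstprivate clause, and the schedule are taken from the post.

```fortran
module firstprivate_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Hypothetical stand-in for apply_BC: records the values it receives.
  subroutine record_bc(i, tmin_in, tmax_in, seen_min, seen_max)
    integer,  intent(in)    :: i
    real(dp), intent(in)    :: tmin_in, tmax_in
    real(dp), intent(inout) :: seen_min(:), seen_max(:)
    seen_min(i) = tmin_in
    seen_max(i) = tmax_in
  end subroutine record_bc

  ! Returns .true. if every iteration saw the values set before the region.
  function firstprivate_ok(n) result(ok)
    integer, intent(in) :: n
    logical  :: ok
    real(dp) :: Tmin, Tmax
    real(dp) :: seen_min(n), seen_max(n)
    integer  :: ij

    Tmin = 12.5_dp      ! arbitrary nonzero test values
    Tmax = 87.25_dp
    seen_min = 0.0_dp
    seen_max = 0.0_dp

    !$omp parallel default(shared) firstprivate(Tmin, Tmax)
    !$omp do schedule(dynamic,3)
    do ij = 1, n
      ! Each iteration writes a distinct element, so the shared arrays are safe.
      call record_bc(ij, Tmin, Tmax, seen_min, seen_max)
    end do
    !$omp end do
    !$omp end parallel

    ok = all(seen_min == 12.5_dp) .and. all(seen_max == 87.25_dp)
  end function firstprivate_ok

end module firstprivate_check
```

Calling `firstprivate_ok` from a small driver, built with the same optimization and OpenMP switches as the failing release build, shows whether firstprivate copies reach a called subroutine intact in isolation. If this passes while the full code fails, the cause is more likely in the surrounding code or in something specific to the large firstprivate list.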
33 Replies
jirina,
You can run the debugger on release code. Compile the release code with debugger symbol generation enabled as the only difference in your options (remember to turn this off before you ship code to users).
Debugging a release build is particularly difficult because the optimizer moves, changes, and/or eliminates code.
Fortunately, the way you will be using the debugger is not affected by this difficulty. You will merely use the debugger at one breakpoint to look at the disassembly around that breakpoint.
[cpp]
-------------- source -------------
! ASDF.F90
module inModule
  real :: inModVar
end module inModule

program ASDF
  use inModule
  real :: inCommonVar
  common /mycom/ inCommonVar
  inModVar = 123.
  inCommonVar = 456.
  onStackVar = 789.
  call foo(1122.33)
end program ASDF

subroutine foo(ReferenceVar)
  use inModule
  real :: inCommonVar
  common /mycom/ inCommonVar
  real, automatic :: onStackVar
  onStackVar = inModVar + inCommonVar
  write(*,*) inModVar, inCommonVar, onStackVar, ReferenceVar
  onStackVar = onStackVar + ReferenceVar
end subroutine foo
--------- end source --------------

---- disassembly for write ----
---- Note, this is 32-bit code, 64-bit code will look different ----
---- My annotations are lines starting with >

> source write statement
  write(*,*) inModVar, inCommonVar, onStackVar, ReferenceVar
0040101D  mov   dword ptr [ebp-34h],0
> inModVar is "_INMODULE_mp_INMODVAR"
00401024  fld   dword ptr [_INMODULE_mp_INMODVAR (4DEC60h)]
0040102A  fstp  dword ptr [ebp-10h]
0040102D  lea   eax,[ebp-34h]
00401030  mov   dword ptr [esp],eax
00401033  mov   dword ptr [esp+4],0FFFFFFFFh
0040103B  mov   dword ptr [esp+8],384FF00h
00401043  mov   dword ptr [esp+0Ch],offset ___xt_z+20h (4B525Ch)
0040104B  lea   eax,[ebp-10h]
0040104E  mov   dword ptr [esp+10h],eax
00401052  call  _for_write_seq_lis (401140h)
00401057  add   esp,14h
> inCommonVar is "_MYCOM"
> but when not 1st common variable you would see "_MYCOM+someNumberHere"
0040105A  fld   dword ptr [_MYCOM (4DEC50h)]
00401060  fstp  dword ptr [ebp-0Ch]
00401063  add   esp,0FFFFFFF4h
00401066  lea   eax,[ebp-34h]
00401069  mov   dword ptr [esp],eax
0040106C  mov   dword ptr [esp+4],offset ___xt_z+28h (4B5264h)
00401074  lea   eax,[ebp-0Ch]
00401077  mov   dword ptr [esp+8],eax
0040107B  call  _for_write_seq_lis_xmit (402950h)
00401080  add   esp,0Ch
> onStackVar appears as-is: "ONSTACKVAR"
00401083  fld   dword ptr [ONSTACKVAR]
00401086  fstp  dword ptr [ebp-8]
00401089  add   esp,0FFFFFFF4h
0040108C  lea   eax,[ebp-34h]
0040108F  mov   dword ptr [esp],eax
00401092  mov   dword ptr [esp+4],offset ___xt_z+30h (4B526Ch)
0040109A  lea   eax,[ebp-8]
0040109D  mov   dword ptr [esp+8],eax
004010A1  call  _for_write_seq_lis_xmit (402950h)
004010A6  add   esp,0Ch
> dummy arguments appear as-is: "REFERENCEVAR"
004010A9  mov   eax,dword ptr [REFERENCEVAR]
004010AC  fld   dword ptr [eax]
004010AE  fstp  dword ptr [ebp-4]
004010B1  add   esp,0FFFFFFF4h
004010B4  lea   eax,[ebp-34h]
004010B7  mov   dword ptr [esp],eax
004010BA  mov   dword ptr [esp+4],offset ___xt_z+38h (4B5274h)
004010C2  lea   eax,[ebp-4]
004010C5  mov   dword ptr [esp+8],eax
004010C9  call  _for_write_seq_lis_xmit (402950h)
004010CE  add   esp,0Ch
[/cpp]
The 64-bit code will look quite different, but the symbolic information for your variables will have discernible patterns.
There are three things for you to check:
1) Are the symbolic names for Tmin and Tmax the same in your write statement as in your call statement?
2) Before the code for the write statement begins, record the value of ESP (or RSP); this is your stack pointer. As you can see from the code above, the single write statement was broken up into multiple calls to _for_write_seq_lis_xmit, each followed by a stack fix-up, "add esp,0Ch". For 64-bit the code will be a bit different, but I am sure you will have no problem figuring it out. After the stack fix-up that follows the last call for the write, recheck ESP or RSP; it should be the value you recorded before the write.
3) Step 2) was more of an exercise, as I am sure that the WRITE will not mess up the stack. Apply the step 2) technique to the subroutine call that causes the error.
Things to note:
a) On return from the call, and after the stack fix-up, is ESP (or RSP) correct?
b) Does the text in the disassembly look the same? The re-disassembled code may show different variables, or expressions relative to registers (such as "[ebp-34h]") in place of what used to be symbolic names. If the names have changed, this might indicate that the frame pointer was not restored properly. This is EBP in 32-bit and RBP in 64-bit.
Also note that if the problem can occur at the code you are looking at, it may also have occurred _prior_ to the code you are looking at. An indication of that is disassembled symbolic information that does not match the expressions in the source statement (WRITE or CALL, as the case may be).
That is to say, item b) of "Things to note" following step 3) should also be checked before step 1) above.
I couldn't tell you what to look for at step 1) before walking you through to step 3) b).
Good luck hunting for the bug.
Jim Dempsey
Jim,
I am grateful for everything you are doing to help me. Your detailed explanation should be enough for me to understand what to do; however, I am not able to do the first step. I have my Release version, I enabled debug information (/debug:full), and I rebuilt the project. When I try to start debugging, I get the error message "Debugging information cannot be found or does not match. Binary was not built with debug information". When I continue, execution does not stop at the breakpoint, which is obviously an expected consequence of the error message.
I checked the documentation, and it seems that it should be possible to compile an application with both /debug:full and /O2 enabled. Or is there another compiler option which I don't see and which needs to be changed?
I am sorry to keep asking (probably) stupid questions.
The linker must be told to keep the debug information - sorry I didn't mention this.
You can also select your Debug configuration and then, in the project(s), set the optimization to maximum speed (or whatever you use). However, toggling the debug symbols on and off in the Release configuration is better, as it keeps all the optimization switches the same; e.g., you may be using /O3 in most files but /O2, /O1, /O0, etc. in other files. Keeping the optimization switches identical is paramount when trying to track down intermittent problems.
The code you have shown should not exhibit the problems described, _provided_ you haven't made a programming error. The preponderance of problems are, embarrassingly, programmer errors as opposed to compiler errors, but there are occasional compiler errors.
Jim Dempsey
Jim,
I followed your advice and have the following conclusions from my tests:
1. The symbolic names for Tmin and Tmax are correct in the write statement before the parallel region. However, they are wrong in the subroutine call. Instead of the symbolic names, I see this in the disassembly (RXBC is another real*8 local variable, which seems to be OK):
[cpp]
007B7BC4  lea  eax,[RXBC]
007B7BCA  mov  dword ptr [esp+94h],eax
007B7BD1  lea  eax,[ebp-0DFCh]
007B7BD7  mov  dword ptr [esp+98h],eax
007B7BDE  lea  eax,[ebp-0DF4h]
007B7BE4  mov  dword ptr [esp+9Ch],eax
[/cpp]
Also, when I am debugging and place the mouse pointer over a local variable, I can see its value. This does not work with Tmin and Tmax.
2. The write statement before the parallel region does not mess up the stack: the value of ESP is the same before and after the write statement.
3. However, the value of ESP is different before and after the subroutine call.
You mentioned that the problem might have occurred prior to the location I am looking at. I am going to check this, but I am afraid it might be difficult because of the changes the optimizer introduces into the code. It looks strange to see the subroutine call in the disassembly "interrupted" by code that precedes the call (e.g., parts of the parallel region definition, or some lines from between the parallel region definition and the subroutine call). Is this normal, or an indication of a bug in my code?
In addition, I manually checked several times whether the number and types of the subroutine arguments are the same in the declaration and in the call; this seems to be OK.
Thank you for your patient help.
jirina,
Try the following changes; note the comments:
[cpp]
      write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax
! use "!" instead of "*"
!dec$ if defined (_OPENMP_)
! place "&" at end of next line for continuation, remove "&" from line following next line
!$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared ) &
!$omp firstprivate ( ..., Tmin, Tmax, ... )
!$omp do schedule(dynamic,3)
!dec$ end if
      do ij = 1,(i2-i1+1)*(j2-j1+1)
!       write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax
        call apply_BC ( ..., Tmin, Tmax, ... )
[/cpp]
Jim
I am using fixed form, and the OpenMP documentation says:
# !$OMP C$OMP *$OMP are accepted sentinels and must start in column 1
# All Fortran fixed form rules for line length, white space, continuation and comment columns apply for the entire directive line
# Initial directive lines must have a space/zero in column 6.
# Continuation lines must have a non-space/zero in column 6.
I replaced *dec$ with !dec$ and *$omp with !$omp, but I can't do more: putting & at the end of the first line and removing it from column 6 of the next line results in the compiler error message
"error #5082: Syntax error, found '&' when expecting one of: PRIVATE FIRSTPRIVATE REDUCTION COPYIN NUM_THREADS SHARED IF DEFAULT , ..."
Replacing * with ! did not help; the problem is still there.
[cpp]
*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
123456789111111111122222222223333333333444444444455555555556666666666777777777788888888889
         012345678901234567890123456789012345678901234567890123456789012345678901234567890
[/cpp]
Fixed form ends at column 72 unless you have extended source selected (then at column 132).
Using conditional compilation (so as not to muck up anything), try a quick test: remove "if ( enableOpenMP .AND. omp_bc ) num_threads ( threads )" and concatenate the omp clause from the following line. Check that the resulting line is less than 73 characters long.
If this fixes the problem, then there is a syntax problem with the omp statements.
Jim Dempsey
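The column check can also be done mechanically rather than by eye. The sketch below is only an illustration: the module and function names are hypothetical, and it assumes the 72-column limit quoted above (not the 132-column extended source option) and lines without tab characters.

```fortran
module fixed_form_check
  implicit none
  ! Standard fixed-form limit; extended source would use 132 instead.
  integer, parameter :: fixed_form_limit = 72
contains

  ! Number of characters that lie beyond column 72 (0 if the line fits).
  ! Trailing blanks are ignored; tabs are not expanded.
  function overflow_columns(line) result(n)
    character(len=*), intent(in) :: line
    integer :: n
    n = max(0, len_trim(line) - fixed_form_limit)
  end function overflow_columns

end module fixed_form_check
```

Passing each directive line of the source file to `overflow_columns` and reporting any nonzero result would flag a line such as the `*$omp parallel ...` line shown above, which as posted runs well past column 72.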
I cannot have just one line with !$OMP, because there is a big list of variables in FIRSTPRIVATE (about 55 variables).
Anyway, I changed this particular .for file from the fixed form to the free form, but unfortunately, the problem with Tmin and Tmax is still the same. This might mean that continuation lines of the omp clause are not causing the problem.
Here is a potential workaround:
Save the FIRSTPRIVATE clauses as comments in the program, placed above the !$OMP PARALLEL...
Add LOGICAL :: InitOnce as a subroutine local variable.
In front of the !$OMP PARALLEL...
add InitOnce = .true.
Replace what used to be the FIRSTPRIVATE clause with
FIRSTPRIVATE(InitOnce)
Inside the body of the loop, at the top of the loop, add
if(InitOnce) then
  InitOnce = .false.
  localTmin = Tmin
  localTmax = Tmax
  ...
endif
call foo(..., localTmin, localTmax, ...)
You can conditionalize this code if you wish.
Not clean, but it should work.
Jim Dempsey
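For reference, the workaround can be written out as a small self-contained free-form sketch. This is an illustration under assumptions, not the original code: `use_limits` is a hypothetical stand-in for foo / apply_BC, the `width` array exists only so the result can be inspected, and localTmin and localTmax are assumed to be listed in a PRIVATE clause so that each thread has its own copies.

```fortran
module init_once_workaround
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Hypothetical stand-in for foo / apply_BC.
  subroutine use_limits(i, tmin_in, tmax_in, width)
    integer,  intent(in)    :: i
    real(dp), intent(in)    :: tmin_in, tmax_in
    real(dp), intent(inout) :: width(:)
    width(i) = tmax_in - tmin_in
  end subroutine use_limits

  subroutine run_loop(Tmin, Tmax, width)
    real(dp), intent(in)  :: Tmin, Tmax
    real(dp), intent(out) :: width(:)
    real(dp) :: localTmin, localTmax
    logical  :: InitOnce
    integer  :: ij

    InitOnce = .true.
    ! Only the flag is FIRSTPRIVATE; Tmin and Tmax stay shared and read-only.
    !$omp parallel default(shared) firstprivate(InitOnce) &
    !$omp private(localTmin, localTmax)
    !$omp do schedule(dynamic,3)
    do ij = 1, size(width)
      if (InitOnce) then
        ! Executed once per thread, on that thread's first iteration.
        InitOnce  = .false.
        localTmin = Tmin
        localTmax = Tmax
      end if
      call use_limits(ij, localTmin, localTmax, width)
    end do
    !$omp end do
    !$omp end parallel
  end subroutine run_loop

end module init_once_workaround
```

Because every thread starts with its own InitOnce set to .true., each thread fills its private copies exactly once, before its first call. Compiled without OpenMP, the directives are plain comments and the loop behaves the same way serially.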
I understand your point, but I think it won't work - call foo is in a parallel region in my case and it uses many variables which (I believe) must be defined as private or firstprivate.
Anyway, I updated my code from the post starting this discussion to read:
[cpp]
      write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax
      initOnce = .true. ! NEW
!omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
!$omp& firstprivate ( ..., Tmin, Tmax, ..., initOnce )
!$omp& private ( localTmin, localTmax ) ! NEW
!$omp do schedule(dynamic,3)
      do ij = 1,(i2-i1+1)*(j2-j1+1)
        if ( initOnce ) then ! NEW
          initOnce = .false.
          localTmin = Tmin
          localTmax = Tmax
        endif
        write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax
        call apply_BC ( ..., localTmin, localTmax, ... ) ! NEW instead of ( ..., Tmin, Tmax, ... )
[/cpp]
And this change helped! I can see correct values of Tmin and Tmax everywhere, even inside the parallel region and inside the subroutine apply_BC.
I know that this solution is not so clean, but I am happy it helped. Thank you very much, Jim, for your effort.
Anyway, I went through my code looking for a possible bug, but I did not find anything. There are 3 parts of the source code which are basically the same (each part corresponds to one coordinate direction: X, Y, Z); I checked them several times today. Still, two of them worked well in the parallel model, and just one became a nightmare for me and probably also for you and the other people trying to help me.
I am considering submitting this to Intel as a possible compiler bug, but I am hesitant to do so, because the source code is not very nice (originally written in Fortran 77 by a non-programmer) and because there might be a bug which I don't see even though I have been trying to find it for more than a week.
Jirina,
I consider myself an experienced programmer.
Most of the time (~99%), when I offhandedly think "this has got to be a compiler bug", I look at the code again and again without seeing the error. However, with luck, I find my error before I give up and submit the problem to Premier Support. In almost all cases, the error turns out to be the result of lax programming or a stupid mistake on my part. This goes with the business, so I am used to it by now (40 years of programming).
Also,
The problem with Tmin and Tmax may also affect (some of) the remaining FIRSTPRIVATE variables. Until you pin down what is causing the error with Tmin and Tmax, I suggest you consider making localXXX's out of all the FIRSTPRIVATE variables. Additionally, make it so you can conditionally compile either way, and insert some ASSERT sanity checks. Doing this will let you catch additional errors now, as well as try out the next release(s) of the compiler later.
Glad this gave you a workaround so you can put this behind you and get on with your business.
Jim Dempsey
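One possible shape for such a sanity check is sketched below. Fortran has no built-in ASSERT, so `check_copy` is a hypothetical helper written for this illustration; it assumes the per-thread copy is supposed to be bit-identical to the value held before the parallel region, which is why it compares for exact equality.

```fortran
module sanity_checks
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Returns .true. when the per-thread copy still equals the reference value.
  ! On a mismatch it prints the variable name and both values.
  function check_copy(name, reference, copy) result(ok)
    character(len=*), intent(in) :: name
    real(dp),         intent(in) :: reference, copy
    logical :: ok
    ok = (copy == reference)
    if (.not. ok) then
      write(*,'(3a,2(1x,es24.16))') 'ASSERT failed for ', name, ':', &
                                    reference, copy
    end if
  end function check_copy

end module sanity_checks
```

Inside the loop this could be used as, for example, `ok = check_copy('Tmin', Tmin, localTmin)` just before the call to apply_BC, with the reference value taken from the shared variable. Output from several threads may interleave, so the message is only meant as a tripwire, not as a clean log.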
I completely agree with you when it comes to bugs. I am not a real (and experienced) programmer, so it is 99% sure that I have a bug in the code. I just cannot find it.
I will try to use local versions of all FIRSTPRIVATE variables and make the code conditionally compiled to see what happens when the new compiler version is out.
I need to say thank you once more for your kind help.