Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

OpenMP problem with local variables

jirina
New Contributor I
My code looks like this (simplified):
              write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax

*dec$ if defined (_OPENMP_)
*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
*$omp&  firstprivate ( ..., Tmin, Tmax, ... )

*$omp do schedule(dynamic,3)
*dec$ end if
            do ij = 1,(i2-i1+1)*(j2-j1+1)
            
!                write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax

                 call apply_BC ( ..., Tmin, Tmax, ... )

Tmin and Tmax (both real*8) hold nonzero values before the parallel region is entered. If the commented-out WRITE inside the parallel region is enabled, Tmin and Tmax show the same values as before the region. However, if that line stays commented out and the WRITE is executed from inside subroutine apply_BC instead, both Tmin and Tmax are suddenly 0 (in general, a different value). When execution comes back to the first line of my example code, Tmin and Tmax are correct again.

So entering the parallel region seems to cause the problem. It might be a bug in my code (which is quite big, about 2300 lines), but I have no idea where to start looking for the cause.

The parallel Debug build and a build with enableOpenMP = .false. (no parallelization) both work correctly.
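For reference, the behavior the post expects from FIRSTPRIVATE can be checked in isolation. The following is a minimal, self-contained sketch, not the poster's code: it is written in free form (so the sentinel is `!$omp` rather than `*$omp`), `sees_initial_values` is a hypothetical stand-in for `apply_BC`, and the temperature values are arbitrary. Each thread's private `tmin`/`tmax` should start as copies of the values assigned before the region, so the function should return `.true.` with or without OpenMP enabled.

```fortran
module firstprivate_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Hypothetical stand-in for apply_BC: reports whether it received the
  ! values that were assigned before the parallel region.
  pure logical function sees_initial_values(tmin, tmax)
    real(real64), intent(in) :: tmin, tmax
    sees_initial_values = (tmin == 273.15_real64 .and. tmax == 373.15_real64)
  end function sees_initial_values

  ! Returns .true. if every loop iteration saw the pre-region values.
  logical function firstprivate_preserves_values(n)
    integer, intent(in) :: n
    real(real64) :: tmin, tmax
    logical :: ok
    integer :: ij
    tmin = 273.15_real64
    tmax = 373.15_real64
    ok = .true.
    ! FIRSTPRIVATE: each thread gets its own tmin/tmax, initialized
    ! from the values the variables had just before the region.
    !$omp parallel default(shared) firstprivate(tmin, tmax)
    !$omp do schedule(dynamic,3) reduction(.and.:ok)
    do ij = 1, n
      ok = ok .and. sees_initial_values(tmin, tmax)
    end do
    !$omp end do
    !$omp end parallel
    firstprivate_preserves_values = ok
  end function firstprivate_preserves_values
end module firstprivate_demo
```

If a cut-down copy of the real loop fails a check like this while the toy version passes, the difference between the two narrows down where to look.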
Accepted solution
jimdempseyatthecove
Honored Contributor III

Here is a potential workaround:

Save the FIRSTPRIVATE clauses as comments in the program, placed above the !$OMP PARALLEL...
Add LOGICAL :: InitOnce as a local variable of the subroutine.

In front of !$OMP PARALLEL..., add

InitOnce = .true.

Replace what used to be the FIRSTPRIVATE clause with

FIRSTPRIVATE(InitOnce)

Inside the body of the loop, at the top of the loop, add

if(InitOnce) then
    InitOnce = .false.
    localTmin = Tmin
    localTmax = Tmax
    ...
endif
call foo(..., localTmin, localTmax, ...)

You can conditionalize this code if you wish.

Not clean, but it should work.

Jim Dempsey
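The workaround above can be sketched as a self-contained example. Assumptions that are not in the post: `bounds_ok` is a hypothetical stand-in for `foo`/`apply_BC`, the values are arbitrary, and `localTmin`/`localTmax` are listed in a PRIVATE clause. The outline leaves that clause implicit, but it is needed for each thread to have its own copies. Each thread sees `InitOnce = .true.` on the first iteration it executes and copies the shared Tmin/Tmax exactly once.

```fortran
module initonce_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Hypothetical stand-in for apply_BC: checks the bounds it receives.
  pure logical function bounds_ok(tmin, tmax)
    real(real64), intent(in) :: tmin, tmax
    bounds_ok = (tmin == 273.15_real64 .and. tmax == 373.15_real64)
  end function bounds_ok

  logical function run_workaround(n)
    integer, intent(in) :: n
    real(real64) :: Tmin, Tmax            ! shared, set before the region
    real(real64) :: localTmin, localTmax  ! per-thread copies (PRIVATE)
    logical :: InitOnce, ok
    integer :: ij
    Tmin = 273.15_real64
    Tmax = 373.15_real64
    ok = .true.
    InitOnce = .true.
    !$omp parallel default(shared) firstprivate(InitOnce) &
    !$omp & private(localTmin, localTmax)
    !$omp do schedule(dynamic,3) reduction(.and.:ok)
    do ij = 1, n
      ! First iteration run by this thread: take private copies once.
      if (InitOnce) then
        InitOnce = .false.
        localTmin = Tmin
        localTmax = Tmax
      end if
      ok = ok .and. bounds_ok(localTmin, localTmax)
    end do
    !$omp end do
    !$omp end parallel
    run_workaround = ok
  end function run_workaround
end module initonce_demo
```

A thread that receives no iterations never executes the IF block, so it never reads its uninitialized copies.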

jimdempseyatthecove
Honored Contributor III

From the looks of it, you have found an optimization problem. Can you work out what is going on from the disassembly?
TimP
Honored Contributor III
The source code is confusing enough as it is. It makes us wonder whether you misspelled _OPENMP or intentionally defined two macros, each of which might serve a purpose similar to the standard one, and why you use the !DEC$ form of conditional compilation for apparently the same purpose as the cpp-style form, which is required to be available for OpenMP.
jirina
New Contributor I
I am not able to work with disassembly, but I will ask my colleagues to help me with it. I tried disabling optimizations for the file containing the source code above, but it did not help.

_OPENMP_ is not misspelled; I used it intentionally because I wanted to be able to build the code without any OpenMP directives. The macro name may be confusing: I simply did not know the _OPENMP macro was available, but I hope it does not affect anything. You are right that this is redundant; I will change it to _OPENMP to make the code clearer.
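For comparison, standard OpenMP already provides conditional compilation without a custom macro. The sketch below (free-form source; the module and function names are hypothetical) uses the `!$` sentinel: such lines are compiled only when OpenMP is enabled (for example with `/Qopenmp`) and are treated as ordinary comments otherwise. In fixed-form source the sentinel may also be written `*$` or `c$` starting in column 1. When the file goes through the preprocessor, `#ifdef _OPENMP` is the standard macro-based alternative.

```fortran
module omp_query
  !$ use omp_lib, only: omp_get_max_threads
  implicit none
contains
  ! Threads a parallel region would use: always 1 in a serial build.
  integer function max_threads()
    max_threads = 1
    !$ max_threads = omp_get_max_threads()
  end function max_threads

  ! .true. only when the compiler was invoked with OpenMP enabled,
  ! because the !$ line is a plain comment otherwise.
  logical function openmp_enabled()
    openmp_enabled = .false.
    !$ openmp_enabled = .true.
  end function openmp_enabled
end module omp_query
```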


gib
New Contributor II
Have you tried running with bounds checking turned on?
jirina
New Contributor I
I took my Release configuration, set all run-time checks to Yes (/check:pointer /check:bounds /check:uninit /check:format /check:output_conversion /check:arg_temp_created) and rebuilt the project. I got a compiler error: "Fatal compilation error: Out of memory asking for 8200".
Is it OK to enable these checks in the Release configuration?
Do I need to enable traceback information? (I think it just reports the location in the code where a check found an error.)

I am going to try the Debug configuration with all checks enabled, although the Debug build does not show the problem described in my original post.
jimdempseyatthecove
Honored Contributor III


>>the write command from the subroutine apply_BC causes that both Tmin and Tmax are suddenly 0 (generally a different value). When the program comes to the first line of my example code again, Tmin and Tmax are correct.

By this, do you mean that the values of Tmin and Tmax written by WRITE (or examined in the debugger at the WRITE) differ from the values inside subroutine apply_BC(..., Tmin, Tmax, ...)?

If so, I would suspect that Tmin and Tmax are not declared with the same type: in one place they are likely REAL(8) and in the other REAL(4).

Try using /gen-interfaces /warn:interfaces.

Jim Dempsey
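The interface checking suggested above is easiest to get by construction: procedures placed in a module have an explicit interface in every caller that USEs the module, so passing a REAL(4) actual argument to a REAL(8) dummy is rejected at compile time. The sketch below assumes a hypothetical, greatly simplified `clamp_to_range` in place of `apply_BC`, plus one named kind `wp` used for every temperature declaration.

```fortran
module bc_kinds
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  ! One named kind for all temperatures, so every declaration agrees.
  integer, parameter :: wp = real64
contains
  ! Hypothetical simplified boundary routine: clamps x to [tmin, tmax].
  ! Because it lives in a module, callers get an explicit interface and
  ! a REAL(4) argument here is a compile-time error, not silent garbage.
  pure real(wp) function clamp_to_range(x, tmin, tmax) result(y)
    real(wp), intent(in) :: x, tmin, tmax
    y = min(max(x, tmin), tmax)
  end function clamp_to_range
end module bc_kinds
```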

jirina
New Contributor I
The Debug build works without any problems, even with /gen-interfaces and /warn:interfaces. I have all available checks enabled in the Debug build, and none of them issues any warning. In the Debug build, Tmin and Tmax also have the same values at the WRITE line and inside subroutine apply_BC.

In addition, I checked that Tmin and Tmax are always declared as real*8.

In the Release build, I see correct values at the WRITE, but a parallel region is entered after it, and the same WRITE statement inside subroutine apply_BC, within the parallel region, shows different values (0).

I tried enabling various checks in the Release build (this may be pointless), but I cannot compile the project: the compiler tries to allocate too much memory (I saw 2 GB in Task Manager) and then aborts.

There is one commented-out line in my original post. After I uncomment it, Tmin and Tmax contain correct values, but I noticed that some other variables suddenly contain wrong values (0).

So I am still searching for the cause of this problem. Thank you all for your suggestions and ideas so far.
rreis
New Contributor I
Quoting - jirina
Is it OK to enable these checks in the Release configuration?

Enabling the checks will hurt performance. Of course you can do it, but I don't think it's advisable in production/release code: by the time you reach that stage, it should have been shown that this kind of error does not occur in regular operation. Put another way, I think it is OK to enable them in the Release configuration for debugging purposes, but not in the build you actually release.
rreis
New Contributor I
Can you post all the flags you are using for the Release build?
jirina
New Contributor I
Of course, I would not ship the Release build with all the checks enabled. I just wanted to see whether any of them would give me a clue about where the problem comes from.

Anyway, the whole project is compiled with the following options:

/nologo /D_OPENMP_ /fixed /extend_source:132 /Qopenmp /fpscomp:general /warn:declarations /warn:unused /assume:byterecl /module:"Release" /object:"Release" /libs:static /threads /c /align:all /heap-arrays

and linked with the following settings:

/OUT:"Releasename1.exe" /NOLOGO /DELAYLOAD:"name2.dll" /MANIFEST /MANIFESTFILE:"..." /SUBSYSTEM:CONSOLE /STACK:100000000 /IMPLIB:"name3.lib" delayimp.lib libguide.lib
jimdempseyatthecove
Honored Contributor III

>>There is one commented line in my original post. After I uncomment it, Tmin and Tmax contain correct values, but I noticed that some other variables suddenly contain wrong values (0).

This usually indicates a calling convention error, where the stack pointer is not restored properly after a call. These errors often occur in mixed-language programs. Is your application a mixture of Fortran and something else (e.g. C++)?

Jim Dempsey
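If a Fortran/C++ mismatch is suspected, one way to rule it out on the Fortran side is to call foreign routines only through BIND(C) interfaces, which tell the compiler to use the companion C compiler's calling convention and argument passing (for example, no hidden string-length arguments). The DLL's own functions are not shown in the thread, so this sketch uses the C library's `strlen` as a stand-in; the module and wrapper names are hypothetical.

```fortran
module c_interop_demo
  use, intrinsic :: iso_c_binding, only: c_char, c_size_t, c_null_char
  implicit none
  interface
    ! Explicit, interoperable interface to the C library's strlen.
    function c_strlen(s) bind(C, name="strlen") result(n)
      import :: c_char, c_size_t
      character(kind=c_char), intent(in) :: s(*)
      integer(c_size_t) :: n
    end function c_strlen
  end interface
contains
  ! Length of a Fortran string as seen by C after NUL-termination.
  integer function len_seen_by_c(str)
    character(kind=c_char, len=*), intent(in) :: str
    len_seen_by_c = int(c_strlen(str // c_null_char))
  end function len_seen_by_c
end module c_interop_demo
```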
gib
New Contributor II
Quoting - jirina
/OUT:"Releasename1.exe" /NOLOGO /DELAYLOAD:"name2.dll" /MANIFEST /MANIFESTFILE:"..." /SUBSYSTEM:CONSOLE /STACK:100000000 /IMPLIB:"name3.lib" delayimp.lib libguide.lib
Why are you requesting such an enormous stack? Could this be the reason your Release build fails?
jirina
New Contributor I
Quoting - gib
Why are you requesting such an enormous stack? Could this be the reason your release build fails?
My application is a CFD (computational fluid dynamics) solver that typically uses about 250 MB of memory. When I first ran it in parallel, it crashed, so I gradually increased the Stack Reserve Size until it stopped crashing.

Anyway, the Release build compiles and links without any problems if no run-time checks are enabled.
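A note on the stack setting: the linker's `/STACK` reserve mainly governs the main thread, while worker threads created by the OpenMP runtime are typically given a stack size chosen by the runtime, which can be changed with the `OMP_STACKSIZE` environment variable (the Intel runtime also reads `KMP_STACKSIZE`). Large per-thread temporaries can instead live on the heap, which is what `/heap-arrays` does for automatic arrays and compiler temporaries. The sketch below does the same explicitly with an ALLOCATABLE work array; the kernel and all names are hypothetical.

```fortran
module heap_work
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Hypothetical kernel needing an n-element scratch array. An automatic
  ! array "real(real64) :: work(n)" would normally sit on the calling
  ! thread's stack; an ALLOCATABLE one is placed on the heap.
  real(real64) function sum_of_squares(n) result(s)
    integer, intent(in) :: n
    real(real64), allocatable :: work(:)
    integer :: k
    allocate(work(n))
    do k = 1, n
      work(k) = real(k, real64)
    end do
    s = sum(work**2)
    ! work is deallocated automatically on return.
  end function sum_of_squares

  ! Calls the kernel from every thread; each call has its own work array.
  real(real64) function parallel_total(n_iter, n)
    integer, intent(in) :: n_iter, n
    real(real64) :: total
    integer :: it
    total = 0.0_real64
    !$omp parallel do reduction(+:total) schedule(dynamic)
    do it = 1, n_iter
      total = total + sum_of_squares(n)
    end do
    !$omp end parallel do
    parallel_total = total
  end function parallel_total
end module heap_work
```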
jirina
New Contributor I

Quoting - jimdempseyatthecove
Is your application a mixture of Fortran and something else (e.g. C++)?
My application is written in Fortran. I use one (delay-loaded) DLL written in C++, which is used to:
- Write to the Windows event log. This is disabled in all of my tests.
- Handle signals (Ctrl+C, etc.). I know SIGNALQQ can be used for this, but I needed additional functionality (e.g. handling the console window being closed).

Anyway, I removed all uses of the DLL from my project, removed the DLL from the project settings (delay-loaded DLL), and even tried reducing the stack reserve size to 0 (see my other post). The problem still occurs.

Thank you for continuing to give me ideas.
jimdempseyatthecove
Honored Contributor III

See if the following works:

!$OMP PARALLEL DO
DO I=1,COUNT
!$OMP SINGLE
your code here
!$OMP END SINGLE
END DO
!$OMP END PARALLEL DO

The above is an outline; work it into your code.

If the above works, you may have a private/shared problem (likely an oversight or a typo).

By moving the SINGLE/END SINGLE around, you might be able to isolate the error.

Jim Dempsey
jirina
New Contributor I
This is a good idea, but unfortunately I cannot use SINGLE like that. I get the error message "error #7917: The workshare construct SINGLE or SECTIONS is invalid in a PARALLELDO which must contain a single DO directive". My sample code looks like this:
*$omp parallel do default ( shared ) private ( ij, j ) reduction ( +: i )
          do ij = 1,100
*$omp single
            j = j+1
*$omp end single
            i = i+1
          end do
*$omp end parallel do
I read the documentation for SINGLE, but I did not find any reason why it should not work in my example.
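The error is consistent with the OpenMP nesting rules: a worksharing construct such as SINGLE may not be closely nested inside another worksharing region, and the loop of a PARALLEL DO is one. SINGLE is allowed directly inside a PARALLEL region, outside the DO. The sketch below restructures the sample that way (free form; the wrapper names are hypothetical, and `j` is made shared so that the single increment is visible after the region).

```fortran
module single_demo
  implicit none
contains
  subroutine count_with_single(i_total, j_total)
    integer, intent(out) :: i_total, j_total
    integer :: i, j, ij
    i = 0
    j = 0
    !$omp parallel default(shared) private(ij)
    ! Legal: SINGLE is inside PARALLEL but not inside the DO below.
    !$omp single
    j = j + 1
    !$omp end single
    ! END SINGLE has an implicit barrier, so all threads see the new j.
    !$omp do reduction(+:i)
    do ij = 1, 100
      i = i + 1
    end do
    !$omp end do
    !$omp end parallel
    i_total = i
    j_total = j
  end subroutine count_with_single
end module single_demo
```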
jimdempseyatthecove
Honored Contributor III

Oops, my mistake.
I meant to say

!$OMP CRITICAL
...
!$OMP END CRITICAL

to let one thread through at a time.
If n threads run that way (one at a time through the critical section) and the problem goes away, then you can assume you have a shared-variable problem.

Sorry about the faux pas

Jim
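A minimal sketch of the CRITICAL idea on a toy loop (all names are hypothetical): every iteration updates a shared counter inside CRITICAL, so only one thread at a time runs that block and no update is lost. In the real code the whole loop body would go inside the critical section, as suggested above. If the results are then still wrong, that suggests a race on shared data is not the only cause.

```fortran
module critical_demo
  implicit none
contains
  integer function count_iterations(n)
    integer, intent(in) :: n
    integer :: counter, ij
    counter = 0
    !$omp parallel do default(shared) private(ij) schedule(dynamic,3)
    do ij = 1, n
      ! Only one thread at a time may execute this block.
      !$omp critical
      counter = counter + 1
      !$omp end critical
    end do
    !$omp end parallel do
    count_iterations = counter
  end function count_iterations
end module critical_demo
```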
jirina
New Contributor I
No problem with CRITICAL vs. SINGLE; it was my fault for not reading the documentation thoroughly. CRITICAL compiles fine.

Anyway, going back to my original code and adding CRITICAL, I have:
              write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax

*$omp parallel if ( enableOpenMP .AND. omp_bc ) num_threads ( threads ) default ( shared )
*$omp&  firstprivate ( ..., Tmin, Tmax, ... )

*$omp do schedule(dynamic,3)
            do ij = 1,(i2-i1+1)*(j2-j1+1)
*$omp CRITICAL ! added 2009-02-24
!                write(*,'(2(a,f8.2))') ' Min = ', Tmin, ' Max = ', Tmax

                 call apply_BC ( ..., Tmin, Tmax, ... )
                 ...
*$omp END CRITICAL
            end do
*$omp end do
*$omp end parallel
Even with the whole body of the parallel DO loop inside a critical section, the WRITE statement inside apply_BC shows incorrect values of Tmin and Tmax.

Does this mean I should go through all variables and check which of them are (first)private and which are shared? Did you mean that some of the shared variables might be causing the problem?

PS: I tried the latest version of Intel Visual Fortran (11.0.072), but it did not solve the problem. At least my code compiles much faster with the new version of IVF. :-)
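One systematic way to do that audit, not suggested in the thread itself, is DEFAULT(NONE): every variable referenced in the region must then appear in a SHARED, PRIVATE, FIRSTPRIVATE or REDUCTION clause (or be predetermined, like the DO loop index), and the compiler reports an error for any variable left out instead of silently sharing it. The sketch below is a hypothetical loop in the shape of the thread's code; `clamped_sum` and its arguments are assumptions.

```fortran
module default_none_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  real(real64) function clamped_sum(x, tmin, tmax)
    real(real64), intent(in) :: x(:), tmin, tmax
    real(real64) :: total
    integer :: ij, n
    n = size(x)
    total = 0.0_real64
    ! With DEFAULT(NONE), deleting any name from these clauses turns
    ! into a compile-time error, which makes the scoping explicit.
    !$omp parallel default(none) shared(x, n, tmin, tmax) &
    !$omp & private(ij) reduction(+:total)
    !$omp do
    do ij = 1, n
      total = total + min(max(x(ij), tmin), tmax)
    end do
    !$omp end do
    !$omp end parallel
    clamped_sum = total
  end function clamped_sum
end module default_none_demo
```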
jimdempseyatthecove
Honored Contributor III

Second attempt at writing this (damn edit window on this brain-dead forum: pressing Tab should indent inside the edit box, not jump out of the edit window and do something else).

Place a breakpoint on the WRITE statement.

At the breakpoint, open a Disassembly window.

What is being passed for Tmin and Tmax to the write?
What is being passed for Tmin and Tmax to apply_BC?

Jim Dempsey
jirina
New Contributor I
I am sorry, I can't work with disassembly. Anyway, I might be wrong, but I assume the Disassembly window is meant to be used with the Debug build. And the key point is that the Debug build works well (Tmin and Tmax have correct values both in the screen output of the WRITE statement and in the Watch window when a breakpoint is placed where you suggested).

I checked the IVF Help and this forum for more information about disassembly, but everything I found was related to debugging. Nevertheless, am I missing something that disassembly can tell me beyond what the Watch window shows?

Or is there a way to use disassembly even in the Release build (perhaps with some adjustments to the compiler and linker options)? If so, I would try to learn to read disassembly...