Intel® Fortran Compiler

Runtime error when using OpenMP and compiling for 32 bits

Jauch
Beginner

Hello, I had this problem in a previous version of Intel Fortran, and I still have it in Intel Fortran XE 2013 SP3.
I'm using Visual Studio 2008.

The problem happens only when I use OpenMP and compile the code as 32 bits. The 64-bit version does not cause any error. Also, if I turn off OpenMP only in the module containing the code that causes the error, the error disappears as well, even though there are no OpenMP directives in that module.

Basically I have this code:

do1:    do j = Me%WorkSize%JLB, Me%WorkSize%JUB
do2:    do i = Me%WorkSize%ILB, Me%WorkSize%IUB                        
                                  
                if (Me%VegetationTypes(Me%VegetationID(i,j))%GrowthDatabase%PlantType == NotAPlant) cycle  !<= access violation

...

The error disappears if I change the code to this:

do1:    do j = Me%WorkSize%JLB, Me%WorkSize%JUB
do2:    do i = Me%WorkSize%ILB, Me%WorkSize%IUB
               
                vegID = Me%VegetationID(i,j)
                veg   = Me%VegetationTypes(vegID)
                 
                if (veg%GrowthDatabase%PlantType == NotAPlant) cycle

...

The funny thing is that Me%VegetationTypes and Me%VegetationID are not used inside any OpenMP block, and there isn't any OpenMP block in the module where this code exists.

Here is the declaration of the variables:

type       T_Vegetation
    integer, dimension(:,:), pointer                        :: VegetationID
    type(T_VegetationType), dimension(:), pointer  :: VegetationTypes
end type T_Vegetation
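
One way to narrow down an access violation like this is to validate the pointers and the index before the nested access. The sketch below is not the original code: `T_VegetationType` is reduced to a single hypothetical `PlantType` field (the real type nests it inside `GrowthDatabase`), and `valid_veg_id` is an illustrative helper name.

```fortran
module vegetation_check_demo
    implicit none

    ! Simplified stand-in for the real type (assumption for illustration).
    type T_VegetationType
        integer :: PlantType = 0
    end type T_VegetationType

    type T_Vegetation
        integer, dimension(:,:), pointer              :: VegetationID    => null()
        type(T_VegetationType), dimension(:), pointer :: VegetationTypes => null()
    end type T_Vegetation

contains

    ! Returns .true. only if VegetationID(i,j) can safely index VegetationTypes.
    logical function valid_veg_id(veg, i, j)
        type(T_Vegetation), intent(in) :: veg
        integer, intent(in)            :: i, j
        integer                        :: id

        valid_veg_id = .false.
        if (.not. associated(veg%VegetationID))    return
        if (.not. associated(veg%VegetationTypes)) return
        if (i < lbound(veg%VegetationID, 1) .or. i > ubound(veg%VegetationID, 1)) return
        if (j < lbound(veg%VegetationID, 2) .or. j > ubound(veg%VegetationID, 2)) return

        id = veg%VegetationID(i, j)
        valid_veg_id = (id >= lbound(veg%VegetationTypes, 1) .and. &
                        id <= ubound(veg%VegetationTypes, 1))
    end function valid_veg_id

end module vegetation_check_demo

program demo
    use vegetation_check_demo
    implicit none
    type(T_Vegetation) :: veg

    allocate(veg%VegetationID(2, 2), veg%VegetationTypes(3))
    veg%VegetationID       = 1
    veg%VegetationID(2, 2) = 7          ! deliberately outside 1..3

    print *, valid_veg_id(veg, 1, 1)    ! prints T
    print *, valid_veg_id(veg, 2, 2)    ! prints F
end program demo
```

Calling such a check just before the `if` inside the loop would show whether the failure comes from an unassociated pointer or from an ID that falls outside the bounds of `VegetationTypes`.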

Any ideas on why the access violation occurs only when compiling for 32 bits with OpenMP enabled?

Thanks for any thoughts that you can give on this! :)

24 Replies
jimdempseyatthecove
Honored Contributor III

>> If it must exist between calls, is inside the "Me" pointer (that is always a module variable).

The "Me" pointer residing in the module data area is not the only requirement for persistence. What it points to must also remain "alive" for as long as it is being (or is yet to be) referenced.

>> using OpenMP in a looping where there was the need to change the value in more than one place in the same array.

Your linked list maintenance section is another place. Test for omp_in_parallel(), and if in parallel, use critical section around linked list maintenance sections.
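
A minimal sketch of that suggestion, assuming the linked list state can be represented by a module-level counter. The names `item_count`, `parallel_calls`, `add_item`, and the critical-section name `list_update` are hypothetical and do not come from the code under discussion. The `!$` sentinel keeps the `omp_lib` lines out of builds where OpenMP is disabled.

```fortran
module list_guard_demo
    !$ use omp_lib, only: omp_in_parallel
    implicit none
    integer :: item_count     = 0   ! stands in for the linked list state
    integer :: parallel_calls = 0   ! how often maintenance ran inside a parallel region
contains

    subroutine add_item()
        ! The named critical section serializes list maintenance.
        ! Outside a parallel region only one thread ever reaches it.
        !$omp critical (list_update)
        ! Record unexpected calls from an active parallel region.
        !$ if (omp_in_parallel()) parallel_calls = parallel_calls + 1
        item_count = item_count + 1
        !$omp end critical (list_update)
    end subroutine add_item

end module list_guard_demo

program demo
    use list_guard_demo
    implicit none
    integer :: k

    !$omp parallel do
    do k = 1, 100
        call add_item()
    end do
    !$omp end parallel do

    write(*,*) "items added:            ", item_count
    write(*,*) "calls made in parallel: ", parallel_calls
end program demo
```

If `parallel_calls` is non-zero in a code base where list maintenance is supposed to be serial, that is a hint that the maintenance routine is reachable from a parallel region after all.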

>>I don't think that our code "changes" during execution, unless there are some problem with the OS...How can this happen?

Should a pointer get corrupted, the corrupted value could point just about anywhere, including into the code section. Unless the code section(s) is (are) write protected, a write via a corrupt pointer can corrupt code.

The "remove all breakpoints" advice was a fix to a sinister problem I had that took weeks to resolve. What happened was that the VS IDE was somehow conditioned to set a breakpoint (INT03 or INT21) in the code, but the breakpoint was not listed in the debug window that lists breakpoints. When the program is stopped at a breakpoint (say elsewhere), the debugger removes all breakpoint code patches. IOW, examining the code shows good code. Issuing F5 (run/continue) causes the debugger to insert the breakpoints as patches to the code, then resume. Should the breakpoint address reside within the instruction byte stream, as opposed to at the first byte of the instruction, the instruction will execute strangely (incorrect scale, index or base, or immediate, etc.). When the program crashes (GP halt in my case), the debugger's breakpoint patches, including the hidden breakpoint within the instruction stream, are removed, and thus using the debugger to examine what happened shows good code.

To figure this out, I had to write code to perform byte compares of the suspected section of code, and to keep a copy of the bad code for further examination. Unfortunately, this exposes a "Heisenbug" (looking at the bug changes the bug). The bug moves about. After a while, I was able to determine that the corrupted data was either INT03 or INT21 (cannot recall which). This led to a leap of deduction that the debugger might be involved. I tried remove all breakpoints, save solution, close solution, open solution - problem solved.

You may have an elusive  "Heisenbug".

My condolences.

Jim Dempsey

Jauch
Beginner

Hi Jim

>> The "Me" pointer residing in module data area is not the only requirement for persistence, what it points to is also a requirement that it is "alive" while being (or yet to be) referenced.

True.

All the code that deals with a module's linked list is in the module itself. It is called only to add an item to the list or to remove one. If a subroutine is called with an invalid ID, it returns with an error. The Me pointer is used only when a subroutine is called, and it points to the object whose ID the caller passes in.

It's a "pseudo" object-oriented programming style. Simple, but effective at keeping things organized.

>> Your linked list maintenance section is another place. Test for omp_in_parallel(), and if in parallel, use critical section around linked list maintenance sections.

The linked-list code shouldn't be inside a parallel region or be called by multiple threads. Like I said, only simple loops are inside OpenMP parallel regions. But I'll take a look at it.

>> Should a pointer get corrupted, the corrupted value could point just about anywhere, including into the code section. Unless the code section(s) is(are) write protected the write via corrupt pointer can corrupt code. [...]

I didn't know that was possible... I always thought that code and data were separated and that a pointer trying to access a region outside the data region would cause an access violation...

>> You may have an elusive "Heisenbug".
>> My condolences.

lol
yeah.
I have already caught some of these. They never showed up in debug, so I had to use other techniques to solve them.
But this one is really slippery... ;)

Thanks :)

jimdempseyatthecove
Honored Contributor III

>>The linked list related code shouldn't be in a parallel zone or be called by multiple threads. Like I said, only simpler loopings are inside parallel zones of OpenMP. But I'll take a look on it.

Consider adding an assert:

!$ if(omp_in_parallel()) write(*,*) "Bug" ! add critical section here

>>I didn't know that was possible... I always thought that code and data were separeted and that a pointer trying to access a region outside the data region would cause an access violation...

Some of the newer processors can mark pages as "Execute Only". They can also mark pages as read-only, but this is not always done.

Jim Dempsey

Jauch
Beginner

Hello again :)

Before doing anything else, I started checking for more "basic" errors in the code, like missing variable initialization. So I used runtime checking (which I didn't know existed... :S) and found some really annoying errors, many of which could cause the error that I mentioned at the beginning of this thread.

Now I receive a warning:

forrtl: warning (402): fort: (1): In call to I/O Write routine, an array temporary was created for argument #1

Is there some way, through some compiler option, to find the line that caused this warning?
Array bounds problems are also reported like this, without the line of the offending statement...

And I continue to debug the code :)
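
On the array-temporary warning: it is typically reported when a non-contiguous array section, or a pointer array the compiler cannot prove contiguous, is handed to an I/O statement or a procedure and has to be packed into a temporary first. The sketch below is a hypothetical reproducer, not code from this thread, and whether it triggers the warning depends on the compiler version and options. Building with `/traceback` in addition to `/check:arg_temp_created` should make the runtime report carry source file and line information, but that is worth verifying against the compiler documentation for the version in use.

```fortran
program array_temp_demo
    implicit none
    ! A pointer array: the compiler cannot assume it is contiguous.
    real, dimension(:,:), pointer :: grid
    integer :: unit_no

    allocate(grid(4, 3))
    grid = 1.0

    open(newunit=unit_no, status="scratch", form="unformatted")

    ! grid(2,:) takes one element from each column, so consecutive elements
    ! are 4 reals apart in memory. The runtime may pack them into a
    ! temporary array before writing.
    write(unit_no) grid(2, :)

    close(unit_no)
    deallocate(grid)
end program array_temp_demo
```

Writing a whole contiguous array, or copying the section into a local array first, are the usual ways to avoid the temporary when it matters for performance.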
