debugger can't see the threadprivate variables

JO__Masatoshi · ‎07-18-2018

Hello,

I have a problem with porting my serial fortran (parallel studio XE 2018 update3, on Win7) program to OpenMP.

Problem: the Visual Studio debugger cannot see the threadprivate variables.

The original program is a serial one which I have long been using.

program MyProg
start-up codes

outermost loop
calculation depending on the loop index is here
many calls are included
end loop

closing codes
end MyProg

I started an adaptation of this to a parallel one, by encapsulating all the codes within the outermost loop in MyProg into a subroutine MyProg_core.
The specific behavior of MyProg_core is solely determined by the i-th condition.
All the related routines are in the dynamic extents.

The parallel version becomes:
program MyProg_main

use mod_global_vars

other ordinary variables var1, var,2,..

some serial codes

!$OMP parallel shared(all vars in mod_global_vars)
!$OMP do schedule(static,1)
do i=1, num_jobs
call MyProg_core(i, var1, var2,..) ! all calls are in the dynamic extent
end do

!$OMP end do
!$OMP end parallel

end MyProg_main

All common and read-only variables shared by the threads are gathered in the module below:
module GlobalVars
integer*4 G_var1, G_var2,.., ...
real*8, allocatable::G_array1(:), G_array2(:,:), ..
...
end module

All private variables that can be modified in each thread are in the following module, together with the threadprivate list:

module PrivateVars
real*8, allocatable,save:: P_var1, P_var2,...
integer*4, save:: P_array1(:), P_array(:,:),...
...
!$OMP threadprivate(list of all vars in this module)
end module

"use PrivateVars" is placed only in MyProg_core and deeper routines inside using the corresponding variables, i.e., in the dynamic extent only.

The PROBLEM is:
When this code is run in the visual studio debugger, all the threadprivate variables are not visible but displaying "undefined address", in the main window or parallel watch window. Shared variables are on the other hand always visible.
The invisible variables, however, can be printed on the screen using "write" statement. Also, when an invisible threadprivate P_var is copied to ordinary var as var=P_var, var shows the value of P_var in the debugger. The calculation runs normally and outputs the final results. Therefore, only the debugger fails to see their values, which are correctly kept in each thread during the execution.

Can you suggest the possible reason for this phenomena?

regards,

Masatoshi

LeonardB · ‎07-20-2018

Hi,

If I understand it right is the code doing something like:

module testmod
    implicit none
    integer,parameter:: n=5
    integer,allocatable,save:: iprivate(:)
    !$omp threadprivate (iprivate)
    contains
    
    subroutine initiate_iprivate()
        allocate(iprivate(n))
        iprivate(:)=-1
    end subroutine
    
    subroutine sub(ii)
        integer,intent(in)::ii
        
        iprivate(1:n)=ii

        continue

    end subroutine sub
    
end module
    
program test
    use testmod
    use omp_lib
    implicit none
    integer:: i,ii
    
    call initiate_iprivate()
    
    call OMP_SET_NUM_THREADS(2)
    !$omp parallel do private(ii)
        do i=1,10
            ii=i
            call sub(ii)
            write(*,*) i
        enddo
    !$omp end parallel do
end

This code gives different kinds of errors as undefined allocation or allocated to strange parts of memory space (IVF 18 + VS 2013) causing VS to crash.

Re-wirting it to a more "pure" form solves the problem:

module testmod
    implicit none
    integer,parameter:: n=5
    
    contains
    
    pure subroutine sub(ii,initiated_iprivate)
        integer,intent(in)::ii,initiated_iprivate(n)
        !dir$ define alt_alloc
        !dir$ if defined(alt_alloc)
            integer,allocatable:: iprivate(:) ! use allocaded memory if n is large
            allocate(iprivate(n))
        !dir$ else
            integer:: iprivate(n)  ! puts the vector in the stack
        !dir$ endif
            
        iprivate(:)=initiated_iprivate(:)  ! copy the common initiation values to the threadprivate work space
        
        iprivate(1:n)=ii  ! do the work

        continue

        !dir$ if defined(alt_alloc)
            deallocate(iprivate)
        !dir$endif
            
    end subroutine sub
    
    end
    program test
        use testmod
    use omp_lib
    implicit none
    integer:: i,ii
    integer,allocatable:: initiated_iprivate(:)
    
    allocate(initiated_iprivate(n))
    initiated_iprivate(:)=-1
    
    call OMP_SET_NUM_THREADS(2)
    !$omp parallel do private(ii)
        do i=1,10
            ii=i
            call sub(ii,initiated_iprivate)
            write(*,*) i
        enddo
    !$omp end parallel do
end

BR/Leonard

jimdempseyatthecove · ‎07-20-2018

LeonardB,

Post #2, first example is in error. The call to init_threadprivate is (was) made from the serial region (also prior to thread pool creation), thus initializing only the threadprivate array of the master thread. To correct for this, enclose the call to init_threadprivate within a (the first) parallel region (being careful to assure thread count is what you desire).

Also, from the IVF documentation:

On entry to the first parallel region, data in the THREADPRIVATE common blocks or variables should be assumed to be undefined unless a COPYIN clause is specified in the PARALLEL directive.

However, the above statement is not definitive as to what happens with allocatable arrays declared threadprivate. IOW does the array descriptor get copied, thus producing separate array descriptors with same content (pointing to the same pre-allocated data and thus producing heap corruption on first deallocate/reallocate). A copyin'd empty array descriptor is safe.

Jim Dempsey

JO__Masatoshi · ‎07-23-2018

Thank you, LeonardB and Jim,

Though I have not understood your comments completely, May I make a further explanation about the initialization of variables?

At the beginning of MyProg_core (inside the parallel loop, in the dynamic extent), both shared and threadprivate arrays are initialized as follows.

***codes
call AllocSharedArrays ! for shared arrays

allocate(P_array1(..), .. ) ! for threadprivate arrays
*** end codes

AllocSharedArrays allocates shared array only once when the first thread arrives at this point. It checks whether these are already allocated or not as follows, IF( ALLOCATED(G_array1)) return
Because all shared arrays are simultaneously allocated or deallocated, only the first array in the argument list is checked.
For threadprivate arrays, they are deallocted at the exit of MyProg_core and allocated again at the next entry.

After these allocations, initial data are loaded.

*** codes
!$OMP critical
OPEN(5,FILE=INFULLNAME,ACCESS='DIRECT',FORM='UNFORMATTED',
$ RECL=8)
CALL InputParallel ! get shared values
CLOSE(5)
!$OMP end critical
*** end codes

This call sets the shared variables from the input file. Therefore, calls in the second or later threads are redundant. However, I leave it for simplicity. The time for overwriting the shared variables in each thread is negligible. I think this has no side effects.

After calling InputParallel, these two routines are called to prepare the threadprivate data.

*** codes
call CopySharedVarsToPrivate
call ResetInternalVars
*** end codes

There are two kinds of threadprivate variables. (1)those read from the input file as above and to be modified as the calculation proceeds, and (2)those generated and used only internally in MyProg_core.
CopySharedVarsToPrivate copies the shared to the threadprivate of former type as their initial values,
var_threadprivate=var_shared

The latter ones are reset in ResetInternalVars to 0 or "", etc.
var_threadprivate=0,
in order to avoid unexpected errors.

Thus, all the threadprivate variables are allocated and initialized at the beginning of each thread, I believe...

As I mentioned in the first post, the program seems to output correct results, which looks like the ones which had been given by the serial codes.

Masatoshi,

jimdempseyatthecove · ‎07-23-2018

Please show the code to the extent of when/where you enter the parallel region .AND. how you protect against race conditions. You may elide code not require to illustrate correctness.

For example:

you>> AllocSharedArrays allocates shared array only once when the first thread arrives at this point. It checks whether these are already allocated or not as follows, IF( ALLOCATED(G_array1)) return

Potentially has an error, due to code not seen, it is unknown if race conditions are protected. The proper coding to protect race condition would look like:

IF( ALLOCATED(G_array1)) return
!$OMP CRITICAL
IF(.NOT. ALLOCATED(G_array1)) then
  ALLOCATE(G_array1(something))
ENDIF
!$OMP END CRITICAL
return

You may have additional issues if the allocation of the shared arrays additionally require initialization. If this is the case then consider adding a LOGICAL variable. Something like:

LOGICAL, SAVE :: G_arrays_ready = .FALSE.
...
IF( G_arrays_ready) return
!$OMP CRITICAL
IF(.NOT. G_arrays_ready) then
  ALLOCATE(G_array1(something))
  call InitGarrays
  G_arrays_ready = .TRUE.
ENDIF
!$OMP END CRITICAL
return

And don't forget similar code to reset G_arrays_ready. Also, assure that you do not enter the deallocation section prior to all threads finishing use of the shared data.

Jim Dempsey

JO__Masatoshi · ‎07-27-2018

I have prepared a small OpenMP console code files that reproduce the symptom, by shrinking the original project. My system is Win7 x64 with 16 cores. Fortran is Parallel Studio XE 2018 Update 3 Composer Edition for Win.

10 files are included as follows.

1) test_main.FOR ( program test_main): This sets and reports the number of threads, and then construct the parallel region. The number of jobs is 16. Basic constants used in array allocation is set by calling GetDataSizes.

2) test_core.for (subroutine test_core): This is the whole body of parallel processing. It displays the loop count and its own thread ID on the screen when it is called. However, the debugger cannot display thread ID because of 'undefined address.' This routine also does the followings, retrieving the data used in the calculation, array allocation and deallocation, data copy from shared to threadprivate variables, two types of data assignments.

3) mod_GlobalVars.for (module GlobalVars): Contains the global variables declared as SHARED in the test_main.

4) Mod_Vars.for (module VarDecs): Contains thread variables with !$OMP threadprivate list.

5) Input_parallel.for (subroutine InputParallel): Mimics File I/O.

6) GetDataSizes.for (subroutine GetDataSizes); Mimics File I/O of array size constants.

7) AllocSharedArrays.for (subroutine AllocSharedArrays): For 1-time allocation of shared arrays.

8) DeallocSharedArrays.for (subroutine DeallocSharedArrays): Deallocation of shared arrays.

9) DeallocPrivateArrays.for (subroutine DeallocPrivateArrays): For deallocation of threadprivate arrays. The corresponding allocation is inlined in sub test_core.

10) CopySharedToPrivate.for (subroutine CopySharedVarsToPrivate): Copy the shared values to corresponding threadprivate ones.

The reproduced symptom is:

All the threadprivate variables are not visible because of 'undefined address' on the debugger. However, except for this, the calculation seems to go well. The invisible variables can be printed on the screen and can be used in the loop control without errors. Assignment operation also works successfully. When an invisible thread private variable is copied to a local variable, it shows the correct value. These (and full-size project's result) suggest that the invisible values are correctly processed internally.

I hope you will quickly point out my mistake(s).

Masatoshi

jimdempseyatthecove · ‎07-27-2018

Which version of Microsoft Visual Studio are you using?

The debugger is that supplied by Microsoft (with extensions provided by Intel).

I've experienced similar problems in older versions of IVF and MS VS. The "hack" solution I used in Debug Build was to declare a pointer to the item in conditional code at the top of the subroutine or function. Then use the pointer(s) in the Debugger to see the variable(s) of interest. With current versions of IVF you can use the ASSOCIATE construct of the language.

Jim Dempsey

JO__Masatoshi · ‎07-29-2018

Jim,

VS is that came with IVF as shown below. Other languages are not installed.

Here are the copies from VS about dialog.

Microsoft Visual Studio 2013 Shell (Integrated)
Version 12.0.21005.1 REL
Microsoft .NET Framework
Version 4.7.03062

installed version:IDE Standard

Intel® Advisor 2018 Update 3
Intel® Advisor 2018 Update 3, (build 558307), Copyright © 2009-2018 Intel Corporation. All rights reserved.

Intel® Inspector 2018 Update 3
Intel® Inspector 2018 Update 3, (build 558189), Copyright © 2009-2018 Intel Corporation. All rights reserved.

Intel® VTune™ Amplifier 2018 Update 3
Intel® VTune™ Amplifier 2018 Update 3, (build 559005), Copyright © 2009-2018 Intel Corporation. All rights reserved.

intel (R) Parallel Studio XE 2018 Update 3 Composer Edition for Fortran Windows* package ID: w_comp_lib_2018.3.210
intel (R) Parallel Studio XE 2018 Update 3 Composer Edition for Fortran Windows* Microsoft* Visual Studio* 2013 integrated

Thanks,

Masatoshi

JO__Masatoshi · ‎08-06-2018

Jim, 1 week has passed since my last post, in response to your question about the version of VS. According to your comment #7, you experienced similar problem in the older versions of VS. It also implied that your latest VS works well. Therefore, please let me know the versions of VS you were once troubled with and that does show the threadprivate variables successfully, together with the build options, if you kindly already have had examined my code. I don't think that to verify whether my project runs or fails on your debugger is a very difficult task. Your advice about 'hack' -ing the debugger by embedding special codes is interesting, though I have not understood how to do it. However, this is the last option I would like to employ, since I want to concentrate myself on my research work. It is evident that the most powerful and attractive selling point of IVF integration into VS is its easiness-to-use and comfortable development environment, free from the special techniques that only limited number of experts can exploit. Without such wonderful capabilities, I would be forced to work like the way I had to do many years ago with MS fortran on MS DOS.

I am looking forward to seeing my mistakes spotted.

Thanks in advance,

Masatoshi

jimdempseyatthecove · ‎08-06-2018

I use MS VS 2013 (2012 with update to 2013). This version has issues with threadprivate variables in IVF. From my understanding the issue with threadprivate variable within the debugger is the resolution of the address of the variable requires execution of code

getBaseOfThreadPrivateArea() + offsetOfVariableOfInterest

When specifying a variable to inspect, the debugger is given only the offsetOfVariableOfInterest (the symbol known in the debug information within the execuitable). The method getBaseOfThreadPrivateArea() is not known to the MS Debugger, and it may vary depending on the tools used to produce the execuitable.

'hack'-ng...

In the old days, Borland Turbo Debugger, one could specify for inspection and expression "anArbitraryUserFunction(&someVariable)". I do not know if (think) MS VS Debugger permits this. IOW the expression could include function calls to your code as well as values.

In my IVF code with threadprivate variables, I found it is easiest to create a single user defined type to contain all the thread private data. Then at the top of the subroutine, insert a pointer to this object.

! moduleWith_context
type threadPrivate_t
  ... ! your stuff
end type threadPrivate_t
type(threadPrivate_t), pointer :: context => NULL()
!$omp threadprivate(context)
...
! near top of program first parallel region
!$omp parallel
!once only initialization
allocate(context)
... ! initialization code if any
!$omp end parallel
...
subroutine foo
  use moduleWith_context
  ...
  type(threadPrivate_t), pointer :: my_context
  ..
  my_context => context ! for debugging
  ...
  var = context%x ! Examine my_context%X in debugger

Jim Dempsey

JO__Masatoshi · ‎08-10-2018

Jim, I'm trying your advice now.

Additional question. You said :

"The method getBaseOfThreadPrivateArea() is not known to the MS Debugger, and it may vary depending on the tools used to produce the executable."

Do you mean that getBaseOfThreadPrivateArea() is to be written in Intel, but no yet? Do you know someone (in Intel) who knows what tools are necessary or to be created, for the production of the desired executable?

Masatoshi

jimdempseyatthecove · ‎08-10-2018

>>Do you mean...

No. What I mean is, when you enter the symbolic name into the debugger, the debugger performs a symbol table lookup to produce an address. In this case the address isn't an address, but rather an offset into the threads thread private area, somewhat like a member variable offset. However, the base of the "object" (thread private area) is derivable via an application specific (generally inline) function. The debugger knows not what this function is (with possible exception to MS compiler and MS debugger). On Windows x64, MS software uses the GS segment register (selector). Other vendors on Windows x64 may adopt this or may not. Also the MS Debugger runs in a different process space, and thus (may)has a different mapping for GS than the selected thread of the application being debugged. As to if the MS debugger cah work around this, you will have to consult with MS. When the selector GS is used, the instruction encoding of the memory reference becomes

mov rax, qword ptr GS:[variableOffset]

Where the debugger is likely using

mov rax, qword ptr [variableOffset] ; no GS:, which implicitly means DS: (Data Selector)

Jim Dempsey

JO__Masatoshi · ‎08-20-2018

Jim, because your advice #10 was a little bit not understandable for me ( pointer of pointer appeared), I modified the source and found it works well.

Let me summarize my understandings as of today. Please make comments if some of them are not correct.

module ThreadPrivateVars

type TPVars

list of threadprivate vars

end type

type(TPVars),save,target:: MyVars
type(TPVars), pointer::MyVarPointer=>NULL()

!$OMP threadprivate(MyVars)

end module

in the routine,

use ThreadPrivateVars

...

!$OMP critical
MyVarPointers=>NULL()
MyVarPointers=>MyVars
!$OMP end critical

...

As you said, while threadprivate MyVars is not visible (undefined address), MyVarPointers displays the contents of MyVars successfully. It was necessary to reset the pointer every time before use (arrays get undefined occasionally), which looks somewhat strange because the array sizes (shape of type) do not change during the execution.

MyVarPointers also becomes invisible when it is declared as threadprivate. Therefore, it seems for me, as long as for the purpose of monitoring the invisible quantities, that the use of pointers is equivalent to the use of shared variables, except the heavy overhead of data transfer in the latter case. (Of course, such overhead should never be allowed in my actual program.)

regards,

Masatoshi JO

jimdempseyatthecove · ‎08-21-2018

One feature of using threadprivate is to eliminate the overhead of copying in private data, another is to have global thread context (so as to not pass a private context pointer/reference on CALL or function.

Jim Dempsey