Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Can threadprivate custom type variables be changed from within a parallel region?

Elarion245

Hello world,

The title already states what I would like to find out. I have a global variable that is used by OpenMP and declared threadprivate.

Within the parallel loop, the code may need to dynamically reload data from disk. In that case the code currently extends the data in an allocatable array of the custom type.

This seems to have worked for some time. However, I have now added some code for extra analysis after the parallel region, and suddenly I get access violations from time to time (not on every run). I think this might be due to growing the real array within the custom type.

 

Even if this is non-standard (and I should probably code something around it), I am still wondering why this would cause a violation, as the results from the parallel loop look as expected. Once the omp parallel do is finished, I would have thought that memory belonging to a hibernating thread would be freed?
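For reference, a minimal sketch of the pattern described above. All names (`eop_data`, `tEOP`, `tEOPStore`, `tc`, `ensure_day`) are hypothetical, and the reload from disk is replaced by keeping the old records and default-initializing the new days. As I understand the OpenMP rules, COPYIN gives each thread a copy of the main thread's array (with the same MJD bounds) on entry to the region, and each thread then replaces only its own copy of the allocatable component via MOVE_ALLOC.

```fortran
! Hypothetical names: eop_data, tEOP, tEOPStore, tc, ensure_day.
module eop_data
  implicit none
  type :: tEOP
    real :: xp = 0.0, yp = 0.0, dut1 = 0.0
  end type tEOP
  type :: tEOPStore
    type(tEOP), allocatable :: EOP(:)   ! indexed by modified Julian date
  end type tEOPStore
  type(tEOPStore), save :: tc
  !$omp threadprivate(tc)
contains
  ! Grow the calling thread's own copy of tc%EOP so that it covers mjd.
  subroutine ensure_day(mjd)
    integer, intent(in) :: mjd
    type(tEOP), allocatable :: tmp(:)
    integer :: lo, hi
    if (mjd >= lbound(tc%EOP, 1) .and. mjd <= ubound(tc%EOP, 1)) return
    lo = min(mjd, lbound(tc%EOP, 1))
    hi = max(mjd, ubound(tc%EOP, 1))
    allocate(tmp(lo:hi))                 ! new days get default-initialized records
    ! Stand-in for re-reading the file: keep the old records at the same MJDs.
    tmp(lbound(tc%EOP, 1):ubound(tc%EOP, 1)) = tc%EOP
    call move_alloc(tmp, tc%EOP)         ! replaces only this thread's copy
  end subroutine ensure_day
end module eop_data

program grow_threadprivate
  use eop_data
  implicit none
  integer :: i, found
  allocate(tc%EOP(58000:58100))          ! main thread's copy
  found = 0
  !$omp parallel do copyin(tc) reduction(+:found)
  do i = 57800, 58100
    call ensure_day(i)                   ! may grow this thread's copy only
    if (i >= lbound(tc%EOP, 1) .and. i <= ubound(tc%EOP, 1)) found = found + 1
  end do
  !$omp end parallel do
  if (found /= 301) error stop "lookup failed"
  ! Growth only ever extends the window, so the original range is still covered.
  if (lbound(tc%EOP, 1) > 58000 .or. ubound(tc%EOP, 1) < 58100) error stop "bounds"
  print *, "ok", found
end program grow_threadprivate
```

Compiled with OpenMP enabled (e.g. `-qopenmp` with ifort/ifx on Linux, or `-fopenmp` with gfortran) the loop runs in parallel; without it the directives are treated as comments and the program runs serially.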

Thanks for any help you can provide!

Elarion245

Thanks for answering 3). Regarding 4): the runtime checks also showed no corruption for me. In fact, I even fired up Intel Inspector, and its most expensive threading check showed no problem for a run in which the program went through. In a second attempt the program just crashed, and then Inspector reported corrupt data.

The Inspector memory check has been running for 90 hours, but has not shown any progress for a day (it seems stuck at the end of the parallel loop). So I guess the runtime checks alone are no guarantee that no memory corruption has happened.

Or, asked the other way round: do you think that growing the threadprivate variable is defined behavior? If it is, I guess there should have been no memory corruption, and regardless of whether the master thread enlarged the array or not, the indexing would still be valid, since it is done via the integer day.

 

jimdempseyatthecove

I think at this point we are spinning our wheels. A look at the complete code may be in order.

All solutions are obvious once you see the solution.

Jim

Elarion245

It seems so... Great quote by the way. Do you know the original author?

If it weren't company-confidential code I would be the first to share it for learning purposes, but I am obviously not allowed to do so. The entire project has a couple of hundred thousand lines of code, and I guess your great effort in coding the examples is all that can be done without the entire project.

My takeaway is that changing the array dimensions of the threadprivate variable from within the parallel region might work, but if one encounters sudden strange behavior, this is the first place to look. At least I could only use it up to a certain point. In any case this is certainly not good style, but I am also a bit relieved to see that even the coding experts here cannot state a direct yes/no, but need to test it :)

Once again thanks a lot for the discussion!

jimdempseyatthecove

Have you looked at your code to see whether you have a global variable, something like:

integer :: nEOPs

where nEOPs is the initial size of the master thread's pre-parallel-region array EOP, and inside the parallel region one of the threads needs to expand its threadprivate EOP array. It may even update nEOPs. Then, on exit from the parallel region, the code loops back to the beginning of the loop containing the parallel region without the master thread noticing the size change. Upon reentry and copyin, all threads' EOP arrays get set (if necessary) to the smaller size of the main thread's EOP, but the threads then use the updated nEOPs to access their arrays.

I cannot say why the Debug runtime check for accessing an array out of bounds would not catch this.

You may have some statements that require the array size to be correct, for example EOP being passed to a subroutine as

subroutine foo(EOP)
       use yourModuleHere
       Type(tEOP), dimension(1:nEOPs)             :: EOP

IOW, the reference to the threadprivate tp%EOP is passed on the CALL, but the (now incorrect) size of the array is taken from the global nEOPs.

Something like that could result in the problem you observe.
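The dummy-argument side of this can be checked with a small sketch (module and function names are hypothetical). An explicit-shape dummy such as `dimension(1:nEOPs)` takes its size from the global variable, and an assumed-shape dummy `(:)` rebases the lower bound to 1, whereas an assumed-shape dummy with an explicit lower bound, or an ALLOCATABLE dummy, keeps MJD-style bounds such as 57800:58300:

```fortran
! Hypothetical names; illustrates the bounds a routine sees for its dummy argument.
module eop_bounds
  implicit none
contains
  ! Assumed-shape dummy: inside the routine the lower bound is rebased to 1.
  integer function lower_assumed(a)
    real, intent(in) :: a(:)
    lower_assumed = lbound(a, 1)
  end function lower_assumed

  ! Assumed-shape dummy whose lower bound is passed in by the caller.
  integer function lower_given(lo, a)
    integer, intent(in) :: lo
    real, intent(in) :: a(lo:)
    lower_given = lbound(a, 1)
  end function lower_given

  ! Allocatable dummy: the actual argument's bounds (here MJDs) are kept.
  integer function lower_alloc(a)
    real, allocatable, intent(in) :: a(:)
    lower_alloc = lbound(a, 1)
  end function lower_alloc
end module eop_bounds

program dummy_bounds
  use eop_bounds
  implicit none
  real, allocatable :: dut1(:)
  allocate(dut1(57800:58300))
  dut1 = 0.0
  if (lower_assumed(dut1) /= 1) error stop "assumed-shape should rebase to 1"
  if (lower_given(lbound(dut1, 1), dut1) /= 57800) error stop "explicit lower bound"
  if (lower_alloc(dut1) /= 57800) error stop "allocatable dummy keeps bounds"
  print *, lower_assumed(dut1), lower_given(lbound(dut1, 1), dut1), lower_alloc(dut1)
end program dummy_bounds
```

With MJD indexing, only the last two variants let code inside the routine use the day directly as the index.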

Jim Dempsey   

Elarion245

Hey Jim. There is no variable tracking the size of the array. The array is indexed by the Modified Julian Date of the data, so it has bounds like 58000:58100 or so. When the array is grown, the bounds change to something like 57800:58300. Hence, when data for a day, say 58000, is requested, the index is valid in both cases and I never go out of bounds.

The reloading is only performed if the requested day is prior to the covered data, and in that case the threadprivate array dimensions were extended.

jimdempseyatthecove

Ahhh, LBOUND and UBOUND changing as well as size. I overlooked this aspect from earlier discussions.

Is this the sketch

program
  read in 101 Julian date records into the main thread's array dimensioned as (58000:58100)

do
  ... code A
  parallel region copyin(main's threadprivate array)
     ... code B
     ! some thread needs prior data
     if(need more) then
       save my threadprivate array into temp
       deallocate my threadprivate array
       allocate my threadprivate array dimension(57800:58300)
       read in records for 57800:57999
       copy in 58000:58100 from temp
       delete temp
     end if
     ... code C
  end parallel
  ... code D
end do

Now you want the new prior records to be available to the main thread after the parallel region (and thus available for the next iteration of the outer loop).

Is that what you expect?
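For what it is worth, the rule that decides this can be sketched as follows (hypothetical names; `tc%dut1` stands in for the EOP array). As I read the OpenMP rules, serial code after `end parallel` sees only the master thread's copy of a threadprivate variable, so an array grown by any other thread does not come back to the main thread:

```fortran
! Hypothetical names. Compile with OpenMP (e.g. -qopenmp or -fopenmp); without it
! the directives are comments and the program runs with a single thread.
module eop_tp
  implicit none
  type :: tEOPStore
    real, allocatable :: dut1(:)       ! indexed by modified Julian date
  end type tEOPStore
  type(tEOPStore), save :: tc
  !$omp threadprivate(tc)
end module eop_tp

program region_persistence
  use eop_tp
  !$ use omp_lib
  implicit none
  integer :: me, nthreads
  allocate(tc%dut1(58000:58100))       ! main (master) thread's copy
  tc%dut1 = 0.0
  nthreads = 1
  !$omp parallel copyin(tc) private(me)
  me = 0
  !$ me = omp_get_thread_num()
  !$omp single
  !$ nthreads = omp_get_num_threads()
  !$omp end single
  ! The implicit barrier after SINGLE makes nthreads visible to all threads.
  if (me == nthreads - 1) then
    ! Only the highest-numbered thread widens its own copy.
    deallocate(tc%dut1)
    allocate(tc%dut1(57800:58300))
    tc%dut1 = 0.0
  end if
  !$omp end parallel
  ! In serial code the program sees the master thread's copy only.
  if (nthreads == 1) then
    if (lbound(tc%dut1, 1) /= 57800) error stop "single thread: own growth visible"
  else
    if (lbound(tc%dut1, 1) /= 58000) error stop "other threads' growth not visible"
  end if
  print *, "threads:", nthreads, " bounds:", lbound(tc%dut1, 1), ubound(tc%dut1, 1)
end program region_persistence
```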

Jim Dempsey

Elarion245

Hi Jim. Yes, Lbound and Ubound change as well, but your sketch is not capturing it fully.

It is more like this:

program

use global variable from other module

initialize and allocate global variable
!-> should this already happen in a parallel region??

... code A

OMP parallel do 
copyin(main's threadprivate global variable containing the EOP data in the private array)

   ... Calculations requiring EOP data

   ! some thread needs prior data

   if(current timestamp not in EOP array) then
   
	!This is what the code looked like before
	Read in an extended range of data into a local array of type tEOP 
	
	!The read-in array is larger than the original one, as the bounds are now
	!say 57800:58300
	Overwrite the array component of the threadprivate global variable with the
	array created from the file 
	!yes, also the data that was available has been read in again, not just the missing data
   
	work with the enlarged array

omp end parallel do

... postprocessing

Note that I never intended to use the data reloaded from file in another thread or after the parallel region. These out-of-range times that require reloading are very rare. Hence, if any other part of the code (sequential code after the omp parallel do, or any other thread within the omp parallel do) encounters the need to reload data, it is simply reloaded again.

Consequently I basically do not care whether the reloaded data persists after the parallel region or not. Since the indexing works via the day, and arrays can start and stop at any index (which is a great feature of Fortran!!), I will always get the EOP data I need: either it is already in the array component of the global variable, or it is reloaded from disk.

The only issue was that in the postprocessing step I started to encounter access violations from time to time whenever a reload had taken place. (By setting breakpoints in a release build with debug information, I was able to correlate the access violations with the iterations in which data was reloaded from file.)

jimdempseyatthecove

You are aware that copyin only occurs on entry into the parallel region and not on each DO iteration.

From the above description

1) main thread reads in an anticipated range of dates
2) parallel DO loop copyin(anticipated range of dates)
   3) do work on the local thread's EOP data (assuming the date's data is available)
   4) thread determines EOP data required is not available
   5) new LBOUND or UBOUND is determined
   6) EOP array for this thread is reallocated to new bounds (old data discarded)
   7) EOP data file read for records from LBOUND to UBOUND
8) end parallel DO
9) post processing not referencing any of the EOP data

IOW

All threads use the same EOP data file, starting with the same initial window into it (copied in from the master thread).
During the parallel DO loop, each thread may expand this window, as opposed to moving it.

Note that READ will not reallocate an array to fit the size required for the data read (I assume you know this).
If indexing is now out of bounds and the code visually looks good, this indicates that the code in the parallel region is using a shared variable (possibly in the module) instead of the assumed private variable,
.OR. what visually looks good is actually in error (uninitialized, initialized but not reset, etc.).
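The READ point can be shown with a minimal sketch; an internal file stands in for the EOP file and the values are made up. The allocatable array has to be allocated to the desired MJD bounds before the READ, which then fills exactly that many values:

```fortran
! Hypothetical data: an internal file stands in for three EOP file records.
program read_into_bounds
  implicit none
  character(len=32) :: records
  real, allocatable :: dut1(:)
  integer :: lo, hi
  records = "0.10 0.20 0.30"
  lo = 58000
  hi = 58002
  ! READ does not (re)allocate its target: allocate with the MJD bounds first.
  allocate(dut1(lo:hi))
  read(records, *) dut1                ! list-directed read of hi-lo+1 values
  if (lbound(dut1, 1) /= 58000 .or. ubound(dut1, 1) /= 58002) error stop "bounds"
  if (abs(dut1(58001) - 0.20) > 1.0e-6) error stop "value for day 58001"
  print *, dut1(58000:58002)
end program read_into_bounds
```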

I suggest you insert some debug code that explicitly checks the bounds prior to access (i.e. do not rely on runtime checks to catch the error)

index = ...
call DebugEOP(index)
... code that uses tc%EOP(index)
index = ... ! new index
call DebugEOP(index)
... code that uses tc%EOP(index)



subroutine DebugEOP(index)
  use moduleGlobal
  implicit none
  integer, intent(in) :: index
  ! DIM=1 makes lbound/ubound return scalars, as the IF test requires
  if( (index < lbound(tc%EOP, 1)) .or. (index > ubound(tc%EOP, 1))) then
     print *,"bug" ! place break point here
  end if
end subroutine DebugEOP

Note: the subroutine DebugEOP can be in a separate source file, compiled as Debug (full debug symbols, no optimizations)
while the main code can be compiled and linked as Release build, but also with full debug symbols.

You may find it handy to compile using FPP (Fortran PreProcessor), which provides the FPP macro __LINE__:

index = ...
call DebugEOP(__LINE__,index) ! compile with FPP
... code that uses tc%EOP(index)
index = ... ! new index
call DebugEOP(__LINE__,index)
... code that uses tc%EOP(index)



subroutine DebugEOP(line, index)
  use moduleGlobal
  implicit none
  integer, intent(in) :: line, index
  ! DIM=1 makes lbound/ubound return scalars, as the IF test requires
  if( (index < lbound(tc%EOP, 1)) .or. (index > ubound(tc%EOP, 1))) then
     print *,"bug", line, index ! place break point here
  end if
end subroutine DebugEOP

Jim Dempsey

jimdempseyatthecove

Also, after you exit the parallel region (your sketch #28) do you loop back up to code preceding the parallel region #28...
... then reenter the parallel region, with copyin of the master thread's old/new range of EOP records
... .AND. assume that your threads' prior bounds or remembered work-point indexes (held in threadprivate variables) are valid for the current iteration?

If so, these remembered values are ONLY good if the master thread's range of EOP records includes all threads' remembered index values.

This can be corrected by examining LBOUND and UBOUND at entry to the parallel region.
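A sketch of that entry check, assuming a hypothetical threadprivate work-point index `last_day` and array `dut1` (names not from the original code). Each thread resets a remembered index that falls outside the window it just received through COPYIN:

```fortran
! Hypothetical names: dut1 (EOP window indexed by MJD) and last_day (work point).
module eop_state
  implicit none
  real, allocatable, save :: dut1(:)
  integer, save :: last_day = -1       ! every thread's copy starts at -1
  !$omp threadprivate(dut1, last_day)
contains
  ! Reset the remembered index if it is not inside the current window.
  subroutine validate_last_day()
    if (.not. allocated(dut1)) then
      last_day = -1
    else if (last_day < lbound(dut1, 1) .or. last_day > ubound(dut1, 1)) then
      last_day = lbound(dut1, 1)
    end if
  end subroutine validate_last_day
end module eop_state

program check_on_entry
  use eop_state
  implicit none
  integer :: bad
  allocate(dut1(58000:58100))          ! main thread's window for this pass
  dut1 = 0.0
  last_day = 99999                     ! stale value left over from an earlier pass
  bad = 0
  ! Only the window is copied in; each thread keeps its own remembered index.
  !$omp parallel copyin(dut1) reduction(+:bad)
  call validate_last_day()
  if (last_day < lbound(dut1, 1) .or. last_day > ubound(dut1, 1)) bad = bad + 1
  !$omp end parallel
  if (bad /= 0) error stop "stale index survived region entry"
  print *, "master's remembered index:", last_day
end program check_on_entry
```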

Jim Dempsey

 

Elarion245

Hi Jim,

I take care of the allocation bounds as part of the reloading routine. There is no out of bound memory access within the loop.

Also the results of the final computations match those of a commercial tool, which would not be the case if random data was accessed.

In the overall code I do not explicitly go back to the same loop again; however, a recursive call of the function can happen, so that the loop is entered a second time. I assume that, since the recursive call is made after the first call has taken place, the value then stored in the master thread is copied in once, which is fine for me. Again, if data is not available, it is dynamically reloaded.

What might be confusing:

The writing back of the data into the global variable was coded by someone else and was never intended to take place in a parallel context. I solved my problem simply by using the read-in data without writing it back to the threadprivate variable.

The only drawback is that the data may be used inside a numerical integration function, so performance suffers a lot if every step requires reading in the data again: since I no longer modify the threadprivate variable, I cannot avoid the repeated reloading. Thus I extended the data range that is originally read in. For the time being this works.
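A minimal sketch of that workaround, assuming hypothetical names (`dut1`, `dut1_for_day`, `reload_range`) and a made-up formula in place of the file read. The reloaded range is held in a local allocatable that is deallocated automatically when the function returns, so the threadprivate window is only read inside the parallel loop; the cost is that an out-of-range day triggers a reload on every call:

```fortran
! Hypothetical names; reload_range stands in for reading the EOP file.
module eop_local
  implicit none
  real, allocatable, save :: dut1(:)   ! threadprivate window read before the region
  !$omp threadprivate(dut1)
contains
  ! Return the value for day mjd, reloading into a local array if needed.
  real function dut1_for_day(mjd) result(val)
    integer, intent(in) :: mjd
    real, allocatable :: reloaded(:)   ! local: never written back to dut1
    if (mjd >= lbound(dut1, 1) .and. mjd <= ubound(dut1, 1)) then
      val = dut1(mjd)
    else
      call reload_range(min(mjd, lbound(dut1, 1)), max(mjd, ubound(dut1, 1)), reloaded)
      val = reloaded(mjd)
    end if
  end function dut1_for_day                ! reloaded is deallocated on return

  ! Hypothetical stand-in for reading the EOP file for days lo..hi.
  subroutine reload_range(lo, hi, buf)
    integer, intent(in) :: lo, hi
    real, allocatable, intent(out) :: buf(:)
    integer :: d
    allocate(buf(lo:hi))
    do d = lo, hi
      buf(d) = 0.001 * real(d - 58000)
    end do
  end subroutine reload_range
end module eop_local

program local_reload
  use eop_local
  implicit none
  integer :: i
  real :: total
  allocate(dut1(58000:58100))
  do i = 58000, 58100
    dut1(i) = 0.001 * real(i - 58000)
  end do
  total = 0.0
  !$omp parallel do copyin(dut1) reduction(+:total)
  do i = 57990, 58010                  ! days 57990..57999 need a local reload
    total = total + dut1_for_day(i)
  end do
  !$omp end parallel do
  ! The threadprivate window was never changed by the loop.
  if (lbound(dut1, 1) /= 58000 .or. ubound(dut1, 1) /= 58100) error stop "window changed"
  print *, "sum", total
end program local_reload
```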

I posted the question out of curiosity, to find out whether changing the data from within the thread is generally a valid operation.
