How to control serially in openmp parallel section

Vivid · ‎08-29-2023

I wrote parallel code through openmp in fortran. A number of dynamically allocated arrays are included. In order to prevent data infringement between threads, it is allocated at the start branch within the parallel statement and freed before parallel termination.

It seems that memory invasion occurs when using the existing allocation method. Assuming 4 threads, the problem seems to occur because thread 3 accesses the memory area allocated by thread 0 at the same time. I would be very grateful if you could tell me how to solve it.

TobiasK · ‎08-29-2023

hi @Vivid

would be great if you could share an example of what you did exactly. Otherwise it's just guessing around.

Best

Vivid · ‎08-30-2023

I add a question below.

Thanks for your attention.

Vivid

jimdempseyatthecove · ‎08-29-2023

...
if(.not. allocated(array)) then
  !$omp critical(array_critical)
  if(.not. allocated(array)) allocate(array(NN))
  !$omp end critical(array_critical)
endif
do ...
end do
!$omp barrier
if(allocated(array)) then
  !$omp critical(array_critical)
  if(allocated(array)) deallocate(array)
  !$omp end critical(array_critical)
endif

Jim Dempsey

Vivid · ‎08-29-2023

The answer helped me a lot. Thank you.

Here is my condensed code.

Module A has the following variables and some functions. I declared G3DA, G3DB, G3DC, and G3DD as threadprivate. I want to allocate them dynamically so that each thread does not affect each other.

For reasons of data invasion, allocation outside of the omp statement should not be copied using threadprivate or copyin. I must allocate within the omp statement so that all threads can dynamically allocate.

The code is as follows.

G3DA	q	()	V3		a	()
					b	()
					c	()
					d	()
					e	()
					na
					ka
			E2		a	()
					b	()
					c	()
					d	()
					na
					ka
			T3		a	()
					b	()
					c	()
					d	()
					e	()
					f	()
					g	()
					h	()
					i	()
					na
					nb
			T4		a	()
					b	()
					na
					nb
			K3		a	()
					b	()
					na
					nb
	List		a	()
			b	()
			c	()
			d	()
			e	()
			f	()
			g	()
			h
			na
			nb
	nna
	nnb
	nnc
	nnd
	nne
	w	()
	e	()
	r	()
	t	()
G3DB
G3DC
G3DD

subroutine A

  DO I=1, 10
    !$omp parallel
    !$omp barrier
    !$omp critial(array_critical)
      call testForAllocation()
    !$omp end critial(array_critical)
    !$omp barrier
    !$omp do
    Do J=1, 10
      .....
    END DO
    !$omp end do
    !$omp barrier
    !$omp critial(deallocate_critical)
      call testForDeallocation()
    !$omp end critial(deallocate_critical)
    !$omp end parallel
  END DO
End subroutine A

Subroutine testForAllocation

  call memRealloc(model_old, n_new)

End subroutine testForAllocation

It goes through more subroutines and has parameters, but to roughly explain it, it goes like this. Dynamic allocation does not exist where omp parallel starts. Thanks to the use of the critical and barrier directives, I confirmed that a total of four threads 0 to 3 were running as I wanted before omp do started. To confirm, I only looked at the code, excluding the dynamic allocation code.
I'm still having issues with:
Even though Thread 3 enters after Thread 0 exits the allocation gracefully, I get "Error Message: Heap is corrupted". The same problem does not always occur, so as soon as it starts, thread 0 allocates various things and then stops.
I think I need to manage it at the memory level. What should I do?

Even if thread 0 used memory from 10 to 100, thread 1 only seems to know the situation before entering omp. If I need directives such as omp_init lock, please reply with detailed examples.

Thank you so much for your help.

Vivid

Vivid · ‎08-29-2023

While looking for related documents, I came across the openmp.org site and I am also curious to see if it is related to this part. There are no related examples for this part, so I am even more unsure how to apply it.
Memory Allocators
sync_hint: contended, uncontended, serialized, private

https://www.openmp.org/spec-html/5.2/openmpse35.html#x114-1190006.2

18.9 Lock Routine

https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf

I just want to know whether dynamically allocated memory control by controlling serial and parallel with omp critial, omp barrier, etc. is sufficient.

I tried using the lock like this, but it didn't work.

integer (kind=omp_lock_kind) svar

call omp_init_nest_lock(svar)

!$omp parallel default(private) shared(svar)

!$omp barrier

!$omp critical(initial_alloc)

call omp_set_nest_lock(svar)

...allocation...

call omp_unset_nest_lock(svar)

!$omp end critical(initial_alloc)

!$omp barrier

!$omp do

......

!$omd end do

!$barrier

...deallocation...

!$omp end parallel

Vivid

jimdempseyatthecove · ‎08-29-2023

Note, *** depending on the internals of the sequence of operations in the allocate(array(NN)), the test for allocated might return .true. before the array descriptor is fully defined. This would be corrected by placing an !$omp barrier before the do ...

Jim Dempsey

TobiasK · ‎08-30-2023

@Vivid sorry I still do not fully understand the question.

Do you want a thread private allocatable array? Or do you want a shared array which is just allocated inside the parallel region by a single thread what @jimdempseyatthecove code does?
For a thread private allocatable there is no need for a critical section or a barrier. Each thread has it's own copy of the array which is only accessible by the thread itself, hence also allocate can be called by all threads at the same time.

If you want a shared array that is allocated just inside the parallel region, I would not use critical but instead a single construct since it can be allocated only once for all threads anyway.

program thread_private_alloc
  use omp_lib
  integer,allocatable :: array(:)

  integer :: i, thread_id,max_threads,error

  max_threads=omp_get_max_threads()

  error=0
  !$omp parallel private(array,i,thread_id) reduction(+:error)                                                                                                                                                                                                                                                                                                                                
  allocate(array(max_threads))
  !$omp barrier                                                                                                                                                                                                                                                                                                                                                                               
  array=0
  thread_id=omp_get_thread_num()+1
  array(thread_id)=thread_id
  do i=1,max_threads
     if(i.eq.thread_id)then
        if(array(i).ne.thread_id) then
           error=error+1
           write(*,*) 'warning corruption'
        end if
     else
        if(array(i).ne.0)then
           error=error+1
           write(*,*) 'warning corruption'
        end if
     end if
  end do
  deallocate(array)
  !$omp end parallel                                                                                                                                                                                                                                                                                                                                                                          
  if (error.eq.0) write(*,*) 'no corruption occured'

end program thread_private_alloc

Note the barrier is not needed, I just added it to demonstrate that all threads may access their own private copy of array at the same time

Vivid · ‎08-30-2023

I want a module that is private between threads.

It must have an independent memory where the values of threads 1, 2, and 3 are not affected even if the value of thread 0 changes.

In the case of static variables or arrays such as A(10), I can have independent memory between threads just by declaring threadprivate. However, dynamically allocated variables of pointer type are not possible using threadprivate alone.

There is a dynamically allocated array in the module, and I confirmed that if the value in thread 0 is changed to 100 because it shares the address value, the value in thread 2 is also randomly changed. First of all, it cannot be declared as allocatable. It is declared in the pointer method, allocation is determined by associate, and memory is allocated by allocate and deallocate.

It is declared as a pointer type, so I need to check whether this problem occurs. This is because things get very complicated if I have to modify the program so that it can be declared as allocatable.

So I confirmed that I need to allocate and deallocate within a parallel statement like the method you suggested to have independent memory. However, because there are many dynamically allocated arrays in the module, the program stops even if directives such as barrier, critical, and nest_lock are used.

This allocate method takes parameters, allocates them, copies the data, and then releases the existing array. Perhaps because of the numerous arrays in the module, allocation continues but stops at some arrays. The halting arrangement is not constant and changes every time it is run. In test(array, nsize), the local variable real*4, pointer :: stak(:) is interrupted while allocating(stack, nsize).

I understand that in openmp parallel, one large heap memory is divided and used between threads. I would like to know how memory information is shared when thread 1 allocates after thread 0 has finished allocating memory in a continuous memory space. From what I've tested so far, it seems like I know the memory area as of before entering openmp. I would like to know whether it is possible to allocate a spot in that memory, and then how to allocate it.

TobiasK · ‎08-31-2023

Hi @Vivid
there should not be a restriction on private allocations. So either you hit a compiler bug here or something else is wrong in your code.
Can you please try to provide some smaller working reproducer so that we try to figure out what is going wrong?

You may also try to compile with "-check all -g -O0 -traceback" for Ifort or "-check all,nouninit -g -O0 -traceback" for IFX and see if you get a runtime error which is more meaningful.

jimdempseyatthecove · ‎08-31-2023

@TobiasK provided you with the solution you desired (after you mentioned subsequent to your first post) that the allocatables were threadprivate.

The sketch code I provided, was for the case of a shared array to be allocated within a parallel region. The information you provided in post 1 was insufficent to assume otherwise.

>>I want a module that is private between threads

Then attribute everything in the module as thread private..

.OR.

Have the module create a User Defined Type containing what is to be thread private. And then create an instance of that type attributed with thread private.

module mod_tp
type type_tp
  ...
  real(8), allocatable :: G3DB(:),G3DC(:),G3DD(:)
end type type_tp
!$omp threaprivate(tp)
type(type_tp) :: tp
end module mod_tp
...
subroutine foo(...)
  use mod_tp
  ...
  if(allocated(tp%G3DB)) then
    if(size(tp%G3DB) < WhatYouNeed) deallocate(tp%G3DB)
  endif
  if(.not. allocated(tp%G3DB)) allocate(tp%G3DB(WhatYouNeed))
  ...

NOTES

1) if you call that routine from outside of a parallel region it is equivalent to calling it from thread 0 of a parallel region (that is not a nested region).

2) If you need for threads to access other threads private data (say for reduction), attribute tp with target, create an array of pointers to (type_tp) of size of the number of threads in the thread team, have each thread associate the pointer, then perform the reduction.

>> local variable real*4, pointer :: stak(:) is interrupted while allocating(stack, nsize).

Why are you using pointer on a local allocatable array? Pointer is unnecessary, as well as makes the code open for memory leak.

Local Arrays (without pointer), in a procedure, will automatically be deallocated upon exit of the procedure (should the code fail to do so). Pointers, on the other hand, will not. The Fortran ALLOCATE is thread-safe.

Jim Dempsey

Vivid · ‎09-01-2023

 SUBROUTINE MBC_CONTACT()
    use omp_lib
    use mod_tp
    IMPLICIT REAL*8 (A-H, O-Z), INTEGER*4 (I-N)

     allocate (GA%IN_test(10)) 

     call OMP_SET_NUM_THREADS (4)
     !$omp parallel default(PRIVATE)
!!!Q1     if(allocated(GA%IN_test)) deallocate(GA%IN_test) !!!!!!Q1.
     !$omp end parallel
END SUBROUTINE

Q1. imposible -> I guess that all threads think GA%IN_TEST is allocated and attempt to free it. This is because there is only information that it was allocated before entering the omp statement, and even if one thread releases it, it is not shared with other threads.

How should I handle it? I don't know if it can't be deallocated or if it considers unallocated things as allocated. The following error occurs:

Unhandled exception (in ompIntel.exe) at ~~~239A (ntdll.dll). 0c00~~: Heap is corrupted (parameter: ~~~8681A). This same freeing and allocation problem seems to occur in my project.

Q2. % or . ?

When accessing the module, is it correct to access it with GA%IN_test(10)? Wouldn't GA.IN_test(10) work? The project is all accessing data through . instead of %.

Q3. pointer

All the coding of the currently handed over project is used pointer type instead of allocatable. These pointer types are deallocated if associated is true and doubly released by pointing to a null pointer.

Since you pointed out that pointers are dangerous for memory leaks, I reviewed the entire code. There is a group of global variables called common in fotran, and I found that this group was being used as a pointer type array that can be reassigned. There are separate subroutines for allocation and deallocation. Also, an interface is inserted and used through a fi file in a subroutine that requires allocation and deallocation. if(associated (A)) is the method of deallocate(A). Initially, it seems that allocatable, target, and save were tried, but since there was no permission to allocate and deallocate, it seems that pointer was used. This module also appears to have used pointers to release and allocate memory. I would like to ask your opinion if you have any advice.

Thank you so much !!

Vivid

Vivid · ‎09-01-2023

module mod_tp_class
    use omp_lib

    type T_N_Class
        real*8   , allocatable      :: Q (:)
        integer*4, allocatable      :: W(:)
        integer*4                   :: nXYZ    =  (0)
    end type

    type T_M_Class
        type (T_N_Class)      :: V3
        real*8                :: Range(6)  = (/0.0d0,0.0d0,0.0d0,0.0d0,0.0d0,0.0d0/) ! (Xmin,Xmax,Ymin,Ymax,Zmin,Zmax)
        real*8                :: Volume    = (0.0d0)
    end type

    type T_H_Class
        type (T_M_Class), allocatable     :: Q(:) 
        type (T_M_Class), allocatable     :: W(:)
        integer*4					:: itest = (0)
        integer*4 , allocatable           :: OUT_test(:) 
        integer*4 , allocatable           :: IN_test(:) 
    end type

    type type_tp
        real(8), allocatable :: G3DB(:), G3DC(:), G3DD(:)
    end type type_tp

    type (T_H_Class) :: GA
    type (T_H_Class) :: GB

    type(type_tp) :: tpp
    !$omp threadprivate(tpp) !!!
    !$omp threadprivate(GA, GB)
end module mod_tp_class

module mod_tp
    use omp_lib
    use mod_tp_class
    contains

    SUBROUTINE IHN_PL()
      IMPLICIT REAL*8 (A-H, O-Z), INTEGER*4 (I-N)

        COMMON /PARALLEL_TEST/ n_thread, n_chomp
        !$omp threadprivate ( /PARALLEL_TEST/)

        if(n_chomp .EQ. 0) then
          n_chomp = 1
          GA.itest = GA.itest +n_thread
          GA.OUT_test = GA.OUT_test +n_thread
          GA.IN_test = GA.IN_test +n_thread
          write(1412,*) '230727 2-0', n_thread, GA.itest, GA.OUT_test(2),  GA.IN_test(2)

	  else

        end if

    END SUBROUTINE

end module

module Module_Mem
    interface MemFree
        module procedure MemFree_i4, MemFree_r8
    end interface

    contains

    subroutine MF_i4(i_old)
    implicit real*8 (a-h,o-z), integer*4 (i-n)
    integer*4, allocatable :: i_old(:)

    if(allocated(i_old) )deallocate(i_old)

    return
    end subroutine

    subroutine MF_r8(r_old)
    implicit real*8 (a-h,o-z), integer*4 (i-n)
    real*8, allocatable :: r_old(:)

    if(allocated(r_old) )deallocate(r_old)

    return
    end subroutine

end module

      SUBROUTINE MBC_CONTACT()
        use omp_lib
        use mod_tp
        use Module_Mem
        IMPLICIT REAL*8 (A-H, O-Z), INTEGER*4 (I-N)
!
        COMMON /PARALLEL_TEST/ n_thread, n_chomp
        !$omp threadprivate ( /PARALLEL_TEST/)
!
	  if(allocated(GA.OUT_test)) then
          deallocate(GA.OUT_test)
	  end if
	  allocate (GA.OUT_test(10))
	  GA.OUT_test = 100

	  call OMP_SET_NUM_THREADS (4)
	  !$omp parallel default(PRIVATE) copyin( GA)
	  n_chomp = 0
	  n_thread = omp_get_thread_num ()

	  if(allocated(GA.IN_test)) call MemFree(GA.IN_test) 
	  allocate (GA.IN_test(10))
	  GA.IN_test = 10
	  write(*,*) n_thread, 'in omp phrase'

	  !$omp do
	  DO K =1, 20
	    call IHN_PL()
	  END DO
	  !$omp end do
	  !!$omp critical (end_test)
	  if(allocated(GA.IN_test)) call MemFree(GA.IN_test) 
	  !!$omp end critical (end_test)
	  !$omp end parallel

	  if(allocated(GA.OUT_test)) then
          deallocate(GA.OUT_test)
	  end if


	END SUBROUTINE

All of the above code is in separate files or folders, and much of it has been compressed.

The result of the txt file is as follows.

start program
230727 2-0 0 0 100 10
230727 2-0 2 2 102 12
230727 2-0 1 1 103 11
230727 2-0 3 3 106 13
end program

write(1412,*) '230727 2-0', n_thread, GA.itest, GA.OUT_test(2), GA.IN_test(2)

In the case of GA.OUT_test, it was allocated before starting parallel. When allocating GA in the module statement, threadprivate was declared, and when the omp statement started, all threads were copied to the value of 100 through copyin.

Only in the case of GA.IN_test does it seem to be unaffected by inter-thread values. Based on these results, it seems that in the module used in the current project, threadprivate should be specified at the same time as the type GA declaration in module definition, allocate at the start of the omp statement, and load the value if necessary.

Please check if this is correct.

And when allocate and deallocate, please let me know if critical and barrier are needed as mentioned earlier.

jimdempseyatthecove · ‎09-01-2023

Provided that:

a) GA is attributed !$omp threadprivate

b) No other code+process thread has allocated it's GA%IN_test

c) The call is NOT made within a parallel region

(IOW all process's threads GA remain unallocated)

A1:

At line 6, before execution, all process's threads GA's are unallocated, after allocation, only the main thread's GA is allocated (the rest remain unallocated)

At line 10 (removing '!!!Q1'), assuming there were a line 9.5 with threadnum = omp_get_thread_num(), the thread with threadnum==0 would see GA allocated, and thus would deallocate it's GA. The remaining threads of the parallel region see their GA as unallocated.

Before I continue, you may have noticed that I am providing information in a peculiar way, or more information than appears necessary. Not doing so, may lead you to make presumptions that are not true.

OpenMP provides for nested parallel regions. It provides for you to specify if nested regions are to be enabled or not. When parallel regions enabled, omp_get_thread_num() does not return an application-wide, OpenMP thread number. Rather it returns the parallel region's execution context team member number.

IOW when a procedure like MPC_CONTACT is called from outside parallel region(s), following !$omp parallel, and assuming thread capacity limit will not not be exceeded, a team of 4 threads is assembled: three threads are created and attached to the main thread. These 4 threads for this parallel region are numbered 0:3 with the main (calling) thread having 0, and the remaining 1,2,3 in creation order. Upon exit of the parallel region, the three additional threads remain active.

Now then, assume the main thread enters a parallel region with 4 threads. As the main thread has already a parallel region before, it simply reuses the threads it had accumulated, and these threads are numbered as before.

Further, assume thread 2 of this parallel region calls a procedure like MPC_CONTACT, the thread that entered the parallel region will assemble (reuse or create) a team of threads, whose thread numbers will be 0:3, with 0 being the region's instantiating thread.

The thread private data are local to each of the process's allocated threads.

A2:

With Intel Fortran, you can use either % or ., however, % is conformant with the Fortran standard. However, consider that using . could potentially get you into trouble. Consider a user defined type ALPHA with a member variable OR that is a user defined type with a member variable BETA

ALPHA.OR.BETA

Ask yourself, is .OR. a logical operator a member variable selector?

Note, the language provides for user defined operators as well as type bound operators.

I recommend using the %, to save you or others some problems later.

A3:

If at all practical, convert the code to use allocatable. Assume you have correct code where the procedure allocates to a pointer, the code runs, and deallocates associated pointers upon return. After you pass the code onto another maintainer, they add exception code which includes a RETURN.

Jim Dempsey

Vivid · ‎09-04-2023

I truly appreciate your excellent explanations and knowledge. I learned a lot from this answer and it was a great help in proceeding with the project. I would appreciate it if you would allow me to mention this again as a way of organizing this knowledge before applying it.

Also, while organizing, I added a Q mark to any additional questions I had.

First, the directive to access module members must be changed from '.' to '%'. The '%' method is standard in the Fortran language and is good for preventing future problems.

++Q1.

Is it true that the problem here is that computers can cause various errors during sophisticated calculations? Even now, even without omp, there are often unresolved problems while debugging the project, so I add additional questions.

Second, all pointer arrays used for reallocation purposes in the omp statement must be changed to allocatable. This is because it may cause problems such as memory leaks.

Thirdly, what is allocated before parallel startup is not allocated for each thread. Therefore, in order to have individualized module memory between threads, define the module, declare it as threadprivate, and then include it in the corresponding subroutine.

Then, when the omp statement starts, 'GA.Q.V3.Q(10)' must be dynamically allocated so that each thread can be allocated and controlled individually. Also, make sure to release all allocated configurations before termination.

Fourth, it is a good idea to prevent memory conflicts from occurring during allocation and deallocation between threads by using critical when allocating and deallocating at the beginning and end of parallel as mentioned earlier. In the case of a barrier, it is recommended to declare it in the front line of the critical area for allocation and deallocation mentioned above.

++ Q2.

if(.not. allocated(array)) then

!$omp critical(array_critical)

if(.not. allocated(array)) allocate(array(NN))

!$omp end critical(array_critical)

endif

do

...

end do

I would like to know whether critical and barrier are essential.

This is because allocation and deallocation are not limited to the beginning and end of parallelism. When going through several subroutines within the omp statement, reallocation may occur. In this case, you may need to declare it locally as allocatable and copy the data, then free and allocate the existing array and copy it again.

++Q3.

A separate subroutine is used for memory reallocation. As mentioned in the previous question, the subroutine declares an allocatable type array A, stores the data in the existing array B, reallocates the size of B, deletes array A, and returns. In this case, you need to check whether there are any problems within the omp syntax.

++Q4.

Among the arrays used in omp do, common includes an array that has been dynamically reallocated as a pointer type. It was confirmed that reallocation is not performed within the omp statement. I would like to know if there is no problem under the condition of not reallocating. The common is used as a shared type and seems to be mainly used only for reading values. If a problem arises where the value must be private and the value must be written, can I take another method?

This is used by declaring an array to store common data in the subroutine where omp is started. The reason why it is difficult to apply it unconditionally is because there are a significant number of common statements and allocating all of them would likely require a lot of memory.

If there is anything that needs to be corrected in this content, I would appreciate your help.

Vivid

TobiasK · ‎09-04-2023

@Vivid if you say the code runs into problems even without OpenMP, I would first debug that part before adding more complexity.

While it might be in general a good idea to avoid pointers in Fortran, there are (corner) cases where you need them. Please do not blindly replace pointers with allocates, first understand, if there was a need for pointers.

As for the critical sections for allocates, @jimdempseyatthecove I hope I don't miss anything here, but I would strongly recommend to not use critical sections for shared allocatable arrays. Either single or even better the master construct. For private arrays critical is also not needed.

Vivid · ‎09-04-2023

Thank you for your thoughtful advice.

First of all, even without OpenMP, intermittent bugs are difficult to reproduce. It does not occur in the area we are currently discussing. However, occasionally, a problem that is difficult to analyze the cause of occurs, and it seems similar to the pointer problem I am currently experiencing, so I added this information. So I will try to check pointer arrays in the long term.

Also, as for the critical area, as you advised, it should not be divided into shared or private types, but I will carefully check whether it is really necessary before using it. Is your advice correct? The more serial areas there are in a parallel region, the more bottleneck problems may occur.

The reason I tried to use the critical area was to prevent the memory area between threads from being invaded during dynamic reallocation within the parallel section. This is because a large heap area is shared between threads, and it was thought that the area being used by threads 0 to 3 may be invaded during the allocation and reallocation process. Is it possible to share which memory address is being used between threads throughout the parallel section?

jimdempseyatthecove · ‎09-04-2023

@TobiasK

While the preferred way to perform the allocation of a shared array is to perform the allocation before you enter the parallel region. What I showed is a technique to (most) efficiently perform this allocation within the parallel region, as on occasion, you may need to do this (e.g. grow the previously allocated array). TobiasK's posts seem to indicate that he was allocating within the parallel region.

Note, !$omp single is not always usable, for example inside a loop within the parallel region.

Jim Dempsey

jimdempseyatthecove · ‎09-04-2023

@TobiasK

>>Is it true that the problem here is that computers can cause various errors during sophisticated calculations?

The vast majority of cases it is the programmer causing various errors, or errors in assumptions about the significance of a result.

Floating point expressions are performed using limited precision, while algebraic expressions are not performed. Back in my days in college, we didn't use calculators, instead we used slide rules with approximately 3 digits of precision. Back then we always knew the results of calculations carried a degree of error. Today, noob programmers mistakenly assume calculations are exact. Learn not to make this assumption.

>>all pointer arrays used for reallocation purposes in the omp statement must be changed to allocatable.

I did not say "all". Pointers are perfectly valid, as long as programmed correctly.

>>Thirdly,...

You made the statement that you wanted all variables within a module to be private to each thread. I provided you with two methods of doing this. The preferred method (IMHO) is to use a threadprivate UDT such that when reading the code, the reader clearly sees that the variable is thread private (a third way would be to omit the UDT, prefix each variable with "tp", and attribute all variables with threadprivate).

>>Forth, ... using critical...

It is best to perform the shared allocations outside of the parallel region. Only when the shared allocations must be performed within the parallel region, then you need to protect the allocation from being multiply allocated.

>>++ Q2

The code snip, without the barrier, makes a critical assumption about the implementation that the last item placed into the array descriptor is the only item used to test the "allocated" status. IOW the array descriptor is completely formed. This might be too much of an assumption. The code example is optimized to avoid the critical section if it is not needed.

>>When going through several subroutines within the omp statement, reallocation may occur. In this case, you may need to declare it locally as allocatable and copy the data, then free and allocate the existing array and copy it again.

Reallocation of what? (shared array or private array)

Reallocation performed where? (multiple subroutines called within the parallel region)

Your question isn't making sense.

I will address Q3,Q4 in a bit.

Jim Dempsey

TobiasK · ‎09-04-2023

@jimdempseyatthecove , I think you wanted to mention @Vivid not me:)

About critical, you are correct, I forgot about nested parallelism, in that case single and master does not work.

jimdempseyatthecove · ‎09-04-2023

You should not code parallel regions whereby threads allocate and/or reallocate shared resources. This can get you into trouble when these allocation/reallocations occur without coordination with all the participating threads of the region.

>>Q3

If B is shared, and it needs to be reallocated, then use something like:

if(size(B) < WhatYouNeed) then
  ! use a barrier to assure all threads see the need for expansion of B
  ! before you expand B
  !$omp barrier
  ! all threads assembled, only one will 
  !$omp single
  call YourReallocationOfB() ! the called routine does not require critical
  !$omp end single
  ! there is an implicit barrier at end of single
endif

Caution, the above code occurs in one place of the code (where all threads of the region pass through).

Should this not be the case, then you potentially will run into a deadlock where (at least) two of those sections are waiting at their barrier for the other threads to meet at the barrier (they cannot as they are waiting at the other barrier(s)).

>> Q4

This is difficult to answer because of lack of situational awareness on my part, and perhaps on your part as well. You mentioned earlier, that you are using a 3rd party DLL. But you did not mention as to if this 3rd party DLL is using a named common block, iow a shared common block, to be used as an interface between the caller (your app) and the DLL. As opposed to passing the context via arguments on call. In this case, you have no means of control within the DLL as to if its use of the named common block is threadprivate or not.

This makes parallizing the usage more difficult, and in which case, it may be advisable to switch over to using MPI (or critical section with copying of thread-local copy of the DLL's API common block).

Jim Dempsey