I would like to collect guidance on doing parallel calculations with OpenMP using data from a module.
1. Within a module function/subroutine, is it safe to do some parallel calculation that uses and updates a module variable?
2. Within a module function/subroutine, is it safe to call another module function/subroutine which itself updates/uses the module variable?
If the module variable is shared amongst threads (i.e., not attributed as THREADPRIVATE) and you wish for multiple threads to update it, then you must use a serializing construct such as:
!$OMP CRITICAL
modVar = modVar + delta
!$OMP END CRITICAL

or, as a named critical section:

!$OMP CRITICAL(modVar_critical)
modVar = modVar + delta
!$OMP END CRITICAL(modVar_critical)

or, as an atomic update:

!$OMP ATOMIC
modVar = modVar + delta
!$OMP END ATOMIC
Note, there are variations on these directives. Consult the documentation.
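For illustration only, here is a minimal self-contained sketch of the ATOMIC variant protecting a module variable (the module and routine names are placeholders of mine, not from this thread):

module accum_mod
  implicit none
  real :: modVar = 0.0
contains
  subroutine accumulate(deltas, n)
    integer, intent(in) :: n
    real, intent(in) :: deltas(n)
    integer :: i
    !$omp parallel do private(i)
    do i = 1, n
      ! ATOMIC serializes the read-modify-write of the shared module variable
      !$omp atomic
      modVar = modVar + deltas(i)
    end do
    !$omp end parallel do
  end subroutine accumulate
end module accum_mod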
Jim Dempsey
OK, thanks. Assuming each thread will be updating a different slice of a module array (call it u(:)) with data from a shared, read-only module array (call it read_array(:)), can't I use !$OMP PARALLEL DO just as I would for a normal locally declared array, where both u and read_array are public?
!$omp parallel do default(shared) private(i)
do i = 1, counter
  u(i) = u(i) + read_array(i)
end do
!$omp end parallel do
Is there something special about the array being shared through a module?
If you have an EXE that uses routines from a user-built DLL, and there are routines in both that USE data in a module, then sharing the data between the EXE and the DLL (not to mention between threads in the EXE and threads in the DLL) requires careful attention to detail. Be aware that otherwise the EXE and the DLL may end up with distinct, incoherent copies of the module variables.
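As a hedged illustration of one way to keep a single copy with Intel Fortran (the module name and build line are assumptions of mine, not from the post; check the documentation for your compiler version):

! shared_mod.f90 -- compiled into the DLL; DLLEXPORT makes the EXE and the
! DLL reference one copy of modvar instead of each image getting its own.
module shared_mod
  implicit none
  !DEC$ ATTRIBUTES DLLEXPORT :: modvar
  real :: modvar = 0.0
end module shared_mod

! Build sketch (Windows): ifort /dll shared_mod.f90
! Code in the EXE that USEs shared_mod then links against the DLL's import
! library, so both images see the same storage.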
Thanks all.
So basically, should I stick to locally defined variables in my procedures to be on the safe side?
>>So basically, should I stick to locally defined variables in my procedures to be on the safe side?
mecej4 did not say nor indicate that this is what you should do. Rather, the advice is to learn the nuances of shared data, scope of data, and thread pools.
What you show in #4 is correct...
... provided that array u is not also being processed concurrently by a separate PARALLEL DO, or by threads other than those running the #4 code snippet.
Note that while the code above does not show such concurrency, the accompanying text does not rule it out.
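For concreteness, a self-contained version of the #4 pattern might look like the following sketch (u and read_array follow the thread; the driver program and array size are my assumptions):

module field_mod
  implicit none
  real, allocatable :: u(:), read_array(:)
end module field_mod

program demo
  use field_mod
  implicit none
  integer :: i, counter
  counter = 1000000
  allocate(u(counter), read_array(counter))
  u = 0.0
  read_array = 1.0
  ! Safe without CRITICAL: each iteration writes a distinct element u(i),
  ! and read_array is only read.
  !$omp parallel do default(shared) private(i)
  do i = 1, counter
    u(i) = u(i) + read_array(i)
  end do
  !$omp end parallel do
  print *, 'u(1) =', u(1), 'u(counter) =', u(counter)
end program demo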
Jim Dempsey
Le Callet, Morgan M wrote:
!$omp parallel do default(shared) private(i)
do i = 1, counter
  u(i) = u(i) + read_array(i)
end do
!$omp end parallel do
This example is a special case. Array U can be shared, as it is addressed by the OMP loop index i; the same element of U is never written by two different threads, so in this respect there is no problem, and CRITICAL is not required.
There is, however, a practical concern: because different threads update adjacent parts of U, the same cache lines (and memory pages) can be touched by multiple cores, which can lead to caching inefficiency (false sharing near the slice boundaries).
If read_array(i) is an array reference rather than a function call, there is not much work being done per iteration, so you may wish to review the SCHEDULE clause; SCHEDULE(STATIC) could be best. Also, CRITICAL has an overhead that may swamp any performance gain you hope to achieve. PARALLEL DO, CRITICAL, ATOMIC, etc. all have different (and significant) overheads measured in processor cycles, so the computational effort of each iteration of the OMP loop must be sufficient to overcome that overhead.
You could try something like the following and see whether this simple example provides any performance improvement. If n is too small, the OMP overhead will be excessive, while if n is too large, the memory access overhead will be the limiting factor. In any case, you need to give each thread something significant to do.
t1 = get_processor_ticks ()
sum = 0
!$OMP PARALLEL DO PRIVATE(i) SHARED(A,n) SCHEDULE(STATIC) REDUCTION(+:sum)
do i = 1,n
  sum = sum + A(i)
end do
!$OMP END PARALLEL DO
to = get_processor_ticks () - t1

t1 = get_processor_ticks ()
sum = 0
do i = 1,n
  sum = sum + A(i)
end do
ts = get_processor_ticks () - t1
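Note that get_processor_ticks above is a placeholder from the post, not a standard intrinsic. A portable, runnable version of the same comparison could use omp_get_wtime instead (my substitution, under that assumption):

program timing_demo
  use omp_lib                       ! provides omp_get_wtime
  implicit none
  integer, parameter :: n = 10000000
  real(8), allocatable :: A(:)
  real(8) :: s, t1, t_omp, t_serial
  integer :: i

  allocate(A(n))
  A = 1.0d0

  ! Threaded sum with a REDUCTION clause
  t1 = omp_get_wtime()
  s = 0.0d0
  !$omp parallel do private(i) schedule(static) reduction(+:s)
  do i = 1, n
    s = s + A(i)
  end do
  !$omp end parallel do
  t_omp = omp_get_wtime() - t1

  ! Serial reference sum
  t1 = omp_get_wtime()
  s = 0.0d0
  do i = 1, n
    s = s + A(i)
  end do
  t_serial = omp_get_wtime() - t1

  print *, 'omp time:', t_omp, ' serial time:', t_serial
end program timing_demo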
Too often, !$OMP demonstration examples use very simple DO loops to demonstrate the functionality of using multiple threads, but the practicality of using multi-threading is not achieved, as the computational saving { run time * (num_threads-1) / num_threads } is lost in the !$OMP initiation overheads.
Dear All,
Many thanks for the feedback, and I apologise if I sounded off.
I will go and study the different nuances as suggested. This really struck a chord, however:
"Too often, !$OMP demonstration examples use very simple DO loops to demonstrate the functionality of using multiple threads, but the practicality of using multi-threading is not achieved, as the computational saving { run time * (num_threads-1) / num_threads } is lost in the !$OMP initiation overheads."
I wonder if I should let the automatic parallelisation do the job?
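(Editor's note, hedged: Intel Fortran's auto-parallelizer is enabled by a compiler option rather than by directives, so trying it costs little; whether it wins is subject to the same overhead trade-offs discussed above.)

! Auto-parallelization is a compile-time switch (classic ifort command lines;
! consult the documentation for your compiler version):
!   Linux/macOS:  ifort -parallel mysource.f90
!   Windows:      ifort /Qparallel mysource.f90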
John,
Thanks for pointing out the REDUCTION clause to Morgan. The case I was presenting was not the simple single parallel region producing a reducible sum; rather, I was referring to multiple concurrent parallel regions, as can occur via nested parallelism or via separate tasks each containing a parallel region, all manipulating the same module/common variable. Those situations require a bit more care to handle efficiently.
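A minimal sketch of that situation, with a named CRITICAL protecting the module variable across concurrent nested regions (the program and module names are placeholders; omp_set_nested is deprecated in newer OpenMP in favour of omp_set_max_active_levels):

module shared_state
  implicit none
  real :: modVar = 0.0
end module shared_state

program nested_demo
  use shared_state
  use omp_lib
  implicit none
  integer :: i, j

  call omp_set_nested(.true.)   ! allow inner parallel regions to fork

  !$omp parallel do private(i)
  do i = 1, 4
    ! Each outer iteration starts its own inner parallel region; all of
    ! them update the same module variable concurrently.
    !$omp parallel do private(j)
    do j = 1, 1000
      !$omp critical(modVar_critical)
      modVar = modVar + 1.0
      !$omp end critical(modVar_critical)
    end do
    !$omp end parallel do
  end do
  !$omp end parallel do

  print *, 'modVar =', modVar   ! expect 4000.0
end program nested_demo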
Jim Dempsey