John Campbell wrote:
In Quote #11, you could try the following changes, as localCount is shared (or use !$OMP atomic?).

    !$OMP PARALLEL DO default (shared) private (i, id, percentToDo) reduction (+ : localCount)
    do i = 1, 10
       id = OMP_GET_THREAD_NUM()
       localCount = localCount + 1
       percentToDo = (sin(req%Aii(1)) + cos(req%Aii(2))) + id
       print *, i, percentToDo, id, OMP_GET_MAX_THREADS(), OMP_IN_PARALLEL()
    end do
    !$OMP END PARALLEL DO
    print *, localCount
    print *, i, percentToDo, id, OMP_GET_MAX_THREADS(), OMP_IN_PARALLEL(),' is i=11 or 0 ?'
Thanks John, but the value of the localCount variable is not really material. I only added it because I wanted to see how many times the loop was executing. The behavior is the same with or without that variable. The problem I am having is that the compiler seems to be ignoring the OMP DO altogether. If I create a parallel region with OMP PARALLEL followed by OMP DO, each thread executes the whole loop, meaning 8 * 10 executions. If I use PARALLEL DO, the entire loop executes on 1 thread.
Philliard,
You indicated in post #11 that the !$OMP region was remaining single-threaded. I have seen no indication of why this happened. If you print out the value of i after the DO, it will be 11 for a non-OMP run, or undefined if OMP was active. Warnings can be poor when !$OMP directives are ignored. I prefer to have explicit private and shared clauses, to at least review how arrays are being used and replicated.
You say that OMP is not working, by comparing 2017 to 2019 performance, but the most significant change is for the non-OMP run (5.5 sec to 59 sec). What is the reason for this? You need an explanation (different test, bigger problem, reduced caching?). It is a big change.
The other issue is OMP efficiency (8 threads: 32 sec vs 1 thread: 59 sec), which is not good. Make sure you are comparing the performance of the OMP region itself. Then there are the possible sources of inefficiency: count the number of OMP region entries (at about 2.e-5 sec per entry), check for memory clashes when updating shared arrays, and ask whether memory transfer demands have increased.
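As a rough illustration of timing only the OMP region and counting region entries, here is a minimal sketch. The program name, array size, repeat count, and the loop body are all made up for illustration and are not taken from the code discussed in this thread. It assumes compilation with OpenMP enabled (for example /Qopenmp with Intel Fortran or -fopenmp with gfortran).

```fortran
program time_omp_region
   use omp_lib                       ! interfaces for omp_get_wtime etc.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 1000000 ! hypothetical problem size
   integer, parameter :: nrep = 100  ! hypothetical number of region entries
   real(dp), allocatable :: a(:)
   real(dp) :: t0, t_region, s
   integer :: i, rep, n_entries

   allocate(a(n))
   a = 1.0_dp
   t_region = 0.0_dp
   n_entries = 0
   s = 0.0_dp

   do rep = 1, nrep
      t0 = omp_get_wtime()           ! wall-clock time just before the region
      !$omp parallel do default(none) shared(a) private(i) reduction(+:s)
      do i = 1, n
         s = s + sqrt(a(i))
      end do
      !$omp end parallel do
      ! accumulate only the time spent inside the parallel region
      t_region = t_region + (omp_get_wtime() - t0)
      n_entries = n_entries + 1
   end do

   print *, 'region entries      :', n_entries
   print *, 'time in OMP regions :', t_region, 'sec'
   print *, 'avg time per entry  :', t_region / n_entries, 'sec'
   print *, 'checksum            :', s
end program time_omp_region
```

Running the same build with OMP_NUM_THREADS set to 1 and then to 8 would compare the region time alone, separate from the serial parts of the code.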
John Campbell wrote:
Philliard,
You indicated in post #11 that the !$OMP region was remaining single-threaded. I have seen no indication of why this happened. If you print out the value of i after the DO, it will be 11 for a non-OMP run, or undefined if OMP was active. Warnings can be poor when !$OMP directives are ignored. I prefer to have explicit private and shared clauses, to at least review how arrays are being used and replicated.
You say that OMP is not working, by comparing 2017 to 2019 performance, but the most significant change is for the non-OMP run (5.5 sec to 59 sec). What is the reason for this? You need an explanation (different test, bigger problem, reduced caching?). It is a big change.
The other issue is OMP efficiency (8 threads: 32 sec vs 1 thread: 59 sec), which is not good. Make sure you are comparing the performance of the OMP region itself. Then there are the possible sources of inefficiency: count the number of OMP region entries (at about 2.e-5 sec per entry), check for memory clashes when updating shared arrays, and ask whether memory transfer demands have increased.
Sorry, I mis-labeled the test cases - the compiler was 2020 (update 1), not 2019.
The code and the test problem I am running are identical between the 2017 and 2019 versions. The only difference between the test cases is the compiler used to build the code. With the 2017 compiled version, no matter how many cores I request, I get the same execution time, and I can tell by watching Task Manager that it is only using one thread.
I know that I am looking at timing for the whole code and not just the parallel region, so I know that there are a lot of inefficiencies in the multi-threading result. However, my biggest concern right now is why it is so much slower with the 2019 compiled version. My code is the same and the test case is the same, so something has changed in the compiler, and apparently something in my code combined with those changes has caused the code to slow way down.
A timing difference of approximately 10x between versions points to several possible causes:
1) One compilation is using array bounds checking and the other is not.
2) You have sensitive versus insensitive convergence code, where one version converges in far fewer iterations than the other.
3) Your 8 threads are running on one hardware thread in one version, and on all hardware threads in the other.
Note for 3): you state that your code is a DLL that uses OpenMP. If the executable that calls your DLL sets the process affinity to use 1 logical processor (1 hardware thread), and your DLL code instructs OpenMP to use 8 threads, then those 8 threads will all run on that single logical processor.
Jim Dempsey
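One possible way to look at point 3) from inside the code is to print what the OpenMP runtime reports about processors and threads. This is only a sketch: the program name is made up, and whether omp_get_num_procs reflects a process affinity mask set by the calling executable depends on the OpenMP implementation, so treat that as an assumption to verify. In the DLL case the same calls would go in a routine inside the DLL.

```fortran
program show_omp_resources
   use omp_lib
   implicit none

   ! Processors the runtime believes are available to this process.
   ! Assumption: on some runtimes this follows the process affinity mask.
   print *, 'omp_get_num_procs   :', omp_get_num_procs()

   ! Upper bound on the team size for the next parallel region
   print *, 'omp_get_max_threads :', omp_get_max_threads()

   !$omp parallel
   !$omp single
   ! Team size actually created for this region
   print *, 'omp_get_num_threads :', omp_get_num_threads()
   !$omp end single
   !$omp end parallel
end program show_omp_resources
```

If the reported processor count is 1 while the team has 8 threads, that would be consistent with all threads sharing one logical processor.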
Also make sure you have use omp_lib:
program YourProgram
use omp_lib
....
Jim Dempsey
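A small stand-alone test along these lines can help confirm whether the loop is actually being shared among threads. It is a sketch only: the program name and the hits array are invented for illustration, and it assumes the build has OpenMP enabled (for example /Qopenmp or -fopenmp).

```fortran
program check_omp_worksharing
   use omp_lib                    ! without this, omp_* functions are implicitly typed
   implicit none
   integer :: i, id, total
   integer, allocatable :: hits(:)   ! iterations executed by each thread

   allocate(hits(0:omp_get_max_threads() - 1))
   hits = 0
   total = 0

   print *, 'max threads:', omp_get_max_threads()

   !$omp parallel do default(none) private(i, id) shared(hits) reduction(+:total)
   do i = 1, 10
      id = omp_get_thread_num()
      hits(id) = hits(id) + 1     ! each thread updates only its own element
      total = total + 1
   end do
   !$omp end parallel do

   ! With working worksharing, total is 10 and the work is spread over threads.
   print *, 'total iterations executed:', total
   print *, 'threads that did work    :', count(hits > 0)
end program check_omp_worksharing
```

If the total is 10 but only one thread did any work, the directive was probably ignored or the team had a single thread. A total larger than 10 would suggest every thread ran the whole loop.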