Hi. I have some code that uses OpenMP. The relevant snippet looks like this:
!$OMP parallel do
do ii = 1, 10
   call sub1(ii, ...)
   do jj = 1, 2
      call sub2(jj, ii, ...)   ! Takes a long time.
   end do
   call sub3(ii, ...)
end do
!$OMP end parallel do
My question is: how should I change the code if I have more than 10 CPUs available (e.g. 16 cores)?
use omp_lib
call omp_set_num_threads(16)
You can visit the OpenMP homepage to download a user manual.
Intel compiler 17 adopted the OpenMP 4.5 standard.
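For reference, a minimal sketch of where the call goes and what it does to a 10-iteration loop. The program name and the got_work array are made up for illustration; the loop body stands in for the real sub1/sub2/sub3 calls.

```fortran
program team_size_sketch
   use omp_lib
   implicit none
   integer, parameter :: max_threads = 16
   integer :: ii
   logical :: got_work(0:max_threads - 1)   ! hypothetical bookkeeping array

   got_work = .false.
   ! Request a team of up to 16 threads for subsequent parallel regions.
   call omp_set_num_threads(max_threads)

   !$omp parallel do
   do ii = 1, 10
      ! Each thread only writes its own slot, so there is no data race.
      got_work(omp_get_thread_num()) = .true.
   end do
   !$omp end parallel do

   ! With 10 iterations this can never exceed 10, however large the team is.
   print *, 'threads that received iterations:', count(got_work)
end program team_size_sketch
```

Built with OpenMP enabled (for example gfortran -fopenmp or ifort -qopenmp), the count shows that a larger team alone leaves some threads without an outer iteration.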
Li,
Thanks for the reply.
I am not sure about your solution. As far as I know, calling the subroutine <omp_set_num_threads> might only affect the outer loop. In my case the outer loop has at most 10 iterations, so if I call the subroutine and create a team of 16 threads, the extra threads would be redundant for the outer loop. That is why I am trying to parallelize the inner loop as well.
Sorry, here is the problem in more detail. In my example, I expect the program to run as follows. Suppose I have 16 available cores, and the outer and inner loops are the same as in my post. I want OpenMP to create 8 thread blocks (I hope the term is clear) for the outer loop, with each thread block consisting of 2 ordinary threads for the inner loop. Each thread block would have a primary thread that runs the code outside the inner loop (i.e. sub1 and sub3 in my example).
Best wishes,
Rubin
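The "8 thread blocks of 2 threads" layout described above roughly corresponds to nested parallel regions. Below is a minimal sketch, assuming the runtime allows two active levels of parallelism. The subroutine bodies and the partial and total arrays are placeholders invented for illustration, since the real argument lists are elided in the original post.

```fortran
program nested_sketch
   use omp_lib
   implicit none
   integer :: ii, jj
   real :: partial(2, 10), total(10)   ! hypothetical work arrays

   ! Permit two active levels: the outer team and the inner teams.
   call omp_set_max_active_levels(2)

   !$omp parallel do num_threads(8) private(jj) schedule(dynamic)
   do ii = 1, 10
      call sub1(ii, partial(:, ii))
      ! Each outer thread forks its own 2-thread team for the inner loop.
      !$omp parallel do num_threads(2)
      do jj = 1, 2
         call sub2(jj, ii, partial(jj, ii))
      end do
      !$omp end parallel do
      ! Only the outer ("primary") thread of the block runs this.
      call sub3(ii, partial(:, ii), total(ii))
   end do
   !$omp end parallel do

   ! Self-check for the placeholder arithmetic: total(ii) should be 3*ii.
   if (any(total /= [(real(3 * ii), ii = 1, 10)])) error stop 'unexpected result'
   print *, 'ok'

contains

   subroutine sub1(ii, work)          ! placeholder setup step
      integer, intent(in) :: ii
      real, intent(out) :: work(:)
      work = 0.0
   end subroutine sub1

   subroutine sub2(jj, ii, res)       ! placeholder for the expensive step
      integer, intent(in) :: jj, ii
      real, intent(out) :: res
      res = real(ii * jj)
   end subroutine sub2

   subroutine sub3(ii, work, res)     ! placeholder reduction step
      integer, intent(in) :: ii
      real, intent(in) :: work(:)
      real, intent(out) :: res
      res = sum(work)
   end subroutine sub3

end program nested_sketch
```

Whether this is faster than the original loop is worth measuring: 10 outer iterations on 8 outer threads still need two rounds, and the runtime is free to provide fewer threads than requested.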
Actually, I have little experience with OpenMP; I have only written some for my fluid code.
Below is my solution, as far as I understand it.
If I were you, I would create 16 threads and start <!$omp parallel do> on the outer loop, so that each outer iteration is picked up by one of the 16 threads (with 10 iterations, at most 10 of them get one).
Then use <!$omp task> in the inner loop, so that those threads create the tasks.
The heavy sub2 can then run in a more dynamic way across all 16 threads.
In this scheme, the threads that run the outer iterations act as the primary threads for sub1 and sub3, although we don't know their thread numbers in advance. Meanwhile, sub2 can be executed efficiently across the 16 cores.
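A minimal sketch of this task-based idea, under the same caveat that the subroutine bodies and the partial and total arrays are placeholders made up for illustration. One detail is added here as an assumption: a taskwait before sub3, on the guess that sub3 needs the results of both sub2 calls.

```fortran
program task_sketch
   use omp_lib
   implicit none
   integer :: ii, jj
   real :: partial(2, 10), total(10)   ! hypothetical work arrays

   call omp_set_num_threads(16)

   !$omp parallel do private(jj)
   do ii = 1, 10
      call sub1(ii, partial(:, ii))
      do jj = 1, 2
         ! Each sub2 call becomes a task that any thread in the team may run,
         ! including threads that received no outer iteration.
         !$omp task firstprivate(ii, jj) shared(partial)
         call sub2(jj, ii, partial(jj, ii))
         !$omp end task
      end do
      ! Wait for the tasks created by this thread before using their results.
      !$omp taskwait
      call sub3(ii, partial(:, ii), total(ii))
   end do
   !$omp end parallel do

   ! Self-check for the placeholder arithmetic: total(ii) should be 3*ii.
   if (any(total /= [(real(3 * ii), ii = 1, 10)])) error stop 'unexpected result'
   print *, 'ok'

contains

   subroutine sub1(ii, work)          ! placeholder setup step
      integer, intent(in) :: ii
      real, intent(out) :: work(:)
      work = 0.0
   end subroutine sub1

   subroutine sub2(jj, ii, res)       ! placeholder for the expensive step
      integer, intent(in) :: jj, ii
      real, intent(out) :: res
      res = real(ii * jj)
   end subroutine sub2

   subroutine sub3(ii, work, res)     ! placeholder reduction step
      integer, intent(in) :: ii
      real, intent(in) :: work(:)
      real, intent(out) :: res
      res = sum(work)
   end subroutine sub3

end program task_sketch
```

The taskwait matters in this sketch: without it, sub3 could start before both sub2 tasks of that iteration have finished. Threads that are idle at the end of the loop can pick up pending sub2 tasks while they wait.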