Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Help designing an OpenMP thread model

Lewis__Rubin

Hi. I have some code that uses OpenMP. The snippet looks like this:

!$OMP parallel do
do ii = 1, 10
    call sub1(ii, ...)
    do jj = 1, 2
        call sub2(jj, ii, ...)  ! Take long time.
    end do
    call sub3(ii, ...)
end do
!$OMP end parallel do

My question is: how should I change the code if I have more than 10 CPUs available (e.g. 16 cores)?

3 Replies
Li_L_

use omp_lib

call omp_set_num_threads(16)

 

You can visit the OpenMP homepage to download the specification.

Intel compiler 17 adopted the OpenMP 4.5 standard.

Lewis__Rubin

Li,

Thanks for the reply.
I am not sure about your solution. As far as I know, calling the subroutine <omp_set_num_threads> might only affect the outer loop. In my situation, the outer loop has at most 10 iterations. If I called the subroutine and created a team of 16 threads, some of them would be redundant for the outer loop. So I am trying to parallelize the inner loop as well.

Sorry for not describing the problem in detail. In my example, I expect the program to run as follows. Suppose I have 16 available cores, and the outer and inner loops are the same as in my post. I want OpenMP to create 8 thread blocks (I hope the term is clear) for the outer loop, with each thread block consisting of 2 threads for the inner loop. Each thread block would have a primary thread that runs the code outside the inner loop (i.e. sub1 and sub3 in my example).

Best wishes,
Rubin
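
The "8 blocks of 2 threads" layout described above roughly corresponds to nested parallel regions. The following is only a minimal sketch under stated assumptions: the real sub1, sub2 and sub3 take arguments that are elided in the original post, so they are replaced here by hypothetical one-line stand-ins, and the arrays base, partial and total are invented purely for illustration.

```fortran
program nested_sketch
    use omp_lib
    implicit none
    integer, parameter :: n_outer = 10, n_inner = 2
    integer :: ii, jj
    ! Hypothetical work arrays; each (jj, ii) slot is written by one thread only.
    integer :: base(n_outer), partial(n_inner, n_outer), total(n_outer)

    ! Enable nested parallel regions. omp_set_nested is the OpenMP 4.5-era switch;
    ! newer runtimes prefer omp_set_max_active_levels, so both are called here.
    call omp_set_nested(.true.)
    call omp_set_max_active_levels(2)

    ! Outer level: request 8 threads, each acting as the primary thread of a block.
    !$omp parallel do num_threads(8) private(jj)
    do ii = 1, n_outer
        base(ii) = 10*ii                      ! stand-in for sub1
        ! Inner level: each outer thread requests a team of 2 for the heavy part.
        !$omp parallel do num_threads(2)
        do jj = 1, n_inner
            partial(jj, ii) = base(ii) + jj   ! stand-in for sub2
        end do
        !$omp end parallel do
        total(ii) = sum(partial(:, ii))       ! stand-in for sub3
    end do
    !$omp end parallel do

    ! Self-check: the result must not depend on how many threads were granted.
    do ii = 1, n_outer
        if (total(ii) /= 20*ii + 3) error stop "unexpected result"
    end do
    print *, "nested sketch finished"
end program nested_sketch
```

The num_threads clauses are requests, not guarantees, and the runtime may grant fewer threads. With 10 outer iterations spread over 8 blocks, some blocks would run two iterations while others run one, so the load is not perfectly balanced.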
 

Li_L_

Actually, I have little experience with OpenMP; I have only written some for my fluid code.

Here is my solution, as far as I understand it.

If I were you, I would create 16 threads and put <!$omp parallel do> on the outer loop, which means that up to 10 of the 16 threads each get one outer iteration.

Then use <!$omp task> in the inner loop, so that those threads create the tasks.

The heavy sub2 can then run in a more dynamic way among the 16 threads.

In this process, we really do have a set of primary threads that deal with sub1 and sub3, but we don't know their thread numbers in advance. Meanwhile, sub2 can be executed efficiently across the 16 cores.
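
The task-based idea above could be sketched roughly as follows. This is an illustration under assumptions, not the poster's actual code: the argument lists of sub1, sub2 and sub3 are elided in the original, so the versions below are hypothetical stand-ins with invented arguments, and the arrays base, partial and total exist only for this example. A taskwait is added on the assumption that sub3 needs the results of both sub2 calls.

```fortran
program task_sketch
    use omp_lib
    implicit none
    integer, parameter :: n_outer = 10, n_inner = 2
    integer :: ii, jj
    ! Hypothetical work arrays; each element is written by exactly one task.
    integer :: base(n_outer), partial(n_inner, n_outer), total(n_outer)

    call omp_set_num_threads(16)   ! a request; the runtime may provide fewer

    !$omp parallel do private(jj) shared(base, partial, total) schedule(static, 1)
    do ii = 1, n_outer
        call sub1(ii, base(ii))
        do jj = 1, n_inner
            ! Each heavy call becomes a task; ii and jj are captured by value.
            !$omp task firstprivate(ii, jj) shared(base, partial)
            call sub2(jj, ii, base(ii), partial(jj, ii))
            !$omp end task
        end do
        ! Wait for this iteration's tasks before sub3 reads their results.
        !$omp taskwait
        call sub3(ii, partial(:, ii), total(ii))
    end do
    !$omp end parallel do

    ! Self-check against the values the stand-ins are known to produce.
    do ii = 1, n_outer
        if (total(ii) /= 20*ii + 3) error stop "unexpected result"
    end do
    print *, "task sketch finished"

contains

    subroutine sub1(ii, b)          ! hypothetical stand-in for the real sub1
        integer, intent(in)  :: ii
        integer, intent(out) :: b
        b = 10*ii
    end subroutine sub1

    subroutine sub2(jj, ii, b, res) ! hypothetical stand-in for the heavy sub2
        integer, intent(in)  :: jj, ii, b
        integer, intent(out) :: res
        res = b + jj
    end subroutine sub2

    subroutine sub3(ii, parts, tot) ! hypothetical stand-in for the real sub3
        integer, intent(in)  :: ii, parts(:)
        integer, intent(out) :: tot
        tot = sum(parts)
    end subroutine sub3

end program task_sketch
```

Threads that receive no outer iteration go straight to the barrier at the end of the loop, where they are free to pick up queued sub2 tasks. Whether this actually balances the load better than nested parallel regions depends on the runtime and on how long sub2 takes, so it is worth measuring both.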
