Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Help designing an OpenMP threading model

Lewis__Rubin
Novice

Hi. I have some code that uses OpenMP. The snippet looks like this:
 

!$OMP parallel do
do ii = 1, 10
    call sub1(ii, ...)
    do jj = 1, 2
        call sub2(jj, ii, ...)  ! Take long time.
    end do
    call sub3(ii, ...)
end do
!$OMP end parallel do

My question is: how should I change my code if more than 10 CPUs are available (e.g. 16 cores)?
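Because the outer loop has only 10 iterations, one way to keep more than 10 threads busy without nested parallelism is to split the loop into three phases and flatten the expensive middle phase with collapse(2). This is a minimal sketch that assumes sub2(jj, ii) needs only what sub1(ii) produced, and sub3(ii) needs only the two sub2 results for the same ii. The module name, the arrays x, y, z and the arithmetic standing in for sub1, sub2 and sub3 are hypothetical.

```fortran
! Hypothetical sketch: x, y, z and the arithmetic below stand in for
! whatever sub1, sub2 and sub3 compute in the real code.
module fission_demo
  implicit none
  integer, parameter :: n_outer = 10, n_inner = 2
contains
  subroutine run_fissioned(x, y, z)
    real, intent(out) :: x(n_outer)           ! results of sub1(ii)
    real, intent(out) :: y(n_inner, n_outer)  ! results of sub2(jj, ii)
    real, intent(out) :: z(n_outer)           ! results of sub3(ii)
    integer :: ii, jj

    !$omp parallel
    ! Phase 1: the 10 sub1 calls.
    !$omp do
    do ii = 1, n_outer
      x(ii) = real(ii)                  ! stands in for call sub1(ii, ...)
    end do
    !$omp end do
    ! The implicit barrier above guarantees that all of x is ready.

    ! Phase 2: the 20 expensive sub2 calls, flattened into one iteration space.
    !$omp do collapse(2) schedule(dynamic)
    do ii = 1, n_outer
      do jj = 1, n_inner
        y(jj, ii) = x(ii) * real(jj)    ! stands in for call sub2(jj, ii, ...)
      end do
    end do
    !$omp end do

    ! Phase 3: the 10 sub3 calls, each combining the sub2 results of one ii.
    !$omp do
    do ii = 1, n_outer
      z(ii) = sum(y(:, ii))             ! stands in for call sub3(ii, ...)
    end do
    !$omp end do
    !$omp end parallel
  end subroutine run_fissioned
end module fission_demo
```

With 20 sub2 calls spread over 16 threads, a few threads still get two calls, but no thread sits idle for the whole loop. The price is that the intermediate results of sub1 and sub2 must be stored for all ii at once.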

3 Replies
Li_L_
New Contributor I

use omp_lib

call omp_set_num_threads(16)

 

You can visit the OpenMP home page to download a user manual.

Intel compiler 17 adopted the OpenMP 4.5 standard.
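A minimal sketch of this first suggestion, keeping the original loop structure as is. The module name and the owner and done arrays are hypothetical additions, used only to record which thread ran each outer iteration and that every sub2 call happened. The !$ sentinel lines are compiled only when OpenMP is enabled (for example -qopenmp with Intel Fortran or -fopenmp with gfortran), so the module also builds serially.

```fortran
! Hypothetical sketch: owner and done only record what happened; the
! commented calls mark where sub1 and sub3 would go.
module one_level_demo
  !$ use omp_lib
  implicit none
  integer, parameter :: n_outer = 10, n_inner = 2
contains
  subroutine run_one_level(owner, done)
    integer, intent(out) :: owner(n_outer)          ! thread that ran each ii
    integer, intent(out) :: done(n_inner, n_outer)  ! 1 once sub2(jj, ii) ran
    integer :: ii, jj

    owner = 0
    done = 0
    !$ call omp_set_num_threads(16)   ! request a team of 16 threads
    !$omp parallel do private(jj)
    do ii = 1, n_outer
      !$ owner(ii) = omp_get_thread_num()
      ! call sub1(ii, ...)
      do jj = 1, n_inner
        done(jj, ii) = 1              ! stands in for call sub2(jj, ii, ...)
      end do
      ! call sub3(ii, ...)
    end do
    !$omp end parallel do
  end subroutine run_one_level
end module one_level_demo
```

Whatever thread count is requested, each outer iteration, including both of its sub2 calls, runs on a single thread, so at most 10 distinct values can appear in owner. This is the limitation Rubin points out in his reply.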

Lewis__Rubin
Novice

Li,

Thanks for the reply.
I am not sure about your solution. As far as I know, calling the subroutine <omp_set_num_threads> might only affect the outer loop. In my case, the outer loop has at most 10 iterations, so if I used the subroutine to create a team of 16 threads, 6 of them would have no outer iteration to work on. That is why I am trying to parallelize the inner loop as well.

Sorry, let me describe the problem in more detail. In my example, I expect my program to run as follows. Suppose I have 16 available cores, and the outer and inner loops are the same as in my post. I want OpenMP to create 8 thread blocks (I hope the term is clear) for the outer loop, each consisting of 2 ordinary threads for the inner loop. Each thread block would have a primary thread that runs the code outside the inner loop (i.e. sub1 and sub3 in my example).

Best wishes,
Rubin
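The "thread block" layout described here maps onto nested parallel regions: the outer parallel do creates a team of 8 threads, and each of those threads becomes the primary thread of its own inner team of 2 when it reaches the inner parallel do. A minimal sketch, assuming nesting is enabled with omp_set_max_active_levels(2) (setting the OMP_MAX_ACTIVE_LEVELS environment variable to 2 has the same effect). The module name, the arrays and the arithmetic standing in for sub1, sub2 and sub3 are hypothetical.

```fortran
! Hypothetical sketch: x, y, z and the arithmetic below stand in for
! whatever sub1, sub2 and sub3 compute in the real code.
module nested_demo
  !$ use omp_lib
  implicit none
  integer, parameter :: n_outer = 10, n_inner = 2
contains
  subroutine run_nested(x, y, z)
    real, intent(out) :: x(n_outer), y(n_inner, n_outer), z(n_outer)
    integer :: ii, jj

    !$ call omp_set_max_active_levels(2)  ! allow two active levels of parallelism
    !$omp parallel do num_threads(8) schedule(dynamic)
    do ii = 1, n_outer
      x(ii) = real(ii)                    ! stands in for call sub1(ii, ...)
      ! The thread running this iteration becomes the primary thread of a
      ! new inner team of 2 threads that share the sub2 calls.
      !$omp parallel do num_threads(2)
      do jj = 1, n_inner
        y(jj, ii) = x(ii) * real(jj)      ! stands in for call sub2(jj, ii, ...)
      end do
      !$omp end parallel do
      ! After the inner region's implicit barrier both sub2 results exist,
      ! and only the primary thread continues with sub3.
      z(ii) = sum(y(:, ii))               ! stands in for call sub3(ii, ...)
    end do
    !$omp end parallel do
  end subroutine run_nested
end module nested_demo
```

With 10 outer iterations and 8 outer threads, some threads handle two outer iterations, so the 16 cores are still not perfectly balanced. Creating an inner team for every outer iteration also has some overhead, which matters less when sub2 is expensive.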
 

Li_L_
New Contributor I
Accepted solution


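Actually, I have little experience with OpenMP; I have only written some for my fluid code.

Below is my solution, as I understand it.

If I were you, I would create 16 threads and start <!$omp parallel do> on the outer loop, so that each outer iteration is given to one thread (with only 10 iterations, at most 10 of the 16 threads get one).

Then use <!$omp task> in the inner loop, so that the threads running outer iterations create the sub2 calls as tasks.

The heavy sub2 can then run in a more dynamic way across all 16 threads, because the threads without an outer iteration pick up tasks while they wait at the end of the loop.

In this scheme, the threads running outer iterations play the role of the primary threads that handle sub1 and sub3, but we don't know their thread numbers in advance. Meanwhile, sub2 is executed efficiently across the 16 cores. If sub3 uses the results of sub2, put <!$omp taskwait> before the call to sub3.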
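A minimal sketch of this task-based scheme, assuming sub3(ii) needs the results of both sub2 calls for the same ii (hence the !$omp taskwait). The module name, the arrays and the arithmetic standing in for sub1, sub2 and sub3 are hypothetical.

```fortran
! Hypothetical sketch: x, y, z and the arithmetic below stand in for
! whatever sub1, sub2 and sub3 compute in the real code.
module task_demo
  implicit none
  integer, parameter :: n_outer = 10, n_inner = 2
contains
  subroutine run_tasks(x, y, z)
    real, intent(out) :: x(n_outer), y(n_inner, n_outer), z(n_outer)
    integer :: ii, jj

    !$omp parallel do num_threads(16) schedule(dynamic) private(jj)
    do ii = 1, n_outer
      x(ii) = real(ii)                      ! stands in for call sub1(ii, ...)
      do jj = 1, n_inner
        ! Each sub2 call becomes a task that any thread of the team may run,
        ! including threads waiting at the loop's implicit barrier.
        !$omp task firstprivate(ii, jj) shared(x, y)
        y(jj, ii) = x(ii) * real(jj)        ! stands in for call sub2(jj, ii, ...)
        !$omp end task
      end do
      ! Wait for this iteration's child tasks before sub3 uses their results.
      !$omp taskwait
      z(ii) = sum(y(:, ii))                 ! stands in for call sub3(ii, ...)
    end do
    !$omp end parallel do
  end subroutine run_tasks
end module task_demo
```

Threads that receive no outer iteration do not stay idle for long: the implicit barrier at the end of the loop is a task scheduling point, so they can execute sub2 tasks created by the other threads.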
