Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Problem with nested OpenMP parallel

I cannot get my code to use more than 2 cores when a top-level parallel region with 2 sections calls a function that itself contains a parallel region with 2 sections. The pseudocode is provided below. I expected the code to finish 4 times faster on an 8-core (2 quad-core) machine, but I only get twice the speed of the non-parallel version. It makes no difference whether I set OMP_NUM_THREADS to 2, 4, or 8, although the number of threads shown in Task Manager does change. I am using the Intel C++ compiler 11.0 (and 9.1 too). Could anybody suggest what I am missing?


void compute() {
    // heavy computations
}
void func() {
    #pragma omp parallel
    #pragma omp sections
    {
        #pragma omp section
        compute();
        #pragma omp section
        compute();
    }
}

int main() {
    #pragma omp parallel
    #pragma omp sections
    {
        #pragma omp section
        func();
        #pragma omp section
        func();
    }
}
2 Replies

[cpp]void compute()
{
    // heavy computations
}

void func()
{
    // For this pragma to start a new (nested) parallel region,
    // nested parallelism must be enabled: set OMP_NESTED=TRUE
    // in the environment, or call
    //     omp_set_nested(1);
    // When nested is disabled, this region runs serially
    // for each of the 2 threads calling func():
    //     4 x compute() using 2 threads.
    // When nested is enabled (read the comments inside the block):
    #pragma omp parallel
    {
        // each of the 2 threads calling func() picks up
        // additional threads here
        // (7 or fewer, depending on options)
        #pragma omp sections
        {
            // however, you only have 2 sections
            #pragma omp section
            compute();  // 1 thread of each team
            #pragma omp section
            compute();  // a different thread of each team
        }
    }
}
// end result: 4 x compute() using 4 threads (when nested is enabled)

int main()
{
    #pragma omp parallel
    {
        // 8 threads fall through this comment
        // (assuming OMP_NUM_THREADS=8)

        #pragma omp sections
        {
            #pragma omp section
            func();  // one of the 8 threads makes this call
            #pragma omp section
            func();  // a different one of the 8 threads makes this call
        }
        // the other 6 threads branch around to here
    }
}
[/cpp]

Jim Dempsey
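
For anyone who wants to check the behaviour on their own machine, here is a minimal sketch of the same two-level sections layout with nesting switched on or off at run time. Several things here are assumptions for illustration and not part of the original post: the helper name `run_nested_sections`, the `g_calls` counter, the dummy loop inside `compute()`, and the `num_threads(2)` clauses (added because each region only has 2 sections). It calls both `omp_set_nested` (as in the post, deprecated in newer OpenMP versions) and `omp_set_max_active_levels`, since which one takes effect depends on the runtime.

```cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical counter: how many times compute() has run.
static int g_calls = 0;

// Stand-in for the heavy computation (dummy work, an assumption).
double compute()
{
    double sum = 0.0;
    for (int i = 0; i < 1000000; ++i)
        sum += i * 0.5;

    #pragma omp atomic
    ++g_calls;

#ifdef _OPENMP
    // Report where this call ran: nesting level, thread id, team size.
    std::printf("compute(): level %d, thread %d of %d\n",
                omp_get_level(), omp_get_thread_num(),
                omp_get_num_threads());
#endif
    return sum;
}

void func()
{
    // Inner region: gets its own team only if nesting is enabled.
    #pragma omp parallel num_threads(2)
    #pragma omp sections
    {
        #pragma omp section
        compute();
        #pragma omp section
        compute();
    }
}

// Runs the two-level layout and returns the number of compute() calls.
int run_nested_sections(bool enable_nested)
{
    g_calls = 0;
#ifdef _OPENMP
    omp_set_nested(enable_nested ? 1 : 0);             // older API, as in the post
    omp_set_max_active_levels(enable_nested ? 2 : 1);  // newer equivalent
#endif
    // Outer region: one thread per call to func().
    #pragma omp parallel num_threads(2)
    #pragma omp sections
    {
        #pragma omp section
        func();
        #pragma omp section
        func();
    }
    return g_calls;
}
```

Calling `run_nested_sections(true)` and then `run_nested_sections(false)` from `main()` should run `compute()` 4 times in both cases. The difference to look for is in the printed team size: with nesting enabled, and if the runtime grants the threads, the inner teams should report 2 threads each (4 threads in total); with nesting disabled, each inner region should report a team of 1. Build with OpenMP enabled in the compiler options, otherwise the pragmas are ignored and everything runs serially.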

Thanks a lot! This solves the problem.
