Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Problem with nested OpenMP parallel regions

swi
Beginner
410 Views
I cannot get my code to use more than 2 cores when a top-level parallel region with 2 sections calls a function that opens a second-level parallel region with 2 sections. The pseudocode is provided below. I expected the code to finish 4 times faster on an 8-core (2 quad-core) machine, but I only get a 2x speedup compared to no parallelization. It makes no difference whether I set OMP_NUM_THREADS to 2, 4, or 8, although the number of threads shown in Task Manager does change. I am using the Intel C++ compiler 11.0 (and 9.1 too). Could anybody suggest what I am missing?

Thanks.

void compute()
{
    // heavy computations
}

void func()
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            compute();
            #pragma omp section
            compute();
        }
    }
}

int main()
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            func();
            #pragma omp section
            func();
        }
    }
}
2 Replies
jimdempseyatthecove
Honored Contributor III

[cpp]void compute()
{
    // heavy computations
}
void func()
{
   // For this pragma to start a new parallel region,
   // nested parallelism must be enabled:
   //   SET OMP_NESTED=TRUE
   // or
   //   omp_set_nested(1);
   //
   // With nesting disabled, this region runs serially
   // in each of the 2 threads calling func():
   //   4 x compute() using 2 threads.
   // With nesting enabled, read the comments inside the block.
   //
   #pragma omp parallel 
   {
      // each of the 2 threads calling func() picks up
      // additional threads here:
      // 7 or fewer, depending on options
      #pragma omp sections
      {
          // however, you only have 2 sections
          #pragma omp section
               compute();  // 1 thread of each team
          #pragma omp section
               compute();  // a different thread of each team
      }
   }
}

// end result 4 x compute() using 4 threads (when nested enabled)
// 

int main()
{
  #pragma omp parallel 
  {
    // 8 threads falling through this comment
    // (assuming OMP_NUM_THREADS 8)

    #pragma omp sections
    {
      #pragma omp section
        func(); // one of the 8 threads makes this call
      #pragma omp section
        func(); // a different one of the 8 threads makes this call
    }
    // the other 6 threads skip the sections and arrive here
  }

}
[/cpp]
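
For reference, the runtime-call route mentioned in the comments above could be sketched as the small self-contained example below. It is only a sketch under stated assumptions: `run_nested()` and the `g_calls` counter are hypothetical helpers added for illustration, `num_threads(2)` is added so that each team matches its 2 sections, and `omp_set_nested()` is the OpenMP 2.x/3.x call (newer OpenMP versions deprecate it in favor of `omp_set_max_active_levels()`). The `_OPENMP` guards only let the file build when OpenMP is switched off.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical counter: how many times compute() has run.
static std::atomic<int> g_calls{0};

// Stand-in for the heavy computation from the question.
void compute()
{
    g_calls.fetch_add(1);
#ifdef _OPENMP
    // Report which member of the innermost team is running this call.
    std::printf("compute(): thread %d of a team of %d\n",
                omp_get_thread_num(), omp_get_num_threads());
#endif
}

void func()
{
    // Inner region: forks a second team only when nesting is enabled;
    // otherwise both sections run on the calling thread.
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        compute();
        #pragma omp section
        compute();
    }
}

// Hypothetical helper: runs the 2 x 2 layout once and returns
// the number of compute() calls that were executed.
int run_nested()
{
    g_calls = 0;
#ifdef _OPENMP
    omp_set_nested(1);   // runtime equivalent of OMP_NESTED=TRUE
    omp_set_dynamic(0);  // ask the runtime not to shrink the requested teams
#endif
    // Outer region: 2 threads, each calling func() once.
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        func();
        #pragma omp section
        func();
    }
    assert(g_calls.load() == 4);  // 2 outer sections x 2 inner sections
    return g_calls.load();
}
```

Calling `run_nested()` from `main()` should print four lines. With nesting enabled, each line is expected to report a team of 2; if the `omp_set_nested(1)` call is removed and OMP_NESTED is left unset, the inner teams will typically report a size of 1, which corresponds to the 2x speedup described in the question.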

Jim Dempsey

swi
Beginner
Thanks a lot! This solves the problem.



