I can't get my code to use more than 2 cores when a top-level parallel region with 2 sections calls a function that opens a second-level parallel region with 2 sections. The pseudocode is provided below. I expected the code to finish 4 times faster on an 8-core (2 quad-core) machine, but I only get twice the speed of the version with no parallelization. It makes no difference whether I set OMP_NUM_THREADS to 2, 4, or 8, although the number of threads shown in Task Manager does change. I am using the Intel C++ compiler 11.0 (and 9.1 too). Could anybody suggest what I am missing?
Thanks.
[cpp]void compute()
{
    // heavy computations
}

void func()
{
    // second-level parallel region: 2 sections
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            compute();
            #pragma omp section
            compute();
        }
    }
}

int main()
{
    // top-level parallel region: 2 sections
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            func();
            #pragma omp section
            func();
        }
    }
}
[/cpp]
2 Replies
[cpp]void compute()
{
    // heavy computations
}

void func()
{
    // In order for this to initiate a parallel region,
    // nested parallelism must be enabled:
    //     SET OMP_NESTED=TRUE
    // or
    //     omp_set_nested(1);
    //
    // When nested is disabled, this region runs serially
    // for each of the 2 threads calling func():
    // 4 x compute() using 2 threads.
    // When nested is enabled, read the comments inside the block.
    #pragma omp parallel
    {
        // each of the 2 threads calling func() picks up
        // additional threads here
        // (7 or fewer, depending on options)
        #pragma omp sections
        {
            // however, you only have 2 sections
            #pragma omp section
            compute(); // 1 thread of each team
            #pragma omp section
            compute(); // a different thread of each team
        }
    }
}
// end result: 4 x compute() using 4 threads (when nested is enabled)

int main()
{
    #pragma omp parallel
    {
        // 8 threads fall through this comment
        // (assuming OMP_NUM_THREADS is 8)
        #pragma omp sections
        {
            #pragma omp section
            func(); // one of the 8 threads makes this call
            #pragma omp section
            func(); // a different one of the 8 threads makes this call
        }
        // the other 6 threads branch around to here
    }
}
[/cpp]
Jim Dempsey
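For readers who would rather switch nesting on from code than through the environment, here is a minimal sketch of the two-level layout discussed above. It assumes an OpenMP-capable compiler. `compute_calls` and `run_nested()` are hypothetical names added only for illustration, and `num_threads(2)` is an assumption chosen to match the two sections at each level. OpenMP 5.0 deprecates `omp_set_nested` in favour of `omp_set_max_active_levels`, so the sketch calls both and a newer compiler may print a deprecation warning.

```cpp
#include <atomic>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical counter standing in for the "heavy computations".
static std::atomic<int> compute_calls{0};

void compute()
{
    compute_calls.fetch_add(1);
}

void func()
{
    // Inner region: forks a second team only when nesting is enabled;
    // otherwise both sections run on the calling thread.
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        compute();
        #pragma omp section
        compute();
    }
}

// Runs the two-level layout once and returns how many compute() calls ran.
int run_nested()
{
    compute_calls = 0;
#ifdef _OPENMP
    omp_set_nested(1);            // same intent as OMP_NESTED=TRUE
    omp_set_max_active_levels(2); // allow two active parallel levels
#endif
    // Outer region: 2 sections, each calling func().
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        func();
        #pragma omp section
        func();
    }
    return compute_calls.load();
}
```

Calling `run_nested()` from `main` executes all four `compute()` calls. With nesting enabled the runtime may spread them over up to 4 threads; if the code is built without OpenMP support, the pragmas are ignored and the same four calls run serially.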
Thanks a lot! This solves the problem.