I am having trouble getting my code to use more than 2 cores when a top-level parallel region with 2 sections calls into a nested parallel region that also has 2 sections. The pseudocode is provided below. I expected the code to finish 4 times faster on an 8-core (2 quad-core) machine, but I only get twice the speed of the unparallelized version. Setting OMP_NUM_THREADS to 2, 4, or 8 made no difference to the run time, although the number of threads shown in Task Manager did change. I am using the Intel C++ compiler 11.0 (and also 9.1). Could anybody suggest what I am missing?
Thanks.
void compute()
{
    // heavy computations
}

void func()
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            compute();
            #pragma omp section
            compute();
        }
    }
}

int main()
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            func();
            #pragma omp section
            func();
        }
    }
}
2 Replies
[cpp]
void compute()
{
    // heavy computations
}

void func()
{
    // in order for this to initiate
    // a parallel region nested must be enabled
    //   SET OMP_NESTED=TRUE
    // .or.
    //   omp_set_nested(1);
    //
    // when nested disabled, this runs serial
    // for each of the 2 threads calling func
    // 4 x compute() using 2 threads.
    // when nested enabled (read comments inside block)
    //
    #pragma omp parallel
    {
        // each of 2 threads calling func() picks up
        // additional threads here
        // 7 or less depending on options
        #pragma omp sections
        {
            // however you only have 2 sections
            #pragma omp section
            compute(); // 1 thread of each team
            #pragma omp section
            compute(); // different 1 thread of each team
        }
    }
}
// end result 4 x compute() using 4 threads (when nested enabled)
//
int main()
{
    #pragma omp parallel
    {
        // 8 threads falling through this comment
        // (assuming OMP_NUM_THREADS 8)
        #pragma omp sections
        {
            #pragma omp section
            func(); // one of the 8 threads makes this call
            #pragma omp section
            func(); // different one of the 8 threads makes this call
        }
        // the other 6 threads branch around to here
    }
}
[/cpp]
Jim Dempsey
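For anyone who wants to try the suggestion, here is a minimal sketch of the nested-sections layout with nesting switched on in code rather than through the environment. It is an illustration under stated assumptions, not the original poster's program: `compute()` is a stand-in busy loop, `g_calls` and `run_nested()` are hypothetical names added so the result can be checked, and the `num_threads(2)` clauses are my addition so that each level requests only the two threads it can use. The `omp_set_nested(1)` call is the one named in the reply above; it is deprecated as of OpenMP 5.0, where `omp_set_max_active_levels(2)` is the suggested replacement. The `_OPENMP` guards let the file build (and run serially) when OpenMP is not enabled.

```cpp
#include <atomic>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical counter, used only to confirm that all four calls ran.
static std::atomic<int> g_calls{0};

// Stand-in for the "heavy computations" in the original post.
void compute()
{
    volatile double x = 0.0;
    for (int i = 0; i < 1000000; ++i) {
        x = x + i * 0.5;
    }
    g_calls.fetch_add(1);
}

// Inner level: two sections, so at most two threads can be busy here.
void func()
{
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        compute();
        #pragma omp section
        compute();
    }
}

// Outer level: two sections, each of which opens its own inner team.
// Returns the number of compute() calls that completed.
int run_nested()
{
    g_calls = 0;
#ifdef _OPENMP
    // Without this (or OMP_NESTED=TRUE in the environment), the inner
    // regions in func() run with a team of one thread.
    omp_set_nested(1);
#endif
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        func();
        #pragma omp section
        func();
    }
    return g_calls.load();
}
```

Calling `run_nested()` from `main()` and building with the compiler's OpenMP switch (for example `-fopenmp` with GCC, or the Intel compiler's OpenMP option) should let up to four threads run `compute()` at once. With nesting left disabled, the same code still performs all four calls, but only two of them run concurrently.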
Thanks a lot! This solves the problem.