I have the following code:
```cpp
float arr[1000];
double pre = omp_get_wtime();
for (int j = 0; j < 1000; ++j) {
    #pragma omp parallel num_threads(t1)
    {
        #pragma omp for
        for (int i = 0; i < 1000; ++i)
            arr[i] = std::pow(i, 2);
    }
    #pragma omp parallel num_threads(t2)
    {
        #pragma omp for
        for (int i = 0; i < 1000; ++i)
            arr[i] = std::pow(i, 2);
    }
}
double post = omp_get_wtime();
double diff = post - pre;
```
I get strange times for t1 and t2:
- for t1=1, t2=36 diff is 0.070
- for t1=2, t2=36 diff is 1.307
- for t1=8, t2=36 diff is 1.023
- for t1=18, t2=36 diff is 0.690
- for t1=24, t2=36 diff is 0.427
- for t1=36, t2=36 diff is 0.076
Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, cores per socket: 18, virtualization: VT-x, sockets: 2, L1d cache: 32K, L1i cache: 32K, L2 cache: 256K, L3 cache: 46080K, CentOS 7
Is there any problem with thread pooling between parallel regions (teams) in OpenMP?
Thanks in advance.
At the start of your program add:

```cpp
int nThreads;
#pragma omp parallel
nThreads = omp_get_num_threads();
```
The intention is to enter a first parallel region outside of your timed loop, thereby pre-creating the OpenMP thread pool with a full complement of threads. (You will have to expand on this if you use nested parallel regions.) The way you structured your program, each increase in thread count caused unnecessary overhead.
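For clarity, here is how the warm-up region would sit relative to your timed loop. This is only a minimal sketch based on the snippet you posted; the t1 and t2 values are hard-coded purely for illustration:

```cpp
#include <cmath>
#include <omp.h>

int main() {
    const int t1 = 2, t2 = 36;    // example values from the question
    float arr[1000];

    int nThreads;
    #pragma omp parallel           // warm-up region: builds the full thread pool
    nThreads = omp_get_num_threads();

    double pre = omp_get_wtime();  // timing starts after the pool exists
    for (int j = 0; j < 1000; ++j) {
        #pragma omp parallel num_threads(t1)
        {
            #pragma omp for
            for (int i = 0; i < 1000; ++i)
                arr[i] = std::pow(i, 2);
        }
        #pragma omp parallel num_threads(t2)
        {
            #pragma omp for
            for (int i = 0; i < 1000; ++i)
                arr[i] = std::pow(i, 2);
        }
    }
    double post = omp_get_wtime();
    double diff = post - pre;
    (void)diff; (void)nThreads; (void)arr;
    return 0;
}
```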
Please do this and report back your findings.
Jim Dempsey
Unfortunately, it doesn't work. Our team is still trying to find a solution.
Thank you for your answer.
Krzysztof Binias
With the worst case adding more than one second over the best case, I suspect this is a program-initialization issue. Such a discrepancy can also occur if your system is heavily loaded. Can you upload the entire test program that exhibits this problem?

One such example: if you specify an (obscenely) large stack size and your program does something to "first touch" this stack, then each new thread instantiation causes an excessive number of page faults (allocate from the page file, map into virtual memory, possibly wipe), all in competition with other demands on your storage system. This would occur as a once-only symptom. Once OpenMP creates the threads (adds them to a given thread pool), they remain available for first and subsequent use. Thereafter, any new "first touch" of your VM would still undergo the page-fault hoop jump.
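As a quick check of the initialization hypothesis, you could time the first outer iteration separately from the remaining ones. This is only a rough sketch assuming the structure of the loop you posted; the 1/999 split and the hard-coded thread counts are arbitrary:

```cpp
#include <cmath>
#include <cstdio>
#include <omp.h>

int main() {
    const int t1 = 2, t2 = 36;    // example values from the question
    float arr[1000];

    double pre = omp_get_wtime();
    // First outer iteration only: pays any one-time thread-creation /
    // first-touch cost.
    #pragma omp parallel num_threads(t1)
    {
        #pragma omp for
        for (int i = 0; i < 1000; ++i)
            arr[i] = std::pow(i, 2);
    }
    #pragma omp parallel num_threads(t2)
    {
        #pragma omp for
        for (int i = 0; i < 1000; ++i)
            arr[i] = std::pow(i, 2);
    }
    double split = omp_get_wtime();

    // Remaining 999 iterations: steady-state cost with the pool already built.
    for (int j = 1; j < 1000; ++j) {
        #pragma omp parallel num_threads(t1)
        {
            #pragma omp for
            for (int i = 0; i < 1000; ++i)
                arr[i] = std::pow(i, 2);
        }
        #pragma omp parallel num_threads(t2)
        {
            #pragma omp for
            for (int i = 0; i < 1000; ++i)
                arr[i] = std::pow(i, 2);
        }
    }
    double post = omp_get_wtime();

    std::printf("first iteration: %f s, remaining: %f s\n",
                split - pre, post - split);
    return 0;
}
```

If most of the difference shows up in the first iteration, that points at one-time setup (thread creation, page faults) rather than the per-region cost of switching between teams.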
Also, as an experimental probe (and for insight), what happens when you swap t1 and t2 in your num_threads clauses?
Jim Dempsey