I have the following code:
float arr[1000];
double pre = omp_get_wtime();
for (int j = 0; j < 1000; ++j)
{
    #pragma omp parallel num_threads(t1)
    {
        #pragma omp for
        for (int i = 0; i < 1000; ++i) arr[i] = std::pow(i, 2);
    }
    #pragma omp parallel num_threads(t2)
    {
        #pragma omp for
        for (int i = 0; i < 1000; ++i) arr[i] = std::pow(i, 2);
    }
}
double post = omp_get_wtime();
double diff = post - pre;
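For anyone trying to reproduce this, here is a minimal, self-contained sketch of the same measurement. It assumes t1 and t2 are passed in as arguments; the function name time_two_teams is hypothetical. It uses std::chrono::steady_clock instead of omp_get_wtime only so that it also builds without -fopenmp (both report wall-clock seconds), and the combined `parallel for` construct is equivalent to the separate `parallel` + `for` pair above.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Mirrors the loop above: `reps` outer iterations, each running two
// back-to-back parallel loops over arr[0..n), the first with a team of t1
// threads and the second with a team of t2 threads.
// Returns the elapsed wall-clock time in seconds.
double time_two_teams(float* arr, int n, int t1, int t2, int reps) {
    assert(arr != nullptr && n > 0 && t1 > 0 && t2 > 0);
    const auto pre = std::chrono::steady_clock::now();
    for (int j = 0; j < reps; ++j) {
        #pragma omp parallel for num_threads(t1)
        for (int i = 0; i < n; ++i) arr[i] = static_cast<float>(std::pow(i, 2));

        #pragma omp parallel for num_threads(t2)
        for (int i = 0; i < n; ++i) arr[i] = static_cast<float>(std::pow(i, 2));
    }
    const auto post = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(post - pre).count();
}
```

Calling time_two_teams(arr, 1000, t1, 36, 1000) runs the same workload as the snippet in the question.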
I get strange timings depending on t1 and t2 (diff is in seconds, as returned by omp_get_wtime):
- for t1=1, t2=36 diff is 0.070
- for t1=2, t2=36 diff is 1.307
- for t1=8, t2=36 diff is 1.023
- for t1=18, t2=36 diff is 0.690
- for t1=24, t2=36 diff is 0.427
- for t1=36, t2=36 diff is 0.076
Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, cores per socket: 18, virtualization: VT-x, sockets: 2, L1d cache: 32K, L1i cache: 32K, L2 cache: 256K, L3 cache: 46080K, CentOS 7
Is there a problem with thread pooling between consecutive parallel regions (teams) of different sizes in OpenMP?
Thanks in advance.
- Tags:
- Parallel Computing
At the start of your program, add:
int nThreads = 0;
#pragma omp parallel
{
    #pragma omp single
    nThreads = omp_get_num_threads();
}
The intention is to enter a parallel region outside of your timed loop, and thus pre-create the OpenMP thread pool with its full complement of threads. (You will have to expand on this if you use nested parallel regions.) The way your program is structured, each increase in thread count caused unnecessary overhead.
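The suggestion above can be sketched as a warm-up step followed by the timed loop. This is a minimal sketch under a few assumptions: the function names are hypothetical, the warm-up uses the default team size (normally omp_get_max_threads()), and the `#ifdef _OPENMP` guard only exists so the code also builds serially.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#ifdef _OPENMP
#include <omp.h>
#endif

// Enters one parallel region with the default team size so the runtime
// creates its worker threads before any timing starts.
// Returns the size of that warm-up team (1 when built without OpenMP).
int warm_up_thread_pool() {
    int nThreads = 1;
    #pragma omp parallel
    {
        #pragma omp single
        {
#ifdef _OPENMP
            nThreads = omp_get_num_threads();
#endif
        }
    }
    return nThreads;
}

// Warms up the pool outside the timed interval, then times the same
// t1/t2 loop as the question. Returns elapsed seconds.
double time_after_warm_up(float* arr, int n, int t1, int t2, int reps) {
    assert(arr != nullptr && n > 0 && t1 > 0 && t2 > 0);
    warm_up_thread_pool();  // not part of the measurement
    const auto pre = std::chrono::steady_clock::now();
    for (int j = 0; j < reps; ++j) {
        #pragma omp parallel for num_threads(t1)
        for (int i = 0; i < n; ++i) arr[i] = static_cast<float>(std::pow(i, 2));

        #pragma omp parallel for num_threads(t2)
        for (int i = 0; i < n; ++i) arr[i] = static_cast<float>(std::pow(i, 2));
    }
    const auto post = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(post - pre).count();
}
```

The `single` construct makes exactly one thread write nThreads, which avoids having every thread in the team store to the same shared variable.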
Please do this and report back your findings.
Jim Dempsey
Unfortunately, it doesn't work. Our team is still trying to find a solution.
Thank you for your answer.
Krzysztof Binias
Since the worst case adds more than one second over the best case, I suspect a program initialization issue. Such a discrepancy can also occur if your system is heavily loaded. Can you upload the entire test program that exhibits this problem? One example: if you specify an (obscenely) large stack size and your program does something to "first touch" that stack, then each new thread instantiation causes an excessive number of page faults (to allocate from the page file, map into virtual memory, and possibly zero-fill the pages), all competing with other demands on your storage system. This would be a once-only symptom: once OpenMP creates a thread (adds it to a given thread pool), the thread remains available for first and subsequent use. Thereafter, only a new "first touch" of virtual memory would have to go through the page-fault hoops.
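One way to check whether the extra time is a once-only cost is to time the first and the second entry into a parallel region of the same size, in a fresh process. This is a minimal sketch; the function name is hypothetical, and the empty regions measure only fork/join and thread creation, not deliberate first touch of large stacks. Running it with and without a large OMP_STACKSIZE setting could show whether stack size matters on your system; a gap between the two numbers would be consistent with the explanation above, not proof of it.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Times two consecutive entries into an empty parallel region with a team
// of `nthreads`. In a fresh process the first entry may include creating
// the worker threads; the second normally reuses the pooled threads.
// Returns {first_entry_seconds, second_entry_seconds}.
std::pair<double, double> first_vs_second_entry(int nthreads) {
    assert(nthreads > 0);
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    #pragma omp parallel num_threads(nthreads)
    {
        // empty body: only the fork/join (and any thread creation) is timed
    }
    const auto t1 = clock::now();
    #pragma omp parallel num_threads(nthreads)
    {
        // same team size: worker threads should already exist
    }
    const auto t2 = clock::now();
    return {std::chrono::duration<double>(t1 - t0).count(),
            std::chrono::duration<double>(t2 - t1).count()};
}
```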
Also, as an experimental probe (and for insight), what happens when you swap t1 and t2 in your num_threads clauses?
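The swap experiment can be sketched as running the same workload in both orders. This is a minimal sketch; run_pair, SwapProbe, and swap_probe are hypothetical names. The first call also absorbs any thread-creation cost, so it is worth repeating the probe, or running each ordering in its own process, before drawing conclusions.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// One timed run of the question's workload: `reps` pairs of parallel loops,
// the first with `first` threads and the second with `second` threads.
static double run_pair(float* arr, int n, int first, int second, int reps) {
    assert(arr != nullptr && n > 0 && first > 0 && second > 0);
    const auto pre = std::chrono::steady_clock::now();
    for (int j = 0; j < reps; ++j) {
        #pragma omp parallel for num_threads(first)
        for (int i = 0; i < n; ++i) arr[i] = static_cast<float>(std::pow(i, 2));

        #pragma omp parallel for num_threads(second)
        for (int i = 0; i < n; ++i) arr[i] = static_cast<float>(std::pow(i, 2));
    }
    const auto post = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(post - pre).count();
}

struct SwapProbe {
    double t1_then_t2;  // original order: num_threads(t1), then num_threads(t2)
    double t2_then_t1;  // swapped order
};

// Runs both orderings back to back and returns both elapsed times in seconds.
SwapProbe swap_probe(float* arr, int n, int t1, int t2, int reps) {
    SwapProbe p{};
    p.t1_then_t2 = run_pair(arr, n, t1, t2, reps);
    p.t2_then_t1 = run_pair(arr, n, t2, t1, reps);
    return p;
}
```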
Jim Dempsey