
Executing two calls to a LAPACK routine in parallel

Hello,
I'd like to execute two calls to a LAPACK routine (for example, SVD) in parallel using OpenMP directives. Both calls should themselves be threaded, i.e. if I have 16 cores in total, each call should use 8 cores. Can anyone suggest a way of doing that? I tried three approaches:

1) Nested pragmas:
omp_set_nested(1);
#pragma omp parallel num_threads(2)
{
    if (omp_get_thread_num() == 0) {
        #pragma omp parallel num_threads(8)
        {
            // SVD of matrix 1
        }
    } else if (omp_get_thread_num() == 1) {
        #pragma omp parallel num_threads(8)
        {
            // SVD of matrix 2
        }
    }
}
This starts 8 separate single-threaded SVD computations concurrently for each matrix.

2) Using mkl_set_num_threads:
mkl_set_num_threads(2);
if (omp_get_thread_num() == 0) {
    mkl_set_num_threads(8);
    // SVD of matrix 1
} else if (omp_get_thread_num() == 1) {
    mkl_set_num_threads(8);
    // SVD of matrix 2
}
This computes the two SVDs serially.

3) Using a pragma for the two threads that start the SVDs, and then using mkl_set_num_threads:
#pragma omp parallel num_threads(2)
{
    if (omp_get_thread_num() == 0) {
        mkl_set_num_threads(8);
        // SVD of matrix 1
    } else if (omp_get_thread_num() == 1) {
        mkl_set_num_threads(8);
        // SVD of matrix 2
    }
}
The mkl_set_num_threads() call is ignored and both SVDs are single-threaded.

Hello,
You also need to call mkl_set_dynamic(false); before the OpenMP region.
By default the value is true, which means MKL is allowed to dynamically change the number of threads set by mkl_set_num_threads() if that seems reasonable. In your example, MKL detects that higher-level threading is in use. MKL cannot detect all the details of that higher-level threading, so it defers to it and runs in sequential mode, assuming that the calling code can refine the behavior with mkl_set_dynamic() and mkl_set_num_threads().
More details can be found in the Intel MKL User's Guide, under Managing Performance and Memory -> Using Parallelism of the Intel Math Kernel Library -> Using Additional Threading Control.
With best regards,
Alexander