Gheibi__Sanaz
Beginner
123 Views

getting MKL thread IDs

Hi, 

We have a problem with MKL threads and would really appreciate your help. We are calling MKL functions inside the nested parallel region below:

    omp_set_num_threads(NUM_OF_THREADS);
    omp_set_nested(1);
    omp_set_max_active_levels(2);

    #pragma omp parallel num_threads(2)
    {
        if (omp_get_thread_num() == 0) {
            mkl_set_num_threads_local(16);

            printf("My ID is %d\n", omp_get_thread_num());
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        m, n, p, 1, pA, p, pB, n, 0, pC1, n);
        } else {
            mkl_set_num_threads_local(16);

            printf("My ID is %d\n", omp_get_thread_num());
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        m, n, p, 1, pD, p, pE, n, 0, pC2, n);
        }
    }

Using VTune Amplifier, we can verify that the expected 32 threads are created. However, the print statements produce only the following output:

My ID is 0
My ID is 1

It seems that we cannot identify the MKL threads using omp_get_thread_num(). Is there a similar function for getting the thread IDs of MKL threads, or some other way to do that? (We need this information for affinity and thread placement decisions.)

Thank you very much, 

Sanaz 

5 Replies
Ying_H_Intel
Employee

Hi Sanaz,

As I understand it, the "My ID is 0" and "My ID is 1" lines come from the two threads created by #pragma omp parallel num_threads(2), and the printf("My ID is %d\n", omp_get_thread_num()); calls simply reflect that.

But it should be fine to spawn 2 outer OpenMP threads and have each of them spawn 16 MKL threads to execute the MKL function. For example, make sure the environment variables OMP_DYNAMIC=false and MKL_DYNAMIC=false are set, so that MKL is allowed to use its threads in nested parallel regions.

You may refer to the MKL user guide, which has some discussion of this, or to

the article https://software.intel.com/en-us/articles/using-threaded-intel-mkl-in-multi-thread-application
and to forum discussions such as https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/296195
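
To illustrate why only IDs 0 and 1 show up, here is a minimal sketch in plain OpenMP, with no MKL calls, of how thread IDs behave in a nested region. omp_get_thread_num() reports the ID within the innermost enclosing team, so a printf placed at the outer level can only print 0 or 1. The inner region below is user code standing in for the team that MKL creates internally; as far as I know MKL offers no way to run user code on its own threads, so treat this as an analogy rather than a way to query MKL's threads. The names flat_id and show_nested_ids are made up for this example, and the serial fallbacks exist only so the sketch builds without OpenMP support.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch still builds without OpenMP support. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_level(void) { return 0; }
static int omp_get_ancestor_thread_num(int level) { (void)level; return 0; }
static void omp_set_max_active_levels(int n) { (void)n; }
#endif

/* Hypothetical helper: flatten (outer id, inner id) into a single label. */
int flat_id(int outer_id, int inner_id, int inner_team_size)
{
    return outer_id * inner_team_size + inner_id;
}

/* Runs a 2 x inner_team_size nested region and prints the IDs seen at
 * each nesting level. Older runtimes may also need omp_set_nested(1). */
void show_nested_ids(int inner_team_size)
{
    omp_set_max_active_levels(2);

    #pragma omp parallel num_threads(2)
    {
        /* Outer level: this is where the printf in the original code runs. */
        printf("outer: level=%d id=%d\n",
               omp_get_level(), omp_get_thread_num());

        #pragma omp parallel num_threads(inner_team_size)
        {
            /* Inner level: IDs restart at 0 inside each inner team, so the
             * outer thread's ID has to be fetched from the ancestor. */
            int outer = omp_get_ancestor_thread_num(1);
            int inner = omp_get_thread_num();
            printf("inner: level=%d outer=%d inner=%d flat=%d\n",
                   omp_get_level(), outer, inner,
                   flat_id(outer, inner, inner_team_size));
        }
    }
}
```

Calling show_nested_ids(16) from main with OpenMP enabled should print two "outer" lines and, if the runtime grants the full teams, 32 "inner" lines in no particular order.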

Best Regards,
Ying

 

 

Gheibi__Sanaz
Beginner

Thank you very much Ying, 

The resources were very useful for setting the affinity of MKL threads. However, before trying to do the binding, we want to know which MKL threads execute each of the cblas_dgemm() calls. For example, with the KMP_AFFINITY=verbose environment variable we can observe that thread #5 is bound to proc set {15}. But that doesn't give us much insight, because we don't know what thread #5 is actually doing (that is, which of the cblas_dgemm() calls it is executing). We would really appreciate your help with that.

Best Regards, 

Sanaz 

Ying_H_Intel
Employee

Hi Sanaz,

Right, you can't know exactly which thread is executing which cblas_dgemm() call, and you can't control every single MKL internal thread in a nested OpenMP environment. But let's come back to the original problem: you expect 2 tasks, each executing on half of your physical CPU cores, so as to get the best performance.

As the article mentions, you don't actually need to dive into every single MKL internal thread. The Linux OS and KMP_AFFINITY can handle that for you.
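
As a rough sketch of the "each task on half of the cores" idea, the helper below builds a place list that splits the machine into equal blocks, one per outer task. Note that it uses the standard OpenMP OMP_PLACES interval syntax rather than KMP_AFFINITY, and that build_places is a made-up name for this example. It assumes that processors are numbered 0..ncores-1 and that consecutive numbers are the ones you want grouped together, which depends on the machine. Whether MKL's inner threads honor such a setting depends on the threading layer and the OpenMP runtime in use, so please verify it with KMP_AFFINITY=verbose or a profiler.

```c
#include <stdio.h>
#include <string.h>

/* Writes an OMP_PLACES-style list that splits `ncores` consecutive
 * processor numbers into `ntasks` equal places, e.g. "{0:16},{16:16}".
 * Returns 0 on success, -1 if the split is uneven or the buffer is too
 * small. Assumes processors are numbered 0..ncores-1 (machine-dependent). */
int build_places(char *buf, size_t len, int ncores, int ntasks)
{
    if (ntasks <= 0 || ncores <= 0 || ncores % ntasks != 0)
        return -1;

    int per_task = ncores / ntasks;
    size_t used = 0;

    for (int t = 0; t < ntasks; ++t) {
        /* Each place is "{first:count}", separated by commas. */
        int n = snprintf(buf + used, len - used, "%s{%d:%d}",
                         t ? "," : "", t * per_task, per_task);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

For 32 processors and 2 outer tasks this produces a two-place list. Exporting it as OMP_PLACES together with OMP_PROC_BIND=spread,close is one way to ask the runtime to put each outer thread on its own half and keep that thread's inner team there.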

I'm not sure whether you have already set this through the environment, but your code seems to be missing one key call: mkl_set_dynamic(0);

After adding it, you may see the expected performance and CPU usage.

NOTE
If your application uses OpenMP* threading, you may need to provide additional settings:

Set the environment variable OMP_NESTED=TRUE, or alternatively call omp_set_nested(1), to enable OpenMP nested parallelism.

Set the environment variable MKL_DYNAMIC=FALSE, or alternatively call mkl_set_dynamic(0), to prevent Intel MKL from dynamically reducing the number of OpenMP threads in nested parallel regions.
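
If you go the environment-variable route, a small check at startup can catch a missing setting early. The sketch below is only an illustration: setting_matches and nested_mkl_env_ok are made-up names, and it assumes the values are spelled TRUE/FALSE in any letter case. It cannot see settings made through omp_set_nested(1) or mkl_set_dynamic(0), since those calls do not change the environment.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of a setting against the wanted value.
 * `value` may be NULL (variable not set), which never matches. */
int setting_matches(const char *value, const char *want)
{
    if (value == NULL)
        return 0;
    while (*value && *want) {
        if (tolower((unsigned char)*value) != tolower((unsigned char)*want))
            return 0;
        ++value;
        ++want;
    }
    /* Both strings must end at the same position. */
    return *value == '\0' && *want == '\0';
}

/* Returns 1 if both settings from the NOTE are present in the given
 * values, 0 otherwise. */
int nested_mkl_env_ok(const char *omp_nested, const char *mkl_dynamic)
{
    return setting_matches(omp_nested, "true") &&
           setting_matches(mkl_dynamic, "false");
}
```

A typical call would be nested_mkl_env_ok(getenv("OMP_NESTED"), getenv("MKL_DYNAMIC")), with <stdlib.h> included for getenv.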

I have attached an example for your reference.

Best Regards,

Ying


 

Ying_H_Intel
Employee

Attaching the file:

    omp_set_nested(1);
    omp_set_max_active_levels(2);
    mkl_set_dynamic(0);
    #pragma omp parallel num_threads(2)
    {
        if (omp_get_thread_num() == 0) {
            mkl_set_num_threads(32);
            printf("My ID is %d \n", omp_get_thread_num());
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        m, n, p, 1, A, p, B, n, 0, C1, n);
        } else {
   

 

Thanks

Ying                                      

Gheibi__Sanaz
Beginner

Thank you very much Ying. 
