Intel® oneAPI Math Kernel Library

Setting the number of threads in mkl on the mac version does not modify mkl_get_max_threads

AndreFecteau
Beginner

The behaviour of mkl_set_num_threads(n) and mkl_get_max_threads() seems to differ between Mac and Linux. Here is a small code snippet that shows the difference, followed by the output from Mac and from Linux.

It seems that on Mac mkl_get_max_threads() returns the number of physical cores on my machine unless mkl_set_num_threads_local(n) has been called, whereas on Linux it returns the value passed to mkl_set_num_threads(n) if that has been set.

Am I missing a setting that would stop the threading from falling back on the global setting?

#include <cstdio>    // printf
#include <cstdlib>   // std::getenv
#include <iostream>  // std::cout
#include "mkl.h"

int main()
{
    mkl_verbose(1);
    MKLVersion Version;
    mkl_get_version(&Version);

    printf("Major version:           %d\n",Version.MajorVersion);
    printf("Minor version:           %d\n",Version.MinorVersion);
    printf("Update version:          %d\n",Version.UpdateVersion);
    printf("Product status:          %s\n",Version.ProductStatus);
    printf("Build:                   %s\n",Version.Build);
    printf("Platform:                %s\n",Version.Platform);
    printf("Processor optimization:  %s\n",Version.Processor);
    printf("================================================================\n");
    printf("\n");

    // Here just to show these environment variables are not set.
    if(const char* env_p = std::getenv("MKL_NUM_THREADS"))
        std::cout << "Your MKL_NUM_THREADS is: " << env_p << '\n';
    if(const char* env_p = std::getenv("OMP_NUM_THREADS"))
        std::cout << "Your OMP_NUM_THREADS is: " << env_p << '\n';
    if(const char* env_p = std::getenv("NUMEXPR_NUM_THREADS"))
        std::cout << "Your NUMEXPR_NUM_THREADS is: " << env_p << '\n';
    if(const char* env_p = std::getenv("VECLIB_MAXIMUM_THREADS"))
        std::cout << "Your VECLIB_MAXIMUM_THREADS is: " << env_p << '\n';
    if(const char* env_p = std::getenv("OPENBLAS_NUM_THREADS"))
        std::cout << "Your OPENBLAS_NUM_THREADS is: " << env_p << '\n';

    std::cout << mkl_get_dynamic() << std::endl;
    std::cout << mkl_get_max_threads() << std::endl;
    mkl_set_num_threads(4);
    std::cout << mkl_get_dynamic() << std::endl;
    std::cout << mkl_get_max_threads() << std::endl;
    mkl_set_num_threads_local(0);
    std::cout << mkl_get_dynamic() << std::endl;
    std::cout << mkl_get_max_threads() << std::endl;
    mkl_set_num_threads_local(4);
    std::cout << mkl_get_dynamic() << std::endl;
    std::cout << mkl_get_max_threads() << std::endl;
    mkl_set_num_threads_local(0);
    std::cout << mkl_get_dynamic() << std::endl;
    std::cout << mkl_get_max_threads() << std::endl;
}

 

Output on a Mac with GCC or Apple clang version 13.1.6 (clang-1316.0.21.2.5)

Major version:           2023
Minor version:           0
Update version:          1
Product status:          Product
Build:                   20230303
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================

1
8
1
8
1
8
1
4
1
8

 

Output on Linux with the gcc/icc/icx compilers

Major version:           2022
Minor version:           0
Update version:          2
Product status:          Product
Build:                   20220804
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Intel(R) Deep Learning Boost (Intel(R) DL Boost), EVEX-encoded AES and Carry-Less Multiplication Quadword instructions
================================================================
1
1
1
4
1
4
1
4
1
4


Note: I also tried the 2022 version on my Mac and the same issue persists.
Note: Duplicated content from https://stackoverflow.com/questions/76176545/how-to-properly-define-the-maximum-thread-number-in-mkl-on-mac-computers but I either don't understand the reply or it is not relevant to the question.
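
For readers comparing the two outputs, the precedence described in the oneMKL threading-control documentation (a thread-local setting overrides the global one, and a thread-local value of 0 falls back to the global setting) can be modelled in a few lines. This is only a toy model, not MKL code: ThreadingModel and its members are hypothetical names, and in the real library the local value is kept per thread rather than in a plain member. Under this model the four values printed after mkl_set_num_threads(4) would be 4, 4, 4, 4, which is what the Linux run shows; the Mac run shows 8, 8, 4, 8 instead.

```cpp
// Toy model of the documented precedence; hypothetical names, not MKL code.
class ThreadingModel {
public:
    explicit ThreadingModel(int default_threads) : default_(default_threads) {}

    // Models mkl_set_num_threads(n): stores the global setting.
    void set_num_threads(int n) { global_ = n; }

    // Models mkl_set_num_threads_local(n): stores the calling thread's
    // setting and returns the previous one; 0 clears it.
    int set_num_threads_local(int n) {
        const int previous = local_;
        local_ = n;
        return previous;
    }

    // Models mkl_get_max_threads(): local wins, then global, then default.
    int get_max_threads() const {
        if (local_ > 0) return local_;
        if (global_ > 0) return global_;
        return default_;
    }

private:
    int default_;     // what the runtime would choose on its own
    int global_ = 0;  // 0 means "not set"
    int local_ = 0;   // 0 means "not set"; per thread in the real library
};
```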

12 Replies
VarshaS_Intel
Moderator

Hi,


Thanks for posting in Intel Communities.


Could you please let us know the macOS version, hardware, and Xcode version you are using?


Thanks & Regards,

Varsha


AndreFecteau
Beginner

Thanks for your reply, here are the details requested. Let me know if you need anything else.

  System Version:	macOS 12.4 (21F79)
  Kernel Version:	Darwin 21.5.0
  Boot Volume:	Macintosh HD
  Boot Mode:	Normal
  Computer Name:	*******
  User Name:	******
  Secure Virtual Memory:	Enabled
  System Integrity Protection:	Enabled
  Time since boot:	5 days 19:48
  Model Name:	MacBook Pro
  Model Identifier:	MacBookPro16,1
  Processor Name:	8-Core Intel Core i9
  Processor Speed:	2.3 GHz
  Number of Processors:	1
  Total Number of Cores:	8
  L2 Cache (per Core):	256 KB
  L3 Cache:	16 MB
  Hyper-Threading Technology:	Enabled
  Memory:	16 GB
  System Firmware Version:	1731.120.10.0.0 (iBridge: 19.16.15071.0.0,0)
  OS Loader Version:	540.120.3~6
  Serial Number (system):	C02G43AZMD6N
  Hardware UUID:	085D1AE1-DA87-51AA-8CAA-11E9F7E68DD9
  Provisioning UDID:	085D1AE1-DA87-51AA-8CAA-11E9F7E68DD9
  Activation Lock Status:	Enabled
Xcode 13.4.1
Build version 13F100

VarshaS_Intel
Moderator

Hi,


We are working on your issue. We will get back to you soon.


Thanks & Regards,

Varsha


VarshaS_Intel
Moderator

Hi,


Could you please let us know which threading layer you are using on both Linux and macOS?

If possible, could you please provide us with the complete command you are using?


Thanks & Regards,

Varsha 


AndreFecteau
Beginner

After your comment I noticed that by default we link MKL with TBB on Mac and MKL with OpenMP on Linux. The behaviour above can be reproduced on Linux alone, so the real issue is that a TBB-linked executable behaves differently from an OpenMP-linked one.

Here are the 3 link lines for mkl.

Darwin TBB

set(MKL_LINK_LINE -L${MKLLIBDIR} -Wl,-rpath,${MKLLIBDIR} -lmkl_intel_lp64 -lmkl_tbb_thread -lmkl_core -L${INTELROOT} -Wl,-rpath,${INTELROOT} -lpthread -lm -ldl)


Linux TBB 

set(MKL_LINK_LINE -L${MKLLIBDIR} -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_tbb_thread -lmkl_core -lpthread -lm -ldl)


Linux OpenMP

set(MKL_LINK_LINE -L${MKLLIBDIR} -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lpthread -lm -ldl)

 

VarshaS_Intel
Moderator

Hi,


We are working on your issue internally, we will get back to you soon.


Thanks & Regards,

Varsha


VarshaS_Intel
Moderator

Hi,


Thanks for your patience.


The issue raised by you is being looked at by the development team. We will update you once the issue is fixed.


Thanks & Regards,

Varsha


VarshaS_Intel
Moderator

Hi,


Thanks for your patience, and apologies for the delay in responding.


Yes. The mkl_set_num_threads(n) and mkl_get_max_threads() APIs are expected to return correct values with OpenMP threading only.


It may be that, when you link oneMKL with the TBB threading library, oneTBB controls the number of threads rather than Intel oneMKL. So using the MKL threading APIs is not recommended in that case; instead, you can use the tbb::schedule_task class.


For more details regarding threading controls, please refer to the below link:

https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-c/2023-2/threading-control.html
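
As a rough illustration of letting oneTBB, rather than MKL, set the limit: the sketch below caps oneTBB's parallelism for the duration of one call. It uses tbb::global_control, a documented oneTBB class, in place of the tbb::schedule_task named in the reply. The helper run_with_thread_cap and the USE_TBB macro are hypothetical names; the macro is assumed to be defined by the build only when oneTBB is linked. Whether this changes what mkl_get_max_threads() reports in a TBB-linked build has not been checked here.

```cpp
#include <cstddef>
#include <utility>

#ifdef USE_TBB  // assumption: defined by the build only when oneTBB is linked
#include <tbb/global_control.h>
#endif

// Hypothetical helper: runs `work` while oneTBB is limited to `max_threads`.
template <typename Work>
void run_with_thread_cap(std::size_t max_threads, Work&& work) {
#ifdef USE_TBB
    // The limit holds for as long as `cap` is alive.
    tbb::global_control cap(tbb::global_control::max_allowed_parallelism,
                            max_threads);
#else
    (void)max_threads;  // no oneTBB in this build: just run the work
#endif
    std::forward<Work>(work)();
}
```

A call such as run_with_thread_cap(1, [&] { /* MKL call */ }); would then request serial execution from oneTBB for that call only.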


Thanks & Regards,

Varsha


VarshaS_Intel
Moderator

Hi,


We have not heard back from you. Could you please let us know if you have any other queries?


Thanks & Regards,

Varsha


AndreFecteau
Beginner

Thanks for the reply,

We currently support both the TBB and OpenMP versions of MKL: the OpenMP version performs better for our application on most platforms, but Apple clang does not support OpenMP.

During execution we toggle between running a single PARDISO instance (which uses TBB or OpenMP) on multiple cores and running a parallel for loop in which each iteration runs PARDISO in serial mode. (Or at least we assumed PARDISO ran serially after calling mkl_set_num_threads(1).)

We will evaluate the behaviour more closely, and work out how to replicate this toggling (serial PARDISO inside a TBB-parallel loop), when we have the bandwidth to do so.
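
For the OpenMP-linked build, one possible way to express the "serial inside a parallel loop" toggle is a small scope guard around mkl_set_num_threads_local, which sets the count for the calling thread and returns the previous value. This is a sketch under assumptions: ScopedLocalThreads, set_local_threads, and the USE_MKL macro are hypothetical names, and the stub branch exists only so the code builds without MKL. Per the moderator's reply above, these MKL APIs are expected to give correct values with OpenMP threading only, so this is not a claim about the TBB-linked build.

```cpp
#ifdef USE_MKL  // assumption: defined by the build only when MKL is linked
#include "mkl.h"
#endif

// Sets the calling thread's MKL thread count and returns the previous value.
inline int set_local_threads(int n) {
#ifdef USE_MKL
    return mkl_set_num_threads_local(n);
#else
    thread_local int stub = 0;  // stand-in so the sketch builds without MKL
    const int previous = stub;
    stub = n;
    return previous;
#endif
}

// Hypothetical RAII guard: applies a thread-local count for one scope.
class ScopedLocalThreads {
public:
    explicit ScopedLocalThreads(int n) : previous_(set_local_threads(n)) {}
    ~ScopedLocalThreads() { set_local_threads(previous_); }
    ScopedLocalThreads(const ScopedLocalThreads&) = delete;
    ScopedLocalThreads& operator=(const ScopedLocalThreads&) = delete;

private:
    int previous_;  // restored when the guard goes out of scope
};
```

Inside each iteration of the parallel loop, declaring ScopedLocalThreads serial(1); before the PARDISO call would request one thread for that worker and restore the earlier setting when the iteration ends.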

VarshaS_Intel
Moderator

Hi,


We have not heard back from you. Could you please let us know if you had some bandwidth and tried evaluating the behavior?


Thanks & Regards,

Varsha


VarshaS_Intel
Moderator


Hi,


We have not heard back from you. Could you please let us know if you had some bandwidth and tried evaluating the behavior?


Thanks & Regards,

Varsha

