Intel® oneAPI Math Kernel Library

OMP set thread affinity mask fails with device not ready error

GopalRS
Beginner

Hi, 

We are using the MKL libraries in a C++ application running in a Spark environment on Windows. We use Spark to orchestrate the processing of a large dataset across multiple machines, and the C++ application is invoked to do the actual processing. All machines involved run Windows Server.

 

On some machines, the first #pragma omp parallel for fails with these messages: 

OMP: Error #134: Cannot set thread affinity mask.

OMP: System error #31: A device attached to the system is not functioning.

 

The same code runs perfectly in all other environments (for instance, Windows desktops and Azure cloud machines running Windows). I also replaced that omp parallel for with a std::thread implementation, and the code then failed at the next omp parallel for invocation.

 

The MKL DLLs that we are using are from version 2019.0.4.1, and libiomp5md.dll is from version 5.0.2019.312.

 

I realize that this amount of information is insufficient for someone on this forum to debug the issue, but I cannot get much else, and I'd appreciate it if someone could even point us in the right direction on how to think about this issue. For instance:

  • In the above error messages, is the "device" being referred to the CPU?
  • What causes OMP to fail to set the thread affinity mask? The C++ application runs with admin privileges, so it does have permission to set the affinity mask.
  • We are not setting the affinity mask ourselves. The code calls mkl_set_num_threads and omp_set_num_threads following which the parallel for is executed. So my hunch is that setting thread affinity is being done internally by OMP. If that is the case, can we disable it? Will it hurt performance tremendously?
  • Is there any machine property (like CPU/chipset) that would cause the set thread affinity mask call to fail? 
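For reference, the call pattern described in the list above can be sketched roughly as follows. This is a minimal illustration, not the actual application code: `sum_of_squares` is a hypothetical stand-in for the real processing step, and the MKL call is only mentioned in a comment so the sketch builds without MKL installed. The assumption here is that the OpenMP runtime performs its affinity setup lazily, around the first parallel region, which would match the point at which the error is reported.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the application's processing step.
double sum_of_squares(const std::vector<double>& data, int num_threads) {
#ifdef _OPENMP
    // The real application also calls mkl_set_num_threads(num_threads) here.
    omp_set_num_threads(num_threads);
#else
    (void)num_threads;  // Built without OpenMP: the loop below runs serially.
#endif
    double total = 0.0;
    const int n = static_cast<int>(data.size());
    // First parallel region in the process. No affinity call is made by the
    // application itself; any affinity setup happens inside the runtime.
#pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; ++i) {
        total += data[i] * data[i];
    }
    return total;
}
```

Nothing in this pattern touches thread affinity directly, which is consistent with the hunch that the affinity mask is being set internally by the OpenMP runtime.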

 

Thanks!

 

ShanmukhS_Intel
Moderator

Hi,

 

Could you please let us know which Windows Server version is being used, so that we can check whether it is supported and guide you accordingly?

 

Best regards,

Shanmukh.SS

 

 

GopalRS
Beginner

Thanks Shanmukh for the quick response!

This is all the information I have: 

OS: Windows_NT
PROCESSOR_ARCHITECTURE: AMD64
PROCESSOR_IDENTIFIER:  Intel64 Family 6 Model 85 Stepping 7, GenuineIntel
PROCESSOR_LEVEL: 6
PROCESSOR_REVISION: 5507

 

I believe the server version is later than Windows Server 2016, most likely 2019, and the Datacenter edition. For security reasons, we cannot access these machines directly, and I don't have an easy way of pulling out their configuration.

 

Will this help? 

 

 

 

 

ShanmukhS_Intel
Moderator

Hi,


>>We are not setting the affinity mask ourselves. The code calls mkl_set_num_threads and omp_set_num_threads following which the parallel for is executed. So my hunch is that setting thread affinity is being done internally by OMP. If that is the case, can we disable it? Will it hurt performance tremendously?

>>OMP: Error #134: Cannot set thread affinity mask.

This issue can be worked around by setting the environment variable KMP_AFFINITY=disabled, although this may have performance implications.
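If changing the launching environment is awkward, one possible alternative is to set the variable from inside the process before any OpenMP or MKL threading call is made. The helper below is a minimal sketch under that assumption; `set_kmp_affinity` is a hypothetical name, not part of MKL or the OpenMP runtime, and whether the runtime honors a value set this way depends on it not having read its settings yet. Setting the variable in the environment that launches the application remains the simpler and more reliable route.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: sets KMP_AFFINITY for the current process and
// returns true on success. It only updates the process environment; it
// must run before the OpenMP runtime initializes to have any effect.
bool set_kmp_affinity(const std::string& value) {
#ifdef _WIN32
    // Windows CRT: _putenv_s returns 0 on success.
    return _putenv_s("KMP_AFFINITY", value.c_str()) == 0;
#else
    // POSIX: the third argument (1) overwrites any existing value.
    return setenv("KMP_AFFINITY", value.c_str(), 1) == 0;
#endif
}
```

Calling `set_kmp_affinity("disabled")` at the very start of `main` corresponds to the workaround above. For diagnosis, the documented setting `verbose,none` asks the runtime to print the machine topology it detects without binding threads, which may help when comparing a failing machine against a working one.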


Best Regards,

Shanmukh.SS


ShanmukhS_Intel
Moderator

Hi,


Reminder:

Has the information provided helped? Is your issue resolved? Please let us know if you need any other information.


Best Regards,

Shanmukh.SS


GopalRS
Beginner

 Hi Shanmukh,

 

Sorry for the delay. We haven't tried this setting yet, but we did try upgrading the OMP and MKL libraries to the latest versions. We are still encountering this error intermittently.

I'm a bit hesitant to use the KMP_AFFINITY workaround because our application is very performance-intensive and, as you said, it may impact performance.

 

I did have a question though: can you please let us know when this error occurs? Specifically, why does the library think that the device is not functioning? Is it a mismatched CPU, the OS, or a permissions issue?

 

Thanks!

ShanmukhS_Intel
Moderator

Hi Gopal,


>>Can you please let us know when this error occurs? Specifically, why does the library think that the device is not functioning? Is it a mismatched CPU, the OS, or a permissions issue?

Based on the log you shared, the issue appears to be with the attached CPU, which might not be functioning properly.


Best Regards,

Shanmukh.SS


GopalRS
Beginner

Hi Shanmukh,

 

Just want to confirm that the KMP_AFFINITY=disabled workaround that you provided did solve the problem. I am not seeing a significant performance hit either, but I am running more experiments to be sure.

I'm also following up with the service team regarding the bad CPU. I will update this post once I find something.

 

Thanks for all the help!

Gopal.

 

ShanmukhS_Intel
Moderator

Hi Gopal,


>>Just want to confirm that the KMP_AFFINITY=disabled workaround that you provided did solve the problem. I am not seeing a significant performance hit either, but I am running more experiments to be sure.


Thanks for the confirmation and for sharing the details regarding the workaround. If this resolves your issue, kindly let us know whether we can close this thread at our end.


Best Regards,

Shanmukh.SS


GopalRS
Beginner

Yes, please go ahead.

ShanmukhS_Intel
Moderator

Hi Gopal,


Thanks for the confirmation! Glad to know that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Best Regards,

Shanmukh.SS

