Intel® oneAPI Math Kernel Library

How to Ensure Full MKL Initialization and Memory Allocation Before mlockall in Real-Time Linux Applications

pranjals
Beginner

Hello,

I am developing a real-time application on Linux that uses Intel MKL (oneMKL) for BLAS/LAPACK routines. To avoid page faults on real-time threads, I use mlockall(MCL_CURRENT) to lock all current memory pages before starting time-critical computation.

However, I have observed that MKL performs lazy initialization and allocates internal memory pages on the first BLAS call. If this happens after mlockall, those pages are not locked, which can cause page faults and real-time deadline misses.

I have tried the common workaround of calling a dummy BLAS function (such as a 1x1 dgemm) before mlockall to force MKL initialization and memory allocation. Despite this, I still observe that some MKL-related allocations or initializations may occur later, after mlockall, leading to page faults.

Questions:

  1. Is there an official or recommended way to force MKL to perform all internal initialization and memory allocation at program startup, before calling mlockall?
  2. Are there any environment variables, API calls, or linker options that can guarantee all MKL memory is allocated up front?
  3. Is the dummy BLAS call method (e.g., 1x1 dgemm) a recommended practice, or is there a more robust solution for real-time systems?

Thank you for your help.

 
Spencer_P_Intel
Employee

Hi pranjals,

 

This is a great question and unfortunately the general answer is not what you were hoping for, I think.

 

The Intel(R) oneMKL library is not designed to manage or limit its internal use of temporary memory.  As a performance library, it allocates temporary workspace for some algorithms on the fly, and either frees that workspace immediately or caches it in its internal memory manager.  Some APIs may avoid such allocations, but they are not named or categorized as such in the library, and the behavior can change in any release depending on the optimizations and algorithms implemented for the supported hardware architectures.  The allocation sizes also often depend on the problem sizes, which rules out the dummy-call approach in general: a warm-up call at one size does not guarantee that the buffers needed at another size have been allocated.

 

Best Regards,

Spencer
