Hello,
I am developing a real-time application on Linux that uses Intel MKL (oneMKL) for BLAS/LAPACK routines. To avoid page faults on real-time threads, I use mlockall(MCL_CURRENT) to lock all current memory pages before starting time-critical computation.
However, I have observed that MKL performs lazy initialization and allocates internal memory pages on the first BLAS call. If this happens after mlockall, those pages are not locked, which can cause page faults and real-time deadline misses.
I have tried the common workaround of calling a dummy BLAS function (such as a 1x1 dgemm) before mlockall to force MKL initialization and memory allocation. Despite this, some MKL-related allocations or initializations still occur later, after mlockall, leading to page faults.
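For reference, here is a minimal sketch of the warm-up pattern described above. It assumes Linux and linkage against oneMKL (e.g. via `-lmkl_rt`); the idea is to trigger MKL's lazy initialization with a tiny call before locking memory:

```c
/* Sketch: warm up MKL with a dummy 1x1 dgemm, then lock current pages.
 * Assumes Linux and linkage against Intel oneMKL (e.g. -lmkl_rt). */
#include <stdio.h>
#include <sys/mman.h>
#include <mkl.h>          /* cblas_dgemm */

int main(void)
{
    /* Dummy 1x1 dgemm: forces MKL's one-time setup and whatever
     * internal allocations this particular call path needs. */
    double a = 1.0, b = 1.0, c = 0.0;
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                1, 1, 1, 1.0, &a, 1, &b, 1, 0.0, &c, 1);

    /* Lock everything allocated so far into RAM.  MCL_CURRENT does
     * not cover pages allocated later; MCL_FUTURE would lock those
     * too, but the allocation itself still happens at call time. */
    if (mlockall(MCL_CURRENT) != 0) {
        perror("mlockall");
        return 1;
    }

    /* ... start real-time threads and time-critical BLAS calls ... */
    return 0;
}
```

As the thread notes, this only pre-allocates what the warm-up call itself touches, so it is not a guarantee for other routines or problem sizes.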
Questions:
- Is there an official or recommended way to force MKL to perform all internal initialization and memory allocation at program startup, before calling mlockall?
- Are there any environment variables, API calls, or linker options that can guarantee all MKL memory is allocated up front?
- Is the dummy BLAS call method (e.g., 1x1 dgemm) a recommended practice, or is there a more robust solution for real-time systems?
Thank you for your help.
Hi pranjals,
This is a great question; unfortunately, I think the general answer is not what you were hoping for.
The Intel(R) oneMKL library is not designed to manage or limit its internal use of temporary memory. As a performance library, it allocates and deallocates temporary memory for some algorithms on the fly (or caches it via its internal memory manager) when necessary. Some APIs may not have this behavior, but they are neither named nor categorized as such in the library, and this can change in any release, depending on the optimizations and algorithms implemented for the supported hardware architectures. The allocation sizes also often depend on the problem sizes, which rules out the dummy-call approach in general: a 1x1 warm-up call cannot pre-allocate the buffers a larger problem will need.
Best Regards,
Spencer