Solved: OneAPI PARDISO iparm[62]=0

morskaya_svinka_1 · ‎10-09-2024

I use MKL 2024.2, and I tested the analysis phase results for several positive definite matrices. I noticed the following tendency in the iparm[62] output parameter: it is nonzero for some small matrices, and as I generate matrices of larger size and with more nonzeros it first increases, but at some point it begins to decrease, down to zero. Is iparm[62] supposed to be 0 at all? According to the documentation, iparm[62] = 0 would mean that PARDISO does not need any memory for internal floating-point arrays during the factorization and solve phases, which does not make sense and looks like a bug.

It might be that when iparm[14] > iparm[15] + iparm[62], the parameter iparm[62] can be set to zero. It might also be that iparm[14] can provide all the memory required by OOC. But the current descriptions of iparm[14], iparm[15] and iparm[62] contradict this. I thought there was a mistake in the documentation, but the documentation for the Fortran interface gives the same description of iparm(63), iparm(15) and iparm(16), and in an example program the Fortran code returned the same non-zero iparm(63) value as the C code.

Please tell me whether this is a bug, and whether I can get PARDISO's estimate of peak memory consumption during the factorization and solve phases in OOC mode (maybe there is a parameter that was supposed to be iparm[62], but the people who wrote the documentation mixed it up).
Results:
n = 1918, nnz = 52594, iparm[14] = 3064, iparm[15] = 2071, iparm[62] = 1512
n = 3907, nnz = 111403, iparm[14] = 6272, iparm[15] = 4189, iparm[62] = 2291
n = 7084, nnz = 207544, iparm[14] = 11511, iparm[15] = 7633, iparm[62] = 3340
n = 25764, nnz = 786516, iparm[14] = 43215, iparm[15] = 28435, iparm[62] = 3634
n = 57395, nnz = 1784387, iparm[14] = 98177, iparm[15] = 64330, iparm[62] = 562
n = 74284, nnz = 2321272, iparm[14] = 127835, iparm[15] = 83649, iparm[62] = 0
n = 119729, nnz = 3771857, iparm[14] = 208240, iparm[15] = 135975, iparm[62] = 0
n = 213433, nnz = 6774193, iparm[14] = 376234, iparm[15] = 245177, iparm[62] = 0

I also attached some results for another set of matrices; for all of those tests iparm[62] is 0. I saw that the peak of the process working set occurs near the end of the factorization, not during the analysis phase, so the estimate iparm[15] + iparm[62] does not work for these tests.

Ruqiu_C_Intel · ‎10-31-2024

Hello,

Thank you for posting.

The documentation mentions max(iparm[14], iparm[15]+iparm[62]) as the peak memory for the factorization and solve phases. Theoretically, it is possible that iparm[62] = 0, because phase 2 (factorization) reuses the memory allocated in phase 1. The parameter iparm[62] only shows the amount of extra memory that is required for factorization. This means that if phase 1 requires/allocates more memory than factorization needs, iparm[62] can be 0. From the provided data, we can see that the memory required by phase 1 increases dramatically for larger problem sizes, so this is likely to be the case.
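
As a quick check of this reading against the numbers in the first post, the sketch below evaluates max(iparm[14], iparm[15] + iparm[62]) and reports which term dominates. The struct and function names are hypothetical, and the values are assumed to be in kilobytes as reported by the analysis phase.

```c
#include <assert.h>

/* Hypothetical container for the three analysis-phase outputs discussed here.
   Values are assumed to be in kilobytes. */
typedef struct {
    long long peak_analysis; /* iparm[14]: peak memory during analysis */
    long long permanent;     /* iparm[15]: permanent memory after analysis */
    long long ooc_extra;     /* iparm[62]: extra memory for OOC factorization/solve */
} ooc_stats;

/* Estimate from the thread: max(iparm[14], iparm[15] + iparm[62]). */
long long ooc_min_memory_kb(ooc_stats s)
{
    assert(s.peak_analysis >= 0 && s.permanent >= 0 && s.ooc_extra >= 0);
    long long factor_side = s.permanent + s.ooc_extra;
    return s.peak_analysis > factor_side ? s.peak_analysis : factor_side;
}

/* Returns 1 when the analysis-phase peak alone determines the estimate. */
int analysis_peak_dominates(ooc_stats s)
{
    return s.peak_analysis >= s.permanent + s.ooc_extra;
}
```

For n = 1918 the sum 2071 + 1512 = 3583 exceeds iparm[14] = 3064, while from n = 7084 onward iparm[14] is the larger term (11511 versus 7633 + 3340 = 10973), which is the regime where iparm[62] shrinks and eventually reaches 0.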

We noticed you also posted another PARDISO OOC memory topic: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-PARDISO-strange-performance-in-out-of-core-mode/m-p/1636210. Can we update via this thread?

The comments above are based on your data. To investigate the questions further, it would be helpful if you could provide us with a reproducer.

Regards,

Ruqiu

morskaya_svinka_1 · ‎11-02-2024

OK, let us update via this thread. I see, this explanation makes sense. My point about the iparm[62] documentation is that it needs to be clarified, for example "extra memory required for OOC mode on top of permanent memory reported in iparm[15]" instead of "Size of minimum OOC memory for numerical factorization and solution". It would also be helpful to know what exactly is stored in the symbolic factorization permanent memory (iparm[15]) and in what way it can be reused (at first I thought permanent memory could not be reused, as it stores the index arrays for the factors).

Secondly, the real memory consumption on Windows 10 grows as MKL_PARDISO_OOC_MAX_CORE_SIZE grows, while iparm[62] remains zero and iparm[14] and iparm[15] remain constant. This can be seen from the results provided in the first message of that topic: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-PARDISO-strange-performance-in-out-of-core-mode/m-p/1636210. So the estimate max(iparm[14], iparm[15] + iparm[62]) does not work.
The third issue is that when, in OOC mode, you cross the minimal RAM estimate for in-core mode, the solver slows down by a factor of about 1.5. I would rather not provide the matrices I used for testing, as they are big, so I provide an example instead: source code and a small matrix. With MKL_PARDISO_OOC_MAX_CORE_SIZE=682 the total time is 3.55 sec, but with MKL_PARDISO_OOC_MAX_CORE_SIZE=683 it is 5.15 sec. The analysis phase statistics are as follows: iparm[14] = 111359, iparm[15] = 97038, iparm[16] = 601369, iparm[62] = 5512.
I tested this on matrices arising from the finite element method. The issue occurs on matrices from 3D problems, but I did not see this effect on 2D matrices.

Ruqiu_C_Intel · ‎11-06-2024

Hello,

Thank you for pointing out the incorrect description in the documentation. We will update it in a future release.

Also, could you tell us how you measured the peak working set size?

For the second question, max(iparm[14], iparm[15]+ iparm[62]) is the minimum size of RAM memory needed for in-core PARDISO to solve the system. It is not the actual memory consumption, but the minimum requirement for OOC to work. So theoretically PARDISO is allowed to use about MKL_PARDISO_OOC_MAX_CORE_SIZE of RAM and max(iparm[14], iparm[15]+iparm[62]) - MKL_PARDISO_OOC_MAX_CORE_SIZE of virtual memory if max(iparm[14], iparm[15]+iparm[62]) > MKL_PARDISO_OOC_MAX_CORE_SIZE.

For the third question, when MKL_PARDISO_OOC_MAX_CORE_SIZE > max(iparm[14], iparm[15]+iparm[16]), PARDISO internally switches to a different OOC algorithm if it has sufficient memory to keep the complete LU factors in RAM. The switched algorithm is optimized for cases with multiple right-hand sides that do not fit in RAM. In your case, it would run faster in IC mode. PARDISO will select IC mode automatically if iparm[59] is set to 1 (rather than 2).
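
As a rough check against the statistics reported above (iparm[14] = 111359, iparm[15] = 97038, iparm[16] = 601369), the sketch below compares a core-size limit with max(iparm[14], iparm[15] + iparm[16]). It assumes the iparm values are in kilobytes, MKL_PARDISO_OOC_MAX_CORE_SIZE is in megabytes, and 1 MB = 1024 KB. The function names are hypothetical.

```c
#include <assert.h>

/* In-core peak estimate in KB: max(iparm[14], iparm[15] + iparm[16]). */
long long ic_peak_kb(long long iparm14, long long iparm15, long long iparm16)
{
    assert(iparm14 >= 0 && iparm15 >= 0 && iparm16 >= 0);
    long long factor_side = iparm15 + iparm16;
    return iparm14 > factor_side ? iparm14 : factor_side;
}

/* Returns 1 if a core-size limit given in MB strictly exceeds the in-core
   estimate. Assumes 1 MB = 1024 KB and compares in KB to stay in integers. */
int core_size_exceeds_ic_peak(long long max_core_size_mb, long long peak_kb)
{
    return max_core_size_mb * 1024 > peak_kb;
}
```

With these values the estimate is 97038 + 601369 = 698407 KB, about 682.04 MB, so a limit of 682 falls just below it and 683 just above. Under the stated unit assumptions, that matches the point where the reported total time changed from 3.55 sec to 5.15 sec.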

Also, please be aware that OOC mode in general is tuned for very large matrices that do not fit in memory. If you observe performance degradation for such matrices, please provide us with a reproducer so that we can investigate further.

morskaya_svinka_1 · ‎11-08-2024

To answer your question about measuring the PeakWorkingSet parameter: I used the .NET C# System.Diagnostics.Process API on Windows to start a process and then query the operating system for information about this process at certain moments. Here is the application example: https://github.com/ChessMastery/WindowsMemoryAnalyzer. It is actually a very convenient tool, as I did not find a good memory consumption analyzer on Windows; for example, memory consumption analysis is missing in Intel VTune Profiler and Intel Advisor.
1. In order to make it clear, is it correct that in case of iparm[62]=0 MKL PARDISO factorization and solve_phase reuse some memory from iparm[15] amount (permanent memory on analysis phase) and not from extra amount iparm[14] - iparm[15] (extra memory on analysis phase)?
2. "max(iparm[14], iparm[15]+ iparm[62]) is the minimum size of RAM memory needed for in-core" - I assume you meant "needed for out-of-core phase=13". OK, I see iparm[15] + iparm[62] is minimal memory requirement for OOC phase=23 to work. But the documentation statement in iparm[62] description says "Total peak memory consumption of OOC Intel® oneAPI Math Kernel Library (oneMKL) PARDISO can be estimated as max(iparm[14], iparm[15] + iparm[62])." When the user reads this statement out-of-context, he thinks max(iparm[14], iparm[15] + iparm[62]) is the upper bound of memory consumption for phase=13. I think it should be changed for something like "Total peak memory consumption of OOC Intel® oneAPI Math Kernel Library (oneMKL) PARDISO in case of minimal memory usage settings provided can be estimated as max(iparm[14], iparm[15] + iparm[62])."
3. I guess when MKL PARDISO needs to decide whether it has enough memory to run OOC (iparm[59] is set to 2) phase=23 it checks iparm[15] + iparm[62] < MKL_PARDISO_OOC_MAX_CORE_SIZE. What exactly does MKL PARDISO check in case of phase=22 and 33 when a) iparm[59]=1 and b) iparm[59]=2? Can I find out the results of these checks using iparm[15] and iparm[62] output before the call of the procedures in order to correct MKL_PARDISO_OOC_MAX_CORE_SIZE value?
4. Is MKL_PARDISO_OOC_MAX_SWAP_SIZE variable used only to switch between IC and OOC when iparm[59] is set to 1? Is it correct that MKL_PARDISO_OOC_MAX_SWAP_SIZE has nothing to do with restriction of swap file space usage controlled by the application?

Ruqiu_C_Intel · ‎11-21-2024

Hello Morskaya,

Please check our comments below:

1. In order to make it clear, is it correct that in case of iparm[62]=0 MKL PARDISO factorization and solve_phase reuse some memory from iparm[15] amount (permanent memory on analysis phase) and not from extra amount iparm[14] - iparm[15] (extra memory on analysis phase)?

--> PARDISO can also reuse memory from iparm[14]-iparm[15]. Reuse does not strictly mean reusing the values, but rather the allocated space. iparm[62] just shows the extra memory required, and max(iparm[14], iparm[15]+iparm[62]) shows the minimum memory required for OOC to work.

2. "max(iparm[14], iparm[15]+ iparm[62]) is the minimum size of RAM memory needed for in-core" - I assume you meant "needed for out-of-core phase=13". OK, I see iparm[15] + iparm[62] is minimal memory requirement for OOC phase=23 to work. But the documentation statement in iparm[62] description says "Total peak memory consumption of OOC Intel® oneAPI Math Kernel Library (oneMKL) PARDISO can be estimated as max(iparm[14], iparm[15] + iparm[62])." When the user reads this statement out-of-context, he thinks max(iparm[14], iparm[15] + iparm[62]) is the upper bound of memory consumption for phase=13. I think it should be changed for something like "Total peak memory consumption of OOC Intel® oneAPI Math Kernel Library (oneMKL) PARDISO in case of minimal memory usage settings provided can be estimated as max(iparm[14], iparm[15] + iparm[62])."

--> Thank you for raising these concerns. The documentation will be updated in a new feature release.

3. I guess when MKL PARDISO needs to decide whether it has enough memory to run OOC (iparm[59] is set to 2) phase=23 it checks iparm[15] + iparm[62] < MKL_PARDISO_OOC_MAX_CORE_SIZE. What exactly does MKL PARDISO check in case of phase=22 and 33 when a) iparm[59]=1 and b) iparm[59]=2? Can I find out the results of these checks using iparm[15] and iparm[62] output before the call of the procedures in order to correct MKL_PARDISO_OOC_MAX_CORE_SIZE value?

--> In OOC mode, PARDISO checks if MKL_PARDISO_OOC_MAX_CORE_SIZE is greater than or equal to the minimum amount of memory required by OOC mode, i.e., max(iparm[14], iparm[15]+iparm[62]). If this criterion is not satisfied, PARDISO will return an error.

In case of IC/OOC hybrid mode (iparm[59]=1), PARDISO will check whether the memory required by IC mode is less than the total available memory (MKL_PARDISO_OOC_MAX_CORE_SIZE+MKL_PARDISO_OOC_MAX_SWAP_SIZE). If not, PARDISO will choose OOC mode.

Actually, the value of MKL_PARDISO_OOC_MAX_CORE_SIZE should be close to the RAM size; see https://www.intel.com/content/www/us/en/developer/articles/training/how-to-use-ooc-pardiso.html. So it depends on the hardware.
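
The two checks described in this answer can be sketched as follows. This only illustrates the comparisons as stated here and is not the library's internal code. The enum, the function names, and the unit conversion (iparm values in KB, environment variables in MB, 1 MB = 1024 KB) are assumptions.

```c
#include <assert.h>

/* Hypothetical outcome of the mode checks described in the answer above. */
typedef enum { MODE_IN_CORE, MODE_OUT_OF_CORE, MODE_ERROR } solver_mode;

static long long max_ll(long long a, long long b) { return a > b ? a : b; }

/* iparm[59] = 2: OOC needs MAX_CORE_SIZE >= max(iparm[14], iparm[15] + iparm[62]),
   otherwise an error is returned. */
solver_mode check_ooc(long long iparm14, long long iparm15, long long iparm62,
                      long long max_core_mb)
{
    assert(max_core_mb >= 0);
    long long need_kb = max_ll(iparm14, iparm15 + iparm62);
    return max_core_mb * 1024 >= need_kb ? MODE_OUT_OF_CORE : MODE_ERROR;
}

/* iparm[59] = 1: stay in-core if the in-core requirement is less than
   MAX_CORE_SIZE + MAX_SWAP_SIZE, otherwise fall back to OOC.
   ic_need_kb is supplied by the caller, e.g. max(iparm[14], iparm[15] + iparm[16]). */
solver_mode check_hybrid(long long ic_need_kb, long long max_core_mb,
                         long long max_swap_mb)
{
    assert(max_core_mb >= 0 && max_swap_mb >= 0);
    long long available_kb = (max_core_mb + max_swap_mb) * 1024;
    return ic_need_kb < available_kb ? MODE_IN_CORE : MODE_OUT_OF_CORE;
}
```

Evaluating such checks on the iparm values returned by the analysis phase, before calling the factorization, is one way to see in advance whether a given MKL_PARDISO_OOC_MAX_CORE_SIZE is likely to be accepted.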

4. Is MKL_PARDISO_OOC_MAX_SWAP_SIZE variable used only to switch between IC and OOC when iparm[59] is set to 1? Is it correct that MKL_PARDISO_OOC_MAX_SWAP_SIZE has nothing to do with restriction of swap file space usage controlled by the application?

--> Yes, MKL_PARDISO_OOC_MAX_SWAP_SIZE is only used to switch between IC and OOC. Note that PARDISO does not control the actual swap size; that is left to the user. The variable just informs PARDISO of the swap size set by the user.

Regards,

Ruqiu

Ruqiu_C_Intel · ‎12-08-2024

Thank you for reaching out. This issue is being closed and we will no longer monitor this thread. If you require additional assistance from Intel, please start a new thread.