Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

OneAPI PARDISO strange performance in out-of-core mode

morskaya_svinka_1
New Contributor I
955 Views

I tested the out-of-core (OOC) mode of MKL PARDISO (multithreaded version) on several real symmetric positive-definite systems (mtype=2). Platform: 11th Gen Intel(R) Core(TM) i5-11400 @ 2.60GHz 2.95 GHz, 64 GB RAM, Windows 10. I use all the default parameters except iparm[1]=3 (parallel nested dissection ordering). I set MKL_PARDISO_OOC_MAX_SWAP_SIZE=0 and tried different values of MKL_PARDISO_OOC_MAX_CORE_SIZE. About 55 GB of RAM is free when I start the solver, and there is no other CPU load.

The problem is that the more memory I provide, the slower the solver runs. PeakWorkingSet grows as well, so the more RAM the solver is given, the more RAM it actually consumes. All the problems I tested fit in RAM, so I also compared the results with in-core (IC) mode. I attach a PDF file with the statistics (and the same file in docx format, for Windows users who want to copy the tables into Excel or similar).

Please explain whether this behaviour is normal for MKL PARDISO and what causes it. According to my tests, when OOC is used (iparm[59]=2) but the memory limit set with MKL_PARDISO_OOC_MAX_CORE_SIZE is larger than the minimum IC RAM estimate, something strange happens, and I want to find out what exactly slows the solver down so much. I also tried MKL_PARDISO_OOC_MAX_SWAP_SIZE=8000 (my swap file is about 10 GB), but the results are the same.

The reason I am interested is that I am trying to find a good memory-management strategy for out-of-core mode, for example: take the minimum OOC RAM estimate, max(iparm[14], iparm[15] + iparm[62]), and provide 200% of it. If necessary, I can test the solver on some matrices from the SuiteSparse Matrix Collection to make the results reproducible. I also plan to test OOC behaviour on problems that do not fit in RAM.
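For readers who want to try the "200% of the minimum estimate" strategy, here is a minimal sketch of the arithmetic in C. It assumes zero-based iparm indexing as in the post, that iparm holds plain int values (LP64 interface), that the memory entries are reported in kilobytes after the analysis phase, and that MKL_PARDISO_OOC_MAX_CORE_SIZE is given in megabytes; please check these against the MKL documentation for your version. The function names are hypothetical helpers, not part of MKL.

```c
/* Hypothetical helpers; iparm is the PARDISO parameter array after the
   analysis phase. Assumption: memory entries are in kilobytes. */

static long long max_ll(long long a, long long b)
{
    return a > b ? a : b;
}

/* Minimum OOC RAM estimate in KB: max(iparm[14], iparm[15] + iparm[62]). */
long long ooc_min_kb(const int *iparm)
{
    return max_ll(iparm[14], (long long)iparm[15] + iparm[62]);
}

/* In-core RAM estimate in KB: max(iparm[14], iparm[15] + iparm[16]). */
long long ic_peak_kb(const int *iparm)
{
    return max_ll(iparm[14], (long long)iparm[15] + iparm[16]);
}

/* Scale an estimate in KB by a percentage and convert to MB, rounding up.
   The result is a candidate value for MKL_PARDISO_OOC_MAX_CORE_SIZE
   (assumed to be in MB). */
long long scaled_limit_mb(long long kb, int percent)
{
    long long scaled_kb = kb * percent / 100;
    return (scaled_kb + 1023) / 1024;
}
```

With this sketch, the strategy from the post would be scaled_limit_mb(ooc_min_kb(iparm), 200), and ic_peak_kb(iparm) gives the in-core figure to compare it against.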

4 Replies
morskaya_svinka_1
New Contributor I
869 Views

I've added some more tests (24, 40, 45, 50 GB), and it looks like my assumption was correct: when OOC mode is requested explicitly (iparm[59]=2) and MKL_PARDISO_OOC_MAX_CORE_SIZE is greater than iparm[15]+iparm[16], MKL PARDISO OOC mode is up to 2x slower than when MKL_PARDISO_OOC_MAX_CORE_SIZE is less than iparm[15]+iparm[16].
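The condition described here can be written as a small check, which may help when reproducing the measurements. This is only a sketch under assumptions: zero-based iparm with int values, iparm[15] and iparm[16] in kilobytes, and the MKL_PARDISO_OOC_MAX_CORE_SIZE value passed in as megabytes. The enum and function names are hypothetical.

```c
/* Which side of the in-core estimate a given core-size limit falls on. */
typedef enum {
    CORE_LIMIT_NOT_ABOVE_IC, /* limit <= iparm[15] + iparm[16] */
    CORE_LIMIT_ABOVE_IC      /* limit >  iparm[15] + iparm[16]: the regime
                                where the slowdown was observed in the post */
} core_limit_regime;

/* Hypothetical helper: compare a limit in MB against iparm[15] + iparm[16],
   assumed to be in KB. */
core_limit_regime classify_core_limit(const int *iparm,
                                      long long max_core_size_mb)
{
    long long ic_kb = (long long)iparm[15] + iparm[16];
    long long limit_kb = max_core_size_mb * 1024;
    return limit_kb > ic_kb ? CORE_LIMIT_ABOVE_IC : CORE_LIMIT_NOT_ABOVE_IC;
}
```

Logging this value next to each timing makes it easier to see whether the slow runs all fall in the "above" regime.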

morskaya_svinka_1
New Contributor I
847 Views

The results also show that in OOC mode MKL PARDISO's memory consumption (PeakWorkingSet) grows as MKL_PARDISO_OOC_MAX_CORE_SIZE increases, but the CPU time does not improve (it even gets about 5% worse). At first I expected OOC mode to run faster the more memory I provide. However, the CPU time stays the same, except for problems 2 and 3, where it changes once when I set 8 GB instead of the minimum 1.5 GB and 2 GB respectively.

My guess is that the PARDISO working array is enlarged and touched (e.g. initialized with zeros), but only a constant-size part of it is really used, whatever MKL_PARDISO_OOC_MAX_CORE_SIZE value you provide. This holds beyond a certain point: problems 2 and 3 need more than 110% of the minimum RAM estimate to reach maximum speed in OOC, but once they reach it, PeakWorkingSet keeps growing while the CPU time does not improve. So this may be an issue worth working on.

morskaya_svinka_1
New Contributor I
736 Views

So the graph of cpu_time(ram_restriction) looks roughly like this on a 64 GB RAM machine.

Ruqiu_C_Intel
Moderator
494 Views

We are tracking a similar topic and will post updates in this thread: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/OneAPI-PARDISO-iparm-62-0/m-p/1636246
