In 32-bit I do the following and it seems to work OK, never using Windows virtual memory and not using OOC mode unless the problem is very big:
- Set MKL_PARDISO_OOC_MAX_CORE_SIZE to 1000
- Call pardiso with iparm(60)=1
- If it returns error -2 then reduce MKL_PARDISO_OOC_MAX_CORE_SIZE by 100 and keep trying again until it succeeds or fails with error -9.
But in 64 bit with plenty of RAM and an initial MKL_PARDISO_OOC_MAX_CORE_SIZE of 16000, that method results in it using OOC mode even for relatively small problems where it's not needed and most of the available RAM goes unused. Worse, with 64 bit and not much RAM, large problems don't return any error code but go ahead and solve using in-core mode and Windows virtual memory, which is very very slow.
How can I make it work in a reasonable way regardless of the amount of RAM or problem size? The biggest issue is making sure it doesn't use Windows' virtual memory.
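The 32-bit retry procedure described in this post can be sketched roughly as follows. This is only an illustration in Python: `try_solve` is a hypothetical callback (not an MKL function) that is assumed to set MKL_PARDISO_OOC_MAX_CORE_SIZE to the given value, run PARDISO, and return its error code.

```python
def find_core_size(try_solve, start_mb=1000, step_mb=100):
    """Lower the OOC core-size limit until PARDISO stops returning error -2.

    try_solve(size_mb) is a hypothetical callback: it is assumed to set
    MKL_PARDISO_OOC_MAX_CORE_SIZE to size_mb, run PARDISO, and return
    the PARDISO error code (0 on success).
    Returns (size_mb, error) for the last attempt.
    """
    size_mb = start_mb
    while size_mb > 0:
        error = try_solve(size_mb)
        if error != -2:
            # 0 is success; -9 (or any other code) also ends the search.
            return size_mb, error
        size_mb -= step_mb
    # Every size down to zero still gave error -2.
    return 0, -2
```

The loop only reacts to error codes, so it cannot notice the 64-bit case described above where PARDISO returns no error and silently falls back on virtual memory.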
It looks like MKL_PARDISO_OOC_MAX_CORE_SIZE is still larger than the available RAM (in the last case you described). Note that PARDISO can actually use more memory than MKL_PARDISO_OOC_MAX_CORE_SIZE. So please reduce the initial MKL_PARDISO_OOC_MAX_CORE_SIZE in step 1 to 80-90% of the RAM that is really available.
And one question for clarity, do you solve the problem with one right-hand side, or many?
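As a small sketch of the 80-90% advice: given the amount of free RAM (which the caller has to obtain from the operating system), the starting value could be computed like this. The function name, the default fraction of 0.85, and the assumption that 1 MB means 1024*1024 bytes are illustrative choices, not taken from the MKL documentation.

```python
def initial_core_size_mb(available_ram_bytes, fraction=0.85):
    """Suggest a starting MKL_PARDISO_OOC_MAX_CORE_SIZE in MB.

    available_ram_bytes must be supplied by the caller.
    fraction is the share of free RAM to hand to PARDISO; the 0.85
    default is an assumption inside the suggested 80-90% range.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Assumes 1 MB = 1024 * 1024 bytes.
    return int(available_ram_bytes * fraction) // (1024 * 1024)
```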
So my general process is OK? Start high and reduce it until it succeeds, rather than start low and increase it like the manual implies? Actually, starting too low seems to crash it in various ways rather than just returning error code -9.
I did see it working well when I started with slightly less than the available RAM. That leads to a new problem of finding out the available RAM. Do I really have to do that myself? Pardiso doesn't do any kind of test to find out if it's using RAM or virtual memory?
I'm using multiple right hand sides but only 1 for testing and that's where the problems are.
Searching for a workable memory size is not needed.
1) First, you can estimate the minimum RAM required as max(iparm(15), iparm(16) + iparm(63)) (as described in the pardiso iparm Parameter table). If the required minimum is less than the available RAM, it is possible to use OOC PARDISO to solve your problem.
2) PARDISO has no mechanism to estimate your available memory. Therefore, you should provide it through the OOC_MAX_CORE_SIZE variable.
3) In part 1), PARDISO estimates the minimum RAM required for a problem with one RHS. With many RHS, PARDISO needs a lot of additional memory in the iterative refinement step: approximately n*nrhs*2*16 bytes in sequential mode. PARDISO automatically performs two steps of iterative refinement when perturbed pivots are obtained during the numerical factorization (see the description of iparm and iparm in the pardiso iparm Parameter table). With many RHS, PARDISO cannot report the minimum size of all required memory up front, for many reasons. For example, you can pass any number of RHS in the solving step.
If you have many RHS, you can use the following workaround. First, perform phase 12. Then, in the solving step, split the RHS into small chunks and find the corresponding solutions in a loop (see attached pic.).
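The chunking workaround and the extra-memory estimate can be sketched as below. This is a Python illustration only; the helper names are made up for this example, and the n*nrhs*2*16 figure is the approximate sequential-mode estimate quoted above, not a documented bound.

```python
def refinement_bytes(n, nrhs):
    """Approximate extra memory for iterative refinement (sequential mode).

    Uses the n * nrhs * 2 * 16 bytes estimate quoted in this thread.
    """
    return n * nrhs * 2 * 16


def rhs_chunks(nrhs, chunk_size):
    """Split right-hand sides 0..nrhs-1 into half-open (start, stop) ranges."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [(start, min(start + chunk_size, nrhs))
            for start in range(0, nrhs, chunk_size)]
```

After factorizing once, the solve step would be called once per (start, stop) range with only those columns of the right-hand-side matrix, so the extra memory scales with the chunk size rather than with the total number of RHS.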
Great. Thanks. I was about to try something like this. I just didn't think it was necessary because iparm(60)=1 had seemed to work very well in 32-bit mode but goes wrong in 64-bit. It sounds like it will also work around this bug:
Strange results with that method. With one test problem (300,000 rows, 1 rhs), I see these memory usages:
With MKL 10.2.3:
- Windows task manager graph: ~2.4GB
- iparm(63) after phase 1: 17975 (18GB)
- iparm(17) after phase 1: 2,300,000 (2.3GB)
With MKL 11.1:
- Windows task manager graph: ~2.6GB
- iparm(63) after phase 1: 8920 (9GB*)
- iparm(17) after phase 1: 2,300,000 (2.3GB)
It looks like iparm(17) is the only one that gives a reasonable result, but the 10.2.3 manual says it's computed in phase 2, so I don't think it's safe to use after phase 1, although the 11.1 manual says it's computed in phase 1. Could the 10.2.3 manual be in error?
Any idea which value is most reliable?
*Update: I see the meaning of iparm(63) changed between 10.2 and 11.1. The result from 10.2 is probably wrong because of a bug, and the result from 11.1 is in kB, not MB, but it applies only to OOC mode so it isn't appropriate here.
Do you have a reference for the additional n*nrhs*2*16 bytes? Does that mean the total in-core memory requirement in both MKL 10.2.3 and 11.1 is approximately
1000 * max(iparm(15), iparm(16)+iparm(17)) + n*nrhs*32
I only have a few (typically 1-5) rhs's so I think it's ok to just estimate this part since it's much smaller than the memory used by the numerical factorization.
The results are not strange. I am confused by the iparm(17) value in your last message. iparm(17) is used for in-core PARDISO. Am I right that you want to use OOC PARDISO without swapping (that is, without using virtual memory)?
Let's talk about MKL 11.1, because MKL 10.2.3 is quite an old version.
1) Just in case, if you use zero based indexing (C/C++) iparm(k) should be iparm[k-1].
2) If iparm(63) = 8920, OOC PARDISO needs at least about 9 MB of RAM for the numerical factorization and solution phases, so OOC_MAX_CORE_SIZE should be at least 9.
3) 1024 * max(iparm(15), iparm(16) + iparm(63)) + n*nrhs*32 is the minimum size of needed RAM in bytes for OOC PARDISO
4) 1024 * max(iparm(15), iparm(16)+iparm(17)) + n*nrhs*32 is the total peak of used memory in bytes for In-core PARDISO.
5) There is no reference about n*nrhs*32. It is not documented.
6) You did not report iparm(16), iparm(15), or OOC_MAX_CORE_SIZE.
I can only say that if you have less than 2.4 GB of available (free) RAM, but OOC_MAX_CORE_SIZE is more than 2400 and OOC PARDISO is used, then swapping will occur.
If you use in-core PARDISO and the available RAM is less than 2.4 GB, then swapping will occur regardless of the value of OOC_MAX_CORE_SIZE.
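Formulas 3) and 4) can be written out as a short sketch. Python is used here purely for illustration; the helper names are invented for this example, and `iparm` is assumed to be a zero-based sequence of at least 64 integers already filled in by PARDISO, with the reported sizes in KB.

```python
def ip(iparm, k):
    """Return iparm(k) using the manual's 1-based numbering."""
    return iparm[k - 1]


def ooc_min_bytes(iparm, n, nrhs):
    """Formula 3): minimum RAM in bytes needed by OOC PARDISO."""
    kb = max(ip(iparm, 15), ip(iparm, 16) + ip(iparm, 63))
    return 1024 * kb + n * nrhs * 32


def in_core_peak_bytes(iparm, n, nrhs):
    """Formula 4): total peak memory in bytes used by in-core PARDISO."""
    kb = max(ip(iparm, 15), ip(iparm, 16) + ip(iparm, 17))
    return 1024 * kb + n * nrhs * 32
```

Keeping the 1-based lookup in one place avoids the off-by-one mistake mentioned in point 1).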
I think I've got it worked out now. I just want the "best" or even "reasonable" performance. That means it should use IC if there's enough RAM and OOC if there isn't.
I'm using your step 4 to decide if there's enough RAM for in-core, then calling pardiso with iparm(60)=0 or 2 accordingly. I don't use iparm(60)=1 at all because I can't get it to work, for different reasons in each version of MKL. It's easier to just have my software make the decision.
iparm(15) and (16) give relatively small numbers so I'm not too worried about them.
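The decision described in this post might look something like the sketch below. It is an illustration in Python, not the poster's actual code: the function name and the `headroom` safety factor are assumptions, and the peak estimate is expected to come from the step 4 formula.

```python
IN_CORE = 0       # iparm(60) = 0: in-core PARDISO
OUT_OF_CORE = 2   # iparm(60) = 2: out-of-core PARDISO


def choose_iparm60(in_core_peak_bytes, available_ram_bytes, headroom=0.9):
    """Pick iparm(60) so that in-core mode is used only when it fits in RAM.

    in_core_peak_bytes is the step 4 estimate of peak in-core memory.
    headroom is an assumed safety factor: only this share of the free
    RAM is treated as usable, to stay clear of virtual memory.
    """
    budget = available_ram_bytes * headroom
    return IN_CORE if in_core_peak_bytes <= budget else OUT_OF_CORE
```

For the 300,000-row test problem above, an estimated peak of roughly 2.4 GB would select in-core mode on a machine with 4 GB free and out-of-core mode on one with 2 GB free.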