OOC Pardiso = "Diso"

brianlamm · 11-30-2008

First, the title of this thread means that out-of-core (OOC) Pardiso is not parallel (threaded). No surprise there, since a warning on page 2359 of the Reference Manual states that OOC Pardiso is not threaded.

My problem begins with the fact that I really don't understand the difference between the sequential and the threaded libraries in the threading layer of MKL's "layered model". My clue that I don't understand is table 5.2 of the User's Guide: under the column "Application Threaded?", in every case except one where that entry is "no", the MKL threading library specified is "mkl_intel_thread.lib" (static case). I suppose this also has something to do with the recommended RTL layer. On the surface, looking only at the name of the MKL threading library, I would think one would use the "sequential" threading library if the application is not threaded, but table 5.2 specifies the reverse. A bit more explanation there would be nice.

However, my real problem is understanding the implications for threading when iparm(60)=1 in Pardiso. Since that value means either in-core or OOC Pardiso could be used, which library in the threading layer should (could) I use?

It seems to me (based on my obviously flawed understanding of the threading layer of MKL) that iparm(60)=1 is useless, since one would have to know at build time whether there will be enough memory at runtime for in-core Pardiso, which means linking with mkl_intel_thread.lib (static case), whereas if there is not enough memory at runtime, one would have had to link with mkl_intel_sequential.lib (static case). You see the problem (my problem).

So my question seems to boil down to this: if the sequential (OOC) Pardiso is used at runtime, can I still link with mkl_intel_thread.lib (static) or mkl_intel_thread_dll.lib (dynamic)? If the answer is "no", then one would have to supply two different implementations using Pardiso, say in two DLLs, and create a driver which at runtime (somehow) determines the currently available RAM and chooses either the OOC or the in-core Pardiso DLL. But even that would seem impossible, since I believe the exe itself would have had to be built with either the parallel or the sequential MKL threading library.

So I hope the answer is "yes": you can build Pardiso code with iparm(60)=1 against mkl_intel_thread.lib or mkl_intel_thread_dll.lib and expect correct results (all "other things" being equal).

-Brian L.

Sergey_Solovev__Inte · 12-02-2008

Hello, Brian,

You can link a test with iparm(60)=1 against any of mkl_intel_thread.lib, mkl_intel_sequential.lib, mkl_intel_thread_dll.lib, and mkl_intel_sequential_dll.lib. OOC PARDISO uses approximately the same amount of memory in each case. Either way, for all cases you should set MKL_PARDISO_OOC_MAX_CORE_SIZE = 1.

Sergey_Solovev__Inte · 12-02-2008

Brian, sorry for the typo:

I mean MKL_NUM_THREADS=1 (not MKL_PARDISO_OOC_MAX_CORE_SIZE = 1).

brianlamm · 12-02-2008

Quoting - Sergey Solovev (Intel)

Brian, sorry for the typo:

I mean MKL_NUM_THREADS=1 (not MKL_PARDISO_OOC_MAX_CORE_SIZE = 1).

Sergey,

Thanks for the informative reply.

However, having to set MKL_NUM_THREADS=1 defeats the purpose of being able to use more than one processor when enough RAM is available for Pardiso to run in core. That is, there seems to be no reliable way to make Pardiso execute on all available processors in case its peak memory usage is not more than the available RAM. It seems as if the developer using MKL has to "cross his/her fingers" when they want Pardiso to run on all available processors ("CALL MKL_SET_NUM_THREADS(MKL_GET_NUM_THREADS)"), and therefore has to set iparm(60)=0 if that returns more than one, and hope nothing is "paged out" due to lack of main memory (ugh).

I don't want to have to cross my fingers, and I don't want to have to build different versions of the app using Pardiso based on the target machine's number of processors and RAM. Besides, there's simply no way to determine ahead of time how much RAM will be available; even users with 32GB RAM machines might be multitasking when the app using Pardiso hits the highway.

So it seems as if a few things need to be made available. I cannot find in the docs whether Pardiso is capable, during, say, the analysis phase, of determining or at least estimating how much memory it will use. If not, then I believe there is no way to have my cake and eat it too. If this capability were available, the developer would only have to determine at runtime how much RAM is currently available: if enough RAM is available, CALL MKL_SET_NUM_THREADS(MKL_GET_NUM_THREADS) and set iparm(60)=0; if not, CALL MKL_SET_NUM_THREADS(1) and set iparm(60)=2 (still built with mkl_intel_thread*), and get correct results (all other things being equal).
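The runtime rule proposed above can be sketched as a small decision function. This is only an illustration under stated assumptions: it makes no MKL calls, `choose_plan` and `PardisoPlan` are hypothetical names, and it assumes the caller already has a memory estimate in KB and the currently available RAM in KB (how to obtain the latter is platform-specific and not shown).

```python
from dataclasses import dataclass


@dataclass
class PardisoPlan:
    num_threads: int  # value one would pass to MKL_SET_NUM_THREADS
    iparm60: int      # 0 = in-core, 2 = out-of-core (1-based iparm numbering)


def choose_plan(estimated_kb: int, available_kb: int, max_threads: int) -> PardisoPlan:
    """Hypothetical rule from the post: threaded in-core if the estimate
    fits in the RAM available right now, otherwise single-threaded OOC."""
    if estimated_kb <= available_kb:
        return PardisoPlan(num_threads=max_threads, iparm60=0)
    return PardisoPlan(num_threads=1, iparm60=2)


# Hypothetical numbers: ~96 MB estimate vs. 4 GB free, then vs. 64 MB free
print(choose_plan(estimated_kb=98_125, available_kb=4 * 1024 * 1024, max_threads=8))
print(choose_plan(estimated_kb=98_125, available_kb=64 * 1024, max_threads=8))
```

The weak point is exactly the one raised in the post: the check is only as good as the memory estimate and the snapshot of free RAM taken just before factorization.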

So I now ask: if iparm(60)=1, MKL_NUM_THREADS > 1, enough RAM is available, and the app is built with mkl_intel_thread*, will Pardiso give correct results (as usual, all "other things being equal"), or will it return ierror not equal to zero, or (worse) give incorrect results?

-Brian

Sergey_K_Intel1 · 12-02-2008

Brian,

A rough estimate of the total double precision memory consumption in Kbytes for the factorization and solve steps can be computed after the reordering and symbolic factorization step with the formula max(iparm(15), iparm(16) + iparm(18)*8/1024). So the user can estimate how much memory is needed for the factorization and solve steps. The peak memory consumption for these steps might be a little higher depending on the input parameters; for example, it will be higher when the built-in CG/PCG solver is used. The exact total double precision memory consumption in Kbytes is provided in iparm(17), which is computed in phase 2. Iparm(61) and iparm(62) are similar to iparm(16) and iparm(17), respectively, and are used by out-of-core mode. These out-of-core parameters report memory consumption in Mbytes.
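The rough estimate above can be written as a one-line function. This sketch simply transcribes the quoted formula; the argument names follow the 1-based Fortran iparm numbering used in the post, and the sample inputs are hypothetical values standing in for what one would read back after the analysis phase. The `*8` reflects 8 bytes per double-precision factor entry, with iparm(18) holding the number of nonzeros in the factors.

```python
def pardiso_mem_estimate_kb(iparm15: int, iparm16: int, iparm18: int) -> int:
    """Rough KB estimate for PARDISO factorization + solve:
    max(iparm(15), iparm(16) + iparm(18)*8/1024).

    iparm18 is the number of nonzeros in the factors (8 bytes each);
    integer division mirrors Fortran integer arithmetic.
    """
    factor_kb = iparm16 + iparm18 * 8 // 1024
    return max(iparm15, factor_kb)


# Hypothetical values, as if read back after reordering/symbolic factorization
kb = pardiso_mem_estimate_kb(50_000, 20_000, 10_000_000)
print(f"estimated: {kb} KB (~{kb / 1024:.1f} MB)")
```

Since the post notes the true peak can be somewhat higher (e.g. with the CG/PCG solver), a caller comparing this number against free RAM would sensibly leave some headroom.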

Setting MKL_NUM_THREADS=1 is only required for early versions of OOC, such as MKL 10.

In MKL 10.1, the user can set any value of MKL_NUM_THREADS and everything should work. Moreover, PARDISO OOC will use the OpenMP threading available through BLAS and LAPACK.

So if iparm(60)=1, MKL_NUM_THREADS > 1, enough RAM is available, and the app is built with mkl_intel_thread from MKL 10.1, PARDISO will give the correct result and will use the OpenMP threading available through BLAS and LAPACK routines.

All the best

Sergey

brianlamm · 12-06-2008

Wow. I mean WOW! That is great, great news!

Once again, thanks for the informative, in-depth, and clear reply.

-Brian