Running PARDISO in parallel

gandalf85p · 05-18-2010

When I run PARDISO on a 4-processor node (that is part of a cluster), the solve stage (phase=33) takes the same amount of time as it would on 1 processor. In the code itself, I have these statements:

mkl_set_num_threads(4);

mkl_set_dynamic(false);

When I SSH into that node and run the "top" command, the CPU usage is close to 400%. If I use 1 processor, the CPU usage is close to 100%, but it takes the same amount of time. If I use 2 nodes, the solution is twice as fast. My matrix type is 6 (complex symmetric), and I'm compiling with these libraries:

-lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lguide -lmkl_solver

Why isn't PARDISO any faster with 4 processors? I'm guessing there's some setting or something I've missed. Thanks.
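The puzzle here is the gap between CPU utilization (what top reports) and wall-clock time (what actually matters for speedup). A minimal C sketch for measuring both around a single solver phase, assuming a Linux/glibc system where clock() counts CPU time summed over all threads of the process. The helper names wall_seconds, cpu_seconds, speedup and efficiency are hypothetical and not part of MKL.

```c
#include <time.h>

/* Wall-clock seconds via C11 timespec_get; returns 0.0 if the call fails. */
double wall_seconds(void) {
    struct timespec ts;
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
        return 0.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Processor time used by the process. On Linux/glibc this is summed over
   all threads, so 4 busy threads advance it about 4x faster than wall time
   (the ~400% that top shows). Note: MSVC's clock() measures wall time instead. */
double cpu_seconds(void) {
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Speedup of a p-thread run relative to a 1-thread run, from wall-clock times. */
double speedup(double t1_wall, double tp_wall) {
    return t1_wall / tp_wall;
}

/* Parallel efficiency: 1.0 is perfect scaling, 1/p means no gain from p threads. */
double efficiency(double t1_wall, double tp_wall, int p) {
    return speedup(t1_wall, tp_wall) / (double)p;
}
```

Record both timers immediately before and after the phase=33 call, once with 1 thread and once with 4. Identical wall times give efficiency(t1, t4, 4) == 0.25, i.e. no gain, even when the CPU-time difference is about four times the wall-time difference. Busy-but-idle threads are one possible explanation: the Intel OpenMP runtime lets waiting threads spin for a while (controlled by KMP_BLOCKTIME) before sleeping, and top counts that spinning as CPU usage.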

TimP · 05-18-2010

Particularly if your machine is HyperThreaded, it may be better to let MKL choose the number of threads. Depending on the type of machine, KMP_AFFINITY settings may be useful.
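To check which threading settings actually reach the solver process on the compute node, a small C helper can print them from inside the program. This is a hypothetical sketch independent of MKL (env_or and report_threading_env are illustrative names); OMP_NUM_THREADS, MKL_NUM_THREADS, MKL_DYNAMIC and KMP_AFFINITY are the standard variable names, and calls such as mkl_set_num_threads() in the code take precedence over the corresponding variables.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return the value of an environment variable, or `fallback` if it is unset. */
const char *env_or(const char *name, const char *fallback) {
    const char *v = getenv(name);
    return v ? v : fallback;
}

/* Print the threading-related variables visible to this process and
   return how many of them are set. Run it on the compute node itself,
   since a cluster job may not inherit the login shell's environment. */
int report_threading_env(FILE *out) {
    static const char *const names[] = {
        "OMP_NUM_THREADS", /* OpenMP default thread count */
        "MKL_NUM_THREADS", /* MKL-specific thread count */
        "MKL_DYNAMIC",     /* whether MKL may reduce the thread count */
        "KMP_AFFINITY",    /* Intel OpenMP thread-to-core binding */
    };
    int set = 0;
    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i) {
        const char *v = getenv(names[i]);
        if (v)
            ++set;
        fprintf(out, "%-16s = %s\n", names[i], v ? v : "(unset)");
    }
    return set;
}
```

Calling report_threading_env(stderr) at startup shows, for example, whether KMP_AFFINITY=granularity=fine,compact was set for the job; whether such a binding helps this particular solve has to be measured.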

Gennady_F_Intel · 05-18-2010

This is because this stage of the calculation (phase 33) is not threaded.

More precisely, this stage of the solution is threaded only when there are many right-hand sides.
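Why the number of right-hand sides matters can be seen in a dense analogue of the forward-substitution half of a triangular solve. This is a minimal conceptual sketch in C, not PARDISO's implementation (PARDISO works on a sparse factor, and the name forward_subst_multi is hypothetical): in this simple formulation each right-hand side is an independent column, so with nrhs = 1 the loop that is split across threads has only one iteration.

```c
#include <stddef.h>

/* Solve L * X = B in place for a dense n-by-n lower-triangular L (row-major)
   and nrhs right-hand sides stored column by column in b (b[k*n + i]).
   Within one column, x[i] depends on x[0..i-1], so the loop over i is
   sequential; different columns are independent, so the loop over k can be
   split across threads. Without OpenMP the pragma is ignored (serial run). */
void forward_subst_multi(size_t n, size_t nrhs, const double *L, double *b) {
#pragma omp parallel for schedule(static)
    for (long k = 0; k < (long)nrhs; ++k) {
        double *x = b + (size_t)k * n;           /* column k of B, becomes X */
        for (size_t i = 0; i < n; ++i) {
            double s = x[i];
            for (size_t j = 0; j < i; ++j)
                s -= L[i * n + j] * x[j];        /* uses already-solved x[j] */
            x[i] = s / L[i * n + i];
        }
    }
}
```

Compiled with OpenMP enabled (for example -fopenmp with GCC), the columns are shared among threads; with a single right-hand side, the extra threads have nothing to do in this loop.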

--Gennady

Gennady_F_Intel · 05-18-2010

KMP_AFFINITY doesn't significantly affect the performance of the solver. Although every time to consider each particular case.

--Gennady

gandalf85p · 05-21-2010

OK, I see. So KMP_AFFINITY doesn't affect the solve stage? Not sure what you mean by "Although every time to consider each particular case."