There are some very impressive memory vs. MPI process plots in the excellent MKL presentation:
But it's a little unclear what the memory requirements are. Is the original matrix needed on each node? It sounds like it is, from:
"The algorithm ensures that the memory required to keep internal data on each MPI process is decreased when the number of MPI processes in a run increases. However, the solver requires that matrix A and some other internal arrays completely fit into the memory of each MPI process."
Any thoughts appreciated, thanks!
You are right: in the presentation you mentioned, the initial matrix is stored on the master process. However, in recent releases of MKL we implemented distributed reordering (iparm = 10), which doesn't gather the matrix on one process.
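
For anyone wanting to try this, here is a minimal sketch of how the control array might be filled before calling the solver. It assumes that "iparm = 10" refers to the reordering entry, iparm[1] in 0-based C indexing (iparm(2) in Fortran), as in the cluster sparse solver's iparm table. The helper name `configure_distributed_reordering` and the use of plain `int` instead of `MKL_INT` are illustrative only, and the sketch does not call MKL itself.

```c
/* Sketch only: fills the control array, does not call the solver.
 * Assumptions (not stated in the thread): 0-based C indexing, a 64-entry
 * iparm array, and that "iparm = 10" means iparm[1] = 10. */
#define IPARM_LEN 64

/* Hypothetical helper, not part of MKL. */
void configure_distributed_reordering(int iparm[IPARM_LEN])
{
    for (int i = 0; i < IPARM_LEN; ++i) {
        iparm[i] = 0;   /* start from a zeroed array */
    }
    iparm[0] = 1;       /* use the values set here rather than solver defaults */
    iparm[1] = 10;      /* assumed: MPI (distributed) reordering and symbolic factorization */
}
```

In a real program the array would be declared as `MKL_INT iparm[64]` and passed to `cluster_sparse_solver` together with the matrix and the MPI communicator. Whether the matrix itself can also be supplied in distributed form is a separate setting, so it is worth checking the iparm table for your MKL version.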