I need to use Cluster PARDISO to solve a large double-precision complex symmetric matrix. I set iparm(40)=0, which means the matrix is provided in the usual centralized input format: the master MPI process (rank 0) stores all of the data of matrix A.
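For clarity, here is a small sketch of that setting. This is plain Python purely to illustrate the parameter mapping (a real program would set the value in the C or Fortran iparm array passed to cluster_sparse_solver); note that Fortran's 1-based iparm(40) is iparm[39] in 0-based C indexing:

```python
# Illustrative only -- not an actual call into MKL's cluster_sparse_solver.
iparm = [0] * 64          # the 64-entry iparm control array

iparm[39] = 0             # Fortran iparm(40) = 0: centralized input format;
                          # rank 0 holds all of A, every right-hand side b,
                          # and the full solution x

print("iparm(40) =", iparm[39])
```

With this setting only the factor is distributed across MPI processes, while the input matrix, right-hand sides, and solution all stay on rank 0.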
I use two compute nodes (each with 48 GB of RAM). In phase 33 (the solve and iterative refinement step), the master node (rank 0) uses 95% of its 48 GB and the slave node (rank != 0) uses 73% of its 48 GB.
I then use four compute nodes on the same matrix. In phase 33, the master node uses 93% of its 48 GB, and each of the other three nodes uses 70% of its 48 GB.
So although I have doubled the total RAM, the consumption on each individual node barely decreases. Because my real matrix is much larger than this test matrix, the solver always reports "out of physical memory". How can I reduce the memory consumption on each compute node?
Detail of the test matrix:
< Numerical Factorization with BLAS3 and O(n) synchronization >
< Linear system Ax = b >
number of equations: 40782
number of non-zeros in A: 7783421
number of non-zeros in A (%): 0.467987
number of right-hand sides: 18653
< Factors L and U >
number of columns for each panel: 64
number of independent subgraphs: 0
number of supernodes: 1152
size of largest supernode: 19339
number of non-zeros in L: 456143822
number of non-zeros in U: 1
number of non-zeros in L+U: 456143823
gflop for the numerical factorization: 25510.260441
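A rough back-of-the-envelope calculation from the statistics above (assuming 16 bytes per double-precision complex entry; MKL's internal layout and workspace overhead may differ) suggests where the memory goes:

```python
# Memory estimate from the solver statistics above.
# Assumption: 16 bytes per double-precision complex value.
BYTES = 16
GB = 2**30

n, nrhs = 40782, 18653            # equations, right-hand sides
nnz_A = 7783421                   # non-zeros in A
nnz_LU = 456143823                # non-zeros in L+U

a_gb   = nnz_A * BYTES / GB       # matrix A, held on rank 0
lu_gb  = nnz_LU * BYTES / GB      # factor, distributed across ranks
rhs_gb = 2 * n * nrhs * BYTES / GB  # dense b and x, both on rank 0

print(f"A      : {a_gb:6.2f} GB")
print(f"L+U    : {lu_gb:6.2f} GB")
print(f"b and x: {rhs_gb:6.2f} GB")
```

If this estimate holds, the 18653 dense right-hand sides and the solution dominate the footprint on rank 0 with centralized input, which would explain why adding nodes does not reduce the master's memory use.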
I use MKL 11.2.3, MVAPICH2 2.0b, GCC 5.1, Intel(R) Xeon(R) CPU X5650 @ 2.67GHz, InfiniBand (Mellanox Technologies MT25204), on linux86_64.