Intel Community › Software Development SDKs and Libraries › Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
Chaowen_G_ (Beginner), 06-27-2015 02:19 PM

How to effectively reduce the memory consumption of each compute node with Cluster PARDISO

Hi:

I need to use Cluster PARDISO to solve a large double-precision complex symmetric system. I set iparm(40) = 0, which selects the usual centralized input format: the master MPI process (rank 0) stores all of the data for matrix A.
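(For context, the alternative to centralized input is the distributed format, iparm(40) = 1, where each rank supplies only a contiguous block of rows and iparm(41)/iparm(42) hold the first and last row, 1-based, of the local domain. A minimal sketch of an even row split; the helper name is my own, not Intel's API:)

```cpp
#include <cassert>

// Sketch (not part of MKL): compute the 1-based row range a rank would own
// under distributed input (iparm(40) = 1), where iparm(41) and iparm(42)
// take the first and last row of the local domain. Remainder rows go to
// the lowest-numbered ranks.
void local_row_range(int n, int nranks, int rank, int *begin, int *end) {
    int base = n / nranks;
    int extra = n % nranks;
    *begin = rank * base + (rank < extra ? rank : extra) + 1;  // 1-based
    *end = *begin + base + (rank < extra ? 1 : 0) - 1;
}
```

Each rank would then pass only its own rows of A (and of the right-hand side) to the solver, so no single node has to hold the whole matrix.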

First test:

I use two compute nodes (each with 48 GB of RAM). In phase 33 (solve and iterative refinement), the master node (rank = 0) uses 95% of its 48 GB, and the slave node (rank != 0) uses 73% of its 48 GB.

Second test:

I use four compute nodes on the same matrix. In phase 33, the master node uses 93% of its 48 GB, and each of the remaining three nodes uses 70% of its 48 GB.

So although I have doubled the total RAM, the consumption on each individual node hardly drops. My real matrix is much larger than this test matrix, so the solver always reports that it is out of physical memory. How can I reduce the memory consumption on each compute node?

Details of the test matrix (from the PARDISO statistics output):

< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b >

number of equations: 40782

number of non-zeros in A: 7783421

number of non-zeros in A (%): 0.467987

number of right-hand sides: 18653

< Factors L and U >

number of columns for each panel: 64

number of independent subgraphs: 0

number of supernodes: 1152

size of largest supernode: 19339

number of non-zeros in L: 456143822

number of non-zeros in U: 1

number of non-zeros in L+U: 456143823

gflop for the numerical factorization: 25510.260441

I use MKL 11.2.3, MVAPICH2 2.0b, GCC 5.1, an Intel(R) Xeon(R) CPU X5650 @ 2.67 GHz, a Mellanox Technologies MT25204 InfiniBand adapter, and linux86_64.

1 Reply

Chaowen_G_ (Beginner), 06-27-2015 02:35 PM

Moreover, to initialize MPI I use:

```cpp
int provided;
// Request funneled threading: only the main thread makes MPI calls.
// MPI-2 permits null argc/argv pointers here.
MPI_Init_thread(nullptr, nullptr, MPI_THREAD_FUNNELED, &provided);
```
