I want to diagonalize a large matrix, about 40000 x 40000 in size.
Our supercomputer has 80 nodes, each with two eight-core CPUs.
I think it would be very hard to diagonalize such a large matrix using only the multithreaded LAPACK routines in MKL, so I plan to use ScaLAPACK instead.
As I understand it, ScaLAPACK in MKL can use both multithreading and multiple processes to speed up the diagonalization. Is that correct?
Could you please give me some advice on how many nodes, and how many cores per node, I should use?
What are appropriate block sizes Mb and Nb for this problem?
Just a few questions, please. Why do you need this functionality? Usually it is needed for an eigensolver problem. What type of matrix do you have (symmetric or nonsymmetric)? And which routines would you like to use? This information will help us give better answers.
The best way is to run a few experiments in different modes: a pure MPI version, a few OpenMP threads per process, different NB values, and so on. As far as I know, multithreading is not very efficient for this type of routine in ScaLAPACK. An NB in the range 32-128 is usually a good choice. Another suggestion is to compare the cluster results with a single-node LAPACK run; at the moment, LAPACK is more heavily optimized than ScaLAPACK.
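To help plan such experiments, here is a rough sketch in Python that estimates how much of the matrix each process would hold for a given process grid and block size. It is only an illustration under several assumptions that are not from this thread: the matrix is double-precision real (8 bytes per element), one MPI process runs per core (the pure MPI mode, 16 processes per node), the process grid is chosen as close to square as possible, and square blocks (Mb = Nb) are used. The function numroc below is a plain Python reimplementation of the arithmetic done by ScaLAPACK's NUMROC utility; near_square_grid and local_bytes are hypothetical helper names made up for this example.

```python
import math


def near_square_grid(nprocs):
    """Return (nprow, npcol) with nprow * npcol == nprocs, as square as possible."""
    nprow = math.isqrt(nprocs)
    while nprocs % nprow != 0:
        nprow -= 1
    return nprow, nprocs // nprow


def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of a block-cyclically distributed dimension of size n
    owned by process iproc, following the logic of ScaLAPACK's NUMROC."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of full blocks
    count = (nblocks // nprocs) * nb       # full rounds every process gets
    extrablks = nblocks % nprocs           # leftover full blocks
    if mydist < extrablks:
        count += nb                        # this process gets one extra full block
    elif mydist == extrablks:
        count += n % nb                    # this process gets the partial last block
    return count


def local_bytes(n, nb, nprow, npcol, itemsize=8):
    """Bytes of the n x n matrix stored on process (0, 0), which holds the
    largest share when the distribution starts at process 0.
    itemsize=8 assumes double-precision real data."""
    rows = numroc(n, nb, 0, 0, nprow)
    cols = numroc(n, nb, 0, 0, npcol)
    return rows * cols * itemsize


if __name__ == "__main__":
    n = 40000
    cores_per_node = 16                    # two eight-core CPUs per node
    print(f"full matrix: {n * n * 8 / 1e9:.1f} GB")
    for nodes in (1, 4, 16):
        nprocs = nodes * cores_per_node    # assumption: one MPI process per core
        nprow, npcol = near_square_grid(nprocs)
        for nb in (32, 64, 128):
            mb = local_bytes(n, nb, nprow, npcol) / 1e6
            print(f"{nodes:2d} nodes, grid {nprow}x{npcol}, NB={nb:3d}: "
                  f"{mb:.0f} MB of matrix per process")
```

This only counts the distributed matrix itself. An eigensolver also needs storage for the eigenvectors and workspace, so the real memory use per process will be noticeably higher, and the timing for each combination still has to be measured on the actual cluster.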