How to allocate a matrix which size is on the order of 10^7 x 10^7?

efnacy · ‎02-02-2020

An example of a program involving the use and manipulation of such a huge matrix is GCEED, for instance look at page 3 just before section 2.2 of this conference paper https://journals.jps.jp/doi/pdf/10.7566/JPSCP.5.011010. There it is mentioned that the size of matrices can be up to 10^7 x 10^7. When I tried to invoke a double precision matrix around that size using allocate statement, that part of the code becomes an extremely tight, near closed, bottleneck during execution, even sometimes it prompts the execution to terminate due to insufficient virtual memory. Since my program will involve repeated eigenvalue calculations, any suggestion for how to do it efficiently (using MKL routines or any other method) given I only need around the 20 smallest eigenvalues?

Thomas_F_1 · ‎02-02-2020

A 10^7 x 10^7 double precision matrix would be about 727 TB of data. Current hardware can't support that. In the article, they also state this is a sparse matrix, so likely they are using alternative methods. I recommend contacting the authors.

David_Billinghurst · ‎02-02-2020

You need software specifically for large, sparse eigenvalue calculations. Back in the day I used ARPACK https://www.caam.rice.edu/software/ARPACK/

efnacy · ‎02-02-2020

Do ARPACK routines take input as matrices? Because it seems like I need to find a way around invoking a 10^7 x 10^7 size matrix, this matrix is sparse.

David_Billinghurst · ‎02-02-2020

I forget the details. You will need to RTFM. As best I recall it uses reverse communication, so you may be able to use your existing data structures to calculate a matrix-vector product.

You could also look at the Intel MKL routines https://software.intel.com/en-us/articles/intel-mkl-support-for-largestsmallest-eigenvalue-and-sparse-svd-problem

JohnNichols · ‎02-03-2020

I suggest you are a little outside the computational capacity of the average intel computer -- GCEED uses K Computer - of course there are some fast Intel Supercomputers but I have only seen such things.

They will use a packing scheme, any sparse matrix with less than a 1/3 density needs to be triple - offset i, offset j value. medej4 gave an excellent example in a forum post a few years ago for my program Magni.

Good luck

John

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

The K computer – named for the Japanese word/numeral "kei" (京), meaning 10 quadrillion (1016)[4][Note 1] – was a supercomputer manufactured by Fujitsu, installed at the Riken Advanced Institute for Computational Science campus in Kobe, Hyōgo Prefecture, Japan.[4][5][6] The K computer was based on a distributed memory architecture with over 80,000 compute nodes.[7] It was used for a variety of applications, including climate research, disaster prevention and medical research.[6] The K computer's operating system was based on the Linux kernel, with additional drivers designed to make use of the computer's hardware.[8]

In June 2011, TOP500 ranked K the world's fastest supercomputer, with a computation speed of over 8 petaflops, and in November 2011, K became the first computer to top 10 petaflops.[9][10] It had originally been slated for completion in June 2012.[10] In June 2012, K was superseded as the world's fastest supercomputer by the American IBM Sequoia.[11]

As of June 2019, K is the world's 20th-fastest computer, with the IBM's Summit & Sierra being the fastest supercomputers.[12][3]

As of November 2018, the K computer holds the third place for the HPCG benchmark. It held the first place until June 2018, when it was superseded by Summit & Sierra.[13][14]

The K supercomputer was decommissioned on the 30th of August 2019.[15][16] In Japan, the K computer will be succeeded by the Fugaku, expected to be 100 times faster, scheduled to be operational in 2021.

efnacy · ‎02-03-2020

Hi @Nichols,

what do you mean by 1/3 density, the number of non-zero matrix elements being less than 1/3 of all elements? And by triple, would triple banded matrix like some finite-difference matrices be an example?

JohnNichols · ‎02-03-2020

A dense matrix is one where all of the elements are non-zero so a 3 by 3 matrix could have the form [[1,2,3], [3,4,5],[5,6,7]], which can then be placed in memory in connected units. here we have 9 reals starting at an offset - all you need is a counter and offset.

In Lisp the concept was developed for the linked list so the result could include a wide variety of arrays for tree and node analysis. As structural analysis programs have grown the number of zeros grows very quickly. So the array above could be listed as 1,1,1.0 :1,2,2.0: 1:3:3.0: 2:1:3.0:, 2:2:4.0: 2,3,5.0:, 3:1;5.0: 3:2:6.0: 3,3:7.0 so we now have 18 integers and 9 reals - not efficient at all

but if we are using a structural matrix the results might be 1,1,1.0 :, 2:2:4.0: : 3,3:7.0, so we have 6 integers and 3 reals. As soon as the real density drops below 0.33 then clearly the second method is better for memory usage. if the integers occupy less space than reals then the density number may change, but most structural matrices are less than 0.33 by a long way. The problem would be that the search algorithm for 1 is going to be different to 2.

Refer to the Pardiso manual.

This is actually quite important as inversion routines to be fast.

gib · ‎02-04-2020

Look at this: https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_(CSR,_CRS_or_Yale_format)

Numerical methods for big sparse matrices almost without exception use the CSR or CSC method (or something similar) to store the matrix.

JohnNichols · ‎02-04-2020

Pardiso uses CSR I think

David_Billinghurst · ‎02-04-2020

gib wrote:
Numerical methods for big sparse matrices almost without exception use the CSR or CSC method ...

A significant minority use a reverse communication interface. For example https://software.intel.com/en-us/mkl-developer-reference-c-iterative-sparse-solvers-based-on-reverse-communication-interface-rci-iss and http://www.netlib.org/lapack/lawnspdf/lawn99.pdf

This can be particularly effective for a problem posed on a mesh,where the non-zero matrix coefficients relate to the interaction with neighbouring nodes and may be calculated and stored in advance.