NNZ coefficients in dss/pardiso LU factor

may_ka · 05-26-2018

Hi,

I was wondering whether there is any way to obtain the number of non-zero coefficients of the matrix factor generated by the MKL function mkl_dss_real? I checked the MKL manual but there seems to be no way to get this number.

Background:

I want to multiply a vector v by a matrix K^-1, b=K^-1v, where K is sparse and K^-1 cannot be constructed. One way to obtain b=K^-1v is to solve Kb=v iteratively, for which I use the mkl_dss solver. In my particular setting, K has dimension 2.5 million x 2.5 million, is symmetric positive definite, and has 14 million NNZ coefficients. I understood that the dss_solver uses an LU factorization followed by forward-backward substitution for solving. I also understood that the time complexity of forward/backward substitution is 2·O(n²). Given the number of NNZ coefficients in K, I could make a rough approximation of the number of floating point operations in routine "mkl_dss_solve" and the associated processing time. However, "mkl_dss_solve" needed much more (~100x) processing time. Currently, the only explanation I have for this observation is that the number of NNZ coefficients in L/U must be much larger than in K.
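To illustrate why the factor can hold far more nonzeros than K itself, here is a minimal sketch in pure Python that counts fill-in symbolically. It is not what MKL does internally: the 5-point grid Laplacian is a stand-in test matrix, the elimination runs in natural order (MKL applies a fill-reducing reordering first, so real fill is usually lower), and the function names are made up for this example.

```python
def grid_laplacian_pattern(m):
    """Off-diagonal sparsity pattern (adjacency sets) of the 5-point
    Laplacian on an m x m grid. Stand-in test matrix, not the poster's K."""
    adj = {i: set() for i in range(m * m)}
    for r in range(m):
        for c in range(m):
            i = r * m + c
            if c + 1 < m:               # right neighbour
                adj[i].add(i + 1)
                adj[i + 1].add(i)
            if r + 1 < m:               # neighbour in the next grid row
                adj[i].add(i + m)
                adj[i + m].add(i)
    return adj


def lower_triangle_nnz(adj):
    """Nonzeros in the lower triangle of the matrix, diagonal included."""
    return len(adj) + sum(len(nbrs) for nbrs in adj.values()) // 2


def cholesky_factor_nnz(adj):
    """Nonzeros in the Cholesky factor L (diagonal included), counted by
    symbolic elimination in natural order (assumption: no reordering)."""
    adj = {i: set(nbrs) for i, nbrs in adj.items()}   # work on a copy
    nnz = 0
    for k in sorted(adj):
        higher = {j for j in adj[k] if j > k}
        nnz += 1 + len(higher)          # diagonal + sub-diagonal entries of column k
        for j in higher:                # eliminating k links its remaining neighbours
            adj[j] |= higher - {j}
    return nnz


if __name__ == "__main__":
    for m in (3, 10, 30):
        pattern = grid_laplacian_pattern(m)
        print(m * m, lower_triangle_nnz(pattern), cholesky_factor_nnz(pattern))
```

On this test matrix the ratio of nonzeros in L to nonzeros in the lower triangle of K grows with the grid size, which is the kind of effect that could explain a solve phase far slower than an estimate based on K alone.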

Any suggestions are welcomed.

Thanks

mecej4 · 05-27-2018

Supposing that the number of non-zero entries in the matrix factors were available, what would you do with that information? As far as I can see, nothing!

I do not understand what you mean by "not constructable" and why iterations are needed. Does K change from iteration to iteration? Why do you use the DSS wrapper instead of the direct Pardiso interface?

If you share details regarding the nature of the matrix, and the sequence of DSS calls that you presently use, we could perhaps suggest ways of improving the computational throughput.

may_ka · 05-27-2018

Hi,

Thanks for the response.

I would use the information to understand the processing time! DSS makes two passes through a triangular matrix of (unknown) density. An alternative to DSS would be PCG (preconditioned conjugate gradient), and knowing the number of NNZ coefficients in the factor, I could calculate how many PCG iterations I could afford before the flops break even.
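The break-even calculation described here could be sketched as follows. The cost model is an assumption, not a measurement: about 2 flops per stored factor entry per triangular sweep, one sparse matrix-vector product per PCG iteration (2 flops per nonzero of the full matrix), roughly 10n flops for the dot products and vector updates, and a diagonal preconditioner by default. All function names are hypothetical, and flop counts ignore memory bandwidth, which often dominates in practice.

```python
def direct_solve_flops(nnz_factor):
    """Forward plus backward sweep with a Cholesky-type factor:
    about 2 flops per stored entry per sweep (assumed model)."""
    return 4 * nnz_factor


def pcg_iteration_flops(nnz_full, n, precond_flops=None):
    """One PCG iteration under the assumed model: one matvec with the full
    matrix, 2 dot products and 3 vector updates (about 10n flops), plus one
    preconditioner application (default: diagonal scaling, n flops)."""
    if precond_flops is None:
        precond_flops = n
    return 2 * nnz_full + 10 * n + precond_flops


def break_even_iterations(nnz_factor, nnz_full, n, precond_flops=None):
    """Number of PCG iterations that cost as many flops as one direct solve."""
    return direct_solve_flops(nnz_factor) / pcg_iteration_flops(
        nnz_full, n, precond_flops
    )


if __name__ == "__main__":
    # n and nnz(K) are from the post; nnz_factor is a made-up placeholder
    # (100 x nnz(K)) because the real value is exactly what is unknown here.
    print(break_even_iterations(nnz_factor=1_400_000_000,
                                nnz_full=14_000_000,
                                n=2_500_000))
```

If the solver is called for many right-hand sides, the factorization cost is amortized, so comparing only the solve phase against PCG iterations, as done here, is the relevant comparison.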

"Non-constructable" means that it cannot be built because of its dense nature. If I get the numbers right, a 2.5 million x 2.5 million matrix will need 50 terabytes if constructed. However, because K is a sort of autoregressive matrix, K^-1 can be built because it is sparse.
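The 50 terabyte figure can be checked with a few lines of arithmetic. This sketch assumes 8-byte double-precision values, and for comparison estimates the footprint of a sparse matrix with 14 million nonzeros in CSR format with 4-byte indices. The function names and the CSR layout are assumptions for illustration only.

```python
def dense_bytes(n, bytes_per_entry=8):
    """Storage for a dense n x n matrix of doubles."""
    return n * n * bytes_per_entry


def csr_bytes(n, nnz, value_bytes=8, index_bytes=4):
    """Storage for a CSR matrix: values and column indices per nonzero,
    plus n + 1 row pointers."""
    return nnz * (value_bytes + index_bytes) + (n + 1) * index_bytes


if __name__ == "__main__":
    n = 2_500_000
    print(dense_bytes(n) / 1e12, "TB dense")
    print(csr_bytes(n, 14_000_000) / 1e6, "MB sparse (CSR)")
```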

K is constant.

I have not worked with pardiso yet, but understood that the difference between pardiso and dss is merely the interface. Both will eventually use the same routines.

I use the standard sequence of calls: mkl_dss_create, mkl_dss_reorder, mkl_dss_factorize and mkl_dss_solve. I am happy with the processing time of the first three; the factorization takes about 90 seconds of wall-clock time.

Cheers

Alexander_K_Intel2 · 05-31-2018

may.ka wrote:

I have not worked with pardiso yet, but understood that the difference between pardiso and dss is merely the interface. Both will eventually use the same routines.

That is not fully correct. The MKL Pardiso interface allows you to use 2-level factorization, the VBSR format, a merged forward step and factorization, and other tricks that can improve overall performance. MKL Pardiso also provides the number of nonzero elements in the LU decomposition (reported in iparm(18) on output). That is why the previous suggestion, to switch from dss to pardiso, sounds reasonable.

Thanks,

Alex