Modified Cholesky factorisation

Bartłomiej_K_ · ‎02-26-2015

Hello,

I'm using MKL to calculate Cholesky factorisation of a covariance matrix. MKL (?POTRF function) is of course much faster than my own naiive implementation (input: 6500x6500 matrix), however there is a problem. Our client requires a *modified* version of the algorithm (below - custom minimum conditions) and therefore MKL gives different results. After I remove the custom minimum conditions (<0.001) from my implementation, both algorithms give *perfectly* equal results.

Is it possible to force MKL to respect these custom conditions somehow? Thanks for any help.

mecej4 · ‎02-26-2015

These "custom conditions" are arbitrary and capricious, and you can satisfy them by simply rescaling the matrix with a constant multiplier if the matrix is positive definite. To inflict an arbitrary change such as this upon the unsuspecting users of a widely used library such as MKL would not be acceptable.

Before forcing MKL to "respect" these conditions, one has to establish that the conditions deserve respect. Please give a logical justification for the criterion, if possible, along with references.

Cholesky factorization applies only to positive definite matrices, whereas covariance matrices $\Sigma \,$ are only positive semi-definite. If you can justify using $\Sigma \,$ + λ I in your application, simply add λ to the diagonal elements of $\Sigma \,$ before calling ?potrf. Such a correction is, in fact, used with justification in optimization algorithms.

Bartłomiej_K_ · ‎02-27-2015

Thank you for responding.

These conditions are required by the Client, to "assure the numerical stability" of the results. We cannot change them. Perhaps it will become more clear if I present the entire problem.

There are 2 input matrices 80rows x 90columns. First one, "R", represents the strength of rain in each cells, with values <0;50>. The second one, "QI", the quality of measurement from range <0;1>.

The first step of the algorithm calculates the errors of the rain field D, using the equation: D[i,j] = R[i,j] * (1-Q[i,j])

Second step: we calculate matrix "A" (80*90 x 80*90), representing the covariance of the rain field errors. The equations are:

A[i,j] = var(di) when i = j, cov(di,dj) when i <> j
di is the cell of matrix D, however calculated as row-wise index from the beginning (like in memory index).

This matrix is the input to the Cholesky part. As far as I understand it, it is always positive definite.

mecej4 · ‎02-27-2015

This quote is from http://en.wikipedia.org/wiki/Positive-definite_matrix under "Connections".

In statistics, the covariance matrix of a multivariate probability distribution is always positive semi-definite; and it is positive definite unless one variable is an exact linear combination of the others. Conversely, every positive semi-definite matrix is the covariance matrix of some multivariate distribution.

You can easily check whether your specific matrix is positive definite or not. Hand it to the Cholesky factorization routine. If the routine fails, the matrix is either semi-definite or indefinite. In the former case, adding λI with different trial values of the multiplier λ can be tried, or your client could investigate whether the model variables were properly selected. The latter case (indefinite matrix) should not occur if the covariance matrix was properly calculated (years ago, some versions of Excel sometimes yielded negative values for a computed variance).