Problem applying dsyevr to a large matrix

mspecian · ‎07-13-2010

I am attempting to diagonalize a 66113 X 66113 real symmetric matrix using LAPACK's dsyevr routine. Each element in the matrix is an 8-byte float. I am running MKL version 10.2 update 4 on a machine with16 Intel Xeon CPUs each running at 2.93GHz with 4 CPU cores. The machine has about 128GB of memory.

While I am able to successfully diagonalize small matrices (the largest I've test is a 10 by 10), diagonalizing my large matrix has proven challenging. My code compiles and even seems to run fine, but after dsyevr has finished executing the parameter "info" is greater than 1, indicating that there was some kind of "internal error".

Does anyone have any idea what might cause this internal error? There is no explicit segmentation fault. The code fully completes except the arrays with eigenvalues and eigenvectors come back empty.

In case somebody thinks this may be a code related issue, I've included it below. The first is the C code that allocates memory, reads in files, etc. The next is the FORTRAN subroutine called by this C code. Finally, I include the linker that puts it all together.

EigenFind.c

////////////////////////////////////////////////////////////////////////////////////////////

// This routine calculates the eigenvalues and eigenvectors of the symmetric matrix //

// of doubles fwritten to the file "fnameANM" using the dsyevr LAPACK procedure. Since //

// memory constraints restrict the creation of a huge, complete eigenvector file, only //

// the il-th through iu-th eigenvalues and eigenvectors are returned where 1<=il<= //

// iu<=nrows. The eigenvalues are stored from most positive to most negative. This //

// routine is somewhat faster than the packed version. //

////////////////////////////////////////////////////////////////////////////////////////////

void GetEigenvectors(char *fnameANM, long nrows, long il, long iu) {

double *w, *z, *work, rnrows, working, *EArray;

size_t size, ii, ee;

long info, lz, lzp, lisuppz, lwork, m, *iwork, liwork, iil, iiu, cc, rr;

info = NULL;

printf("\\nThis routine returns eigenmode #%d (larger eigenvalue) through %d (smaller eigenvalue).\\n\\n",il,iu);

//dsyevr puts the largest Evl's Evc last, so this fixes that

long swap = iu;

iu = nrows - il + 1;

il = nrows - swap + 1;

lz = iu-il+1; //Number of eigenvectors to be returned.

lzp = lz; //Same variable, but this doesn't get overwritten.

lisuppz = 4*lz; //Size of the iwork array

lwork = 50*nrows; //Size of work array

liwork= 40*nrows; //Size of lwork array

//////////////////// In case of seg fault, try upping the size of these arrays.

rnrows = nrows;

working = rnrows*rnrows;

size = floor(working+0.0001);

double *a = freadDMatrix(fnameANM,size);

printf("The final element of the average noise matrix is %f.\\n",a[size-1]);

w = (double *)malloc(nrows*sizeof(double));

printf("Regarding memory allocation for w: "); ErrnoDescription(errno);

z = (double *)malloc(nrows*lz*sizeof(double));

printf("Regarding memory allocation for z: "); ErrnoDescription(errno);

work = (double *)malloc(lwork*sizeof(double));

printf("Regarding memory allocation for work: "); ErrnoDescription(errno);

iwork = (long *)malloc(liwork*sizeof(long));

printf("Regarding memory allocation for iwork: "); ErrnoDescription(errno);

lwork = -1; liwork = -1; //Start with -1 to find optimal values.

printf("\\n\\nUsing dsyevr to determine optimal sizes of lwork and iwork.\\n");

dsyevrf_(&nrows, a, &il, &iu, &m, w, z, &lz, &lisuppz, work, &lwork, iwork, &liwork, &info);

if(info==0) {

printf("Correct values of iwork and liwork found successfully after %d seconds.\\n\\n", tEnd-tBegin);

lwork = floor(work[0]+0.000001); printf("minimum size of lwork = %d\\n",lwork);

liwork = iwork[0]; printf("minimum size of liwork = %d\\n",liwork);

free(work); free(iwork);

work = (double *)malloc(lwork*sizeof(double));

printf("Regarding memory allocation for work: "); ErrnoDescription(errno);

iwork = (long *)malloc(liwork*sizeof(long));

printf("Regarding memory allocation for iwork: "); ErrnoDescription(errno);

}

if(info<0) printf("Parameter %d had an illegal value.\\n",(-1*info));

////////////// THIS IS THE REAL EXECUTION OF THE EIGENMODE FINDER //////////////

printf("\\nExecuting dsyevrf.\\n");

dsyevrf_(&nrows, a, &il, &iu, &m, w, z, &lz, &lisuppz, work, &lwork, iwork, &liwork, &info);

tEnd = time(tloc);

printf("Exited dsyevrf.\\n\\n");

if(info<0) printf("Parameter %d had an illegal value.\\n",(-1*info));

if(info>1) printf("AN INTERNAL ERROR HAS OCCURRED\\nAN INTERNAL ERROR HAS OCCURRED\\n\\n");

}

void dspevxf_(long long *nrows, double *ap, long long *trisize, long long *il, long long *iu, long long *m, double *w, double *z, long *lz, double *work, long long *lwork, long long *iwork, long long *liwork, long

int main ()

{

long long ncells6 = 66113;

///////////////////////////////////////////////////////////////////////////////////////

///////////////////////////////// GetEigenvectors /////////////////////////////////////

///////////////////////////////////////////////////////////////////////////////////////

char *fnameGCorr_sy = "/home/mj/PSnoise/noiseMat_Vec/GCorMat_R6sy_fwrite";

long il_ge = 1;

long iu_ge = 10000;

/*GetEigenvectors[1. Name of symmetric, fwritten ncells by ncells covariance function

to be diagonalized

2. Dimension of above matrix, also # of cells

3. Value >= 1 equaling # of first eigenvector returned (1 is largest)

4. Value >= il and <= ncells equaling # of final eigenvector returned] */

GetEigenvectors(fnameGCorr_sy, ncells6, il_ge, iu_ge);

LAPACK_dspevx_f.f

c********************************************************************

subroutine dspevxf(n,ap,lap,il,iu,m,w,z,lz,work,lwork,iwork,

+liwork,ifail,info)

c--------------------------------------------------------------------

c Calculates selected eigenvalues and eigenvectors of a

c symmetric matrix T triangularly packed into ap.

c

c Based upon the dspevx routine.

c

c This routine is called from the C function GetEigenvectorsPacked

c

c 'N' should be used if only eigenvalues are desired.

c 'V' should be used if both eigenvalues and eigenvalues are

c desired.

c For the second parameter, 'A' means all eigenvalues will be

c found. 'V' means all eigenvalues in the half-open interval

c (vl,vu] will be found. 'I' mean the eigenvalues with indices

c il through iu will be found.

c For the third parameter, 'U' means the symmetric matrix is

c stored in the upper triangle of a and 'L' means its stored in

c the lower triangle.

c "n" is the number of rows (columns) in the symmetric matrix

c "ap" is the nXn array packed into a size "lap" = n(n+1)/2 housing

c the symmetric matrix in its upper or lower triangle.

c "vl" and "vu" are the lower and upper bounds respectively of the

c interval to be searched for eigenvalues. "vl" <= "vu". These

c values are referenced only if the second parameter = 'V'

c "il" and "iu" are the indices of the smallest and largest eigenvalues

c to be returned such that 1<=il<=iu<=in. These values are referenced

c only if the second parameter = 'I'

c "abstol" is the absolute error tolerance for the eigenvalues.

c "z" is an output parameter. If 'N', this is not referenced. If

c 'V', then this will contain the "lz" eigenvectors specified by

c the range "il" through "iu".

c "m" is the total number of eigenvalues found.

c "w" is the array to which the first m eigenvalues are stored.

c "isuppz" contains, upon output, the indices indicating the non-zero

c elements in "z". It has length "lisuppz".

c "work" is a workspace array of length "lwork" = 8n

c "iwork" is a workspace array of length "liwork" = 5n

c "ifail" is an integer array of length "n" containing the indices of

c the eigenvectors that failed to converge or, if "info"=0, the first

c m elements of ifal are zero.

c "info" is an integer. If all is ok, a 0 is output. If i, the

c algorithm did not converge and i indicates the number of

c elements of an intermediate tridiagonal form which did not

c converge to zero. If -i, the ith parameter had an illegal

c value.

c

c Using LAPACK.

c

implicit none

integer*8 n,lap,il,iu,m,lz,lwork,liwork,iwork(liwork),ifail(n),

+info

real*8 ap(lap),vl,vu,abstol,dlamch,w(n),z(n,lz),work(lwork)

external dlamch

c

abstol = 2*dlamch('S')

print *, 'ap(final) = ', ap(lap)

print *, 'About to compute eigenvectors',il,'through',iu,'.'

print *, 'n = ',n

call dspevx('V','I','L',n,ap,vl,vu,il,iu,abstol,m,w,z,n,

+work,iwork,ifail,info)

c

return

end

c

Makefile

.f.o:

$(F77) -c $<

.c.o:

$(CC) -c $<

default: all

CC = gcc

F77 = ifort

CL = ifort -nofor_main

CL8= ifort -nofor_main -i8

INCF = -I/usr/site/intel_fortran/include #FORTRAN INCLUDE FILES

IMKLF = -I/usr/site/intel_fortran/mkl/include #FORTRAN MKL FILES

INCC = -I/usr/site/intel_cpp/include #C++ INCLUDE FILES

IMKLC = -I/usr/site/intel_cpp/mkl/include #C++ MKL FILES

LMKL = -Vaxlib -L/usr/site/intel_fortran/mkl/lib/em64t -lfftw3 -lmkl_core -lmkl_sequential -lmkl_intel_ilp64

eigen: LAPACK_dspevx_f.oEigenFind.o

$(CL8) $^ $(LMKL) -o $@.o

Artem_V_Intel · ‎07-13-2010

Hello,

Could you please submit your test case as an attachment to your post? Looks like code your posted has been broken by the text editor.

Could you also specify what "info" value did you obtain on exit of dsyevr() function?

Best regards,
Artem

mspecian · ‎07-14-2010

I tried two test cases. The first was the 3X3 matrix {2, -1, 0; -1, 2, -1; 0, -1, 2}. The other was a 10X10 matrix where the upper (or lower) triangle was the sequence of integers from 0 to 54. In other words the first row, first column was 0. The first row, second column (and second row, first column) was 1, and so on. Both cases returned the correct eigenvalues and eigenvectors.

I do not have the info parameter readily available. Until this error occurred I had not written a line to return it explicitly. I have another version of the diagonalization code running for packed storage with dspevx. If this also fails, I will have the value of info returned. However, these routines take a while to run...over two days in fact, so it's not a quick matter to simply recover the parameter.

I'm not aware of whether it is possible to parallelize this routine to run over multiple processors. Would you happen to know? Thank you very much for your generous attention.

Gennady_F_Intel · ‎07-14-2010

1)How we can investiagte the original problem (withdsyevr) you reported?

>>> for packed storage with dspevx...

2)Do you meandspevx executed > then 2 days!. What was the order of the input matrixes?

3)The list of threaded functions, you can find into User's Guide, chapter 6,Using the Intel MKL Parallelism.

--Gennady

mspecian · ‎07-14-2010

1) Exactly. I'm running code that will test the same procedure with dspevx right now. If it finishes before there's resolution on this thread, I will post the results.

2) dsyevr took about 2 days 6 hours to execute on my machine. The matrix is 66113 squared.

3) Thanks for the reference. I had seen this before but it's good to get outside verification.

Michael_C_Intel4 · ‎07-15-2010

Hi,

dsyevr calls dsytrd, which is threaded.

Did you try other eigensolvers such as dsyev, dsyevd, dsyevx?

What is the typical execution time for less size problem, say 10000, 20000, 40000 ?

Michael.

Michael_C_Intel4 · ‎07-15-2010

One more question, why do you link against mkl_sequential? You will definitely run in a single thread then. You probably need to link against threaded binary mkl_intel_thread instead.

Michael.

yuriisig · ‎07-15-2010

I do not recommend using dsyevr: unreliable, RRR algorithm is not parallelized and eigenvectors are of low accuracy. I hold the diagonalization, which allows memory 128GB diagonalizirovt matrix up to 175000 * 175000 and find just the right eigenvectors and eigenvalues.

mspecian · ‎07-15-2010

Hi Michael,

Thanks for replying. No, I have not tried the other eigensolvers yet. As I wrote above, I'm running through a sequential operation of dspevx right now, but I will certainly give those other routines a look. I have yet to do an execution time test for matrices of the size you mentioned. Once dsyevr failed, I've prioritized trying the other algorithms.

To your second point about sequential versus threading, I wasn't acutely aware of the dsyevr - dsytrd connection. I can try the threaded version as well to see if it affects execution time.

I was wondering if you had a response to the comment below about the dsyevr algorithm being unreliable in general.

mspecian · ‎07-15-2010

You write that dsyevr is not parallelized. Do you have a response to the above comment, i.e. dsyevr calls dsytrd (which is parallelized) to reduce the matrix to tridiagonal form? I'm not sure what percentage of the execution time is spend on the reduction to tridiagonal form, but if it is significant, this might prove to be a useful speed-up.

I should also mention that I am not concerned with all the eigenvectors. I'm using a version of Principal Component Analysis and need only the first chunk of eigenvectors (those associated with the largest positive eigenvalues) to be accurate. Can you comment on whether dsyevr can handle this task?

You mention that you don't recommend dsyevr. What do you recommend instead? Is your lack of recommendation due to accuracy errors, or unreliability? I am still trying to identify why an internal error in this routine is occurring.

yuriisig · ‎07-15-2010

Quoting mspecian

You write that dsyevr is not parallelized...

I wrote that is not paralleled RRR algorithm, which is included in dsyevr: http://software.intel.com/en-us/forums/showthread.php?t=73653(Unfortunately, now my page is not available). Intel MKL - very good package, but for your problem, he is not well suited. Most likely an internal error occurs because the RRR algorithm. Most time is spent on bringing to tridiagonal form for BLAS level 2. The rest you can quickly calculate, given that you do not need all the eigenvectors. Once again I say that dsyevr because RRR algorithm inaccurate and unreliable.

mspecian · ‎07-16-2010

Quick update: dspevx executed successfully (i.e. info = 0) but the eigenvalues returned were all zeros. All eigenvectors were zero vectors as well.

yuriisig · ‎07-16-2010

The algorithms work with packed matrices in the Intel MKL is much slower than with square matrices.

yuriisig · ‎07-17-2010

Quoting mspecian

Quick update: dspevx executed successfully (i.e. info = 0) but the eigenvalues returned were all zeros. All eigenvectors were zero vectors as well.

Propose to act as follows:
1. dsytrd
2. dstedc (To program this feature, it took 10 years.)This algorithm requires a lot of RAM, but fast and reliable.
3. dormtr (Apply this feature only to those eigenvectors tridiagonal matrix that you want).