Eigen+MKL vs. MKL-Only

Anonymous · ‎07-22-2016

Currently, I have a code that uses Eigen (a C++ template library for linear algebra) to save a square general dense matrix in the following way

ZMatrix = new Eigen::MatrixXcd;
(*ZMatrix).resize(N_line,N_line);

And subsequently solve the resulting linear system of equations using Eigen as follow:

Eigen::PartialPivLU<Eigen::MatrixXcd> ZMatrixLU(*ZMatrix);
(*Solution) = ZMatrixLU.solve(*RHS);

I recently compiled the Eigen library based on MKL by defining EIGEN_USE_MKL_ALL. This works fine but I am wondering what would happen if I completely get rid of Eigen and do the following instead

MKL_INT m=N_line;
MKL_Complex16 *ZMatrix;
const int allignment = 64;
ZMatrix = (MKL_Complex16 *)mkl_malloc( m*m*sizeof( MKL_Complex16 ), allignment);

And this time use MKL functions to solve the resulting linear system of equations. Now I have the following questions:

By saving the dense matrix ZMatrix, in MKL's way (the second technique), is it possible to save memory?
By using MKL's functions instead of Eigen, is it possible to make the operations faster?
If answer to any or both of the above questions are yes, what is the best way of defining a dense matrix and solving the resulting system using MKL? Is there a tutorial code that I can follow to make sure that I am doing everything in its best possible way in terms of both memory and speed.

Please note that the size of the dense matrix can range from (1000x1000) to (20,000x20,000).

Ying_H_Intel · ‎07-24-2016

Hi

Nice to know you make the eigen and MKl work together. You question actually depends on compiled the Eigen library based on MKL by defining EIGEN_USE_MKL_ALL.

a) If the EIGEN_USE_MKL_ALL ensure Eigen::PartialPivLU = MKL zgetrf () : Computes the LU factorization of a general m-by-n matrix.

As i understand, they are same, then you don't have to replace one with another same one.

b) If the EIGEN_USE_MKL_ALL can't ensure EigenlLU = MKL.LU , then you may try MKL functions because good performance

From the article, http://eigen.tuxfamily.org/dox/TopicUsingIntelMKL.html, it seems the case a). anyway, you can try it out

Regarding your questions.

1) In MKL ways, it doesn't save memory, but the memory alignment like MKL_Complex16 *)mkl_malloc( m*m*sizeof( MKL_Complex16 ), allignment); may bring better performance. You can check MKL user guide to see the details.

2) performance may be same (in case a, maybe a little change about interface transform) , performance may be better in case b)

3) If you install MKL, then there is MKL example under MKL install directory, which provide almost most of MKL functions's demo code. Include C code and fortran code. for example dgesv.c . You may refer to them.

Best Regards,

Ying

  ========================
 
 DGESV Example Program Results

 Solution
  -0.80  -0.39   0.96
  -0.70  -0.55   0.22
   0.59   0.84   1.90
   1.32  -0.10   5.36
   0.57   0.11   4.04

 Details of LU factorization
   8.23   1.08   9.04   2.14  -6.87
   0.83  -6.94  -7.92   6.55  -3.99
   0.69  -0.67 -14.18   7.24  -5.19
   0.73   0.75   0.02 -13.82  14.19
  -0.26   0.44  -0.59  -0.34  -3.43

 Pivot indices
      5      5      3      4      5
*/
#include <stdlib.h>
#include <stdio.h>
#include <mkl.h>

/* Auxiliary routines prototypes */
extern void print_matrix( char* desc, int m, int n, double* a, int lda );
extern void print_int_vector( char* desc, int n, int* a );

/* Parameters */
#define N 5
#define NRHS 3
#define LDA N
#define LDB N

/* Main program */
int main() {
	/* Locals */
	MKL_INT n = N, nrhs = NRHS, lda = LDA, ldb = LDB, info;
	/* Local arrays */
	MKL_INT ipiv;
	double a[LDA*N] = {
	    6.80, -2.11,  5.66,  5.97,  8.23,
	   -6.05, -3.30,  5.36, -4.44,  1.08,
	   -0.45,  2.58, -2.70,  0.27,  9.04,
	    8.32,  2.71,  4.35, -7.17,  2.14,
	   -9.67, -5.14, -7.26,  6.08, -6.87
	};
	double b[LDB*NRHS] = {
	    4.02,  6.19, -8.22, -7.57, -3.03,
	   -1.56,  4.00, -8.67,  1.75,  2.86,
	    9.81, -4.09, -4.57, -8.61,  8.99
	};
	/* Executable statements */
	printf( " DGESV Example Program Results\n" );
	/* Solve the equations A*X = B */
	dgesv( &n, &nrhs, a, &lda, ipiv, b, &ldb, &info );
	/* Check for the exact singularity */
	if( info > 0 ) {
		printf( "The diagonal element of the triangular factor of A,\n" );
		printf( "U(%i,%i) is zero, so that A is singular;\n", info, info );
		printf( "the solution could not be computed.\n" );
		exit( 1 );
	}
	/* Print solution */
	print_matrix( "Solution", n, nrhs, b, ldb );
	/* Print details of LU factorization */
	print_matrix( "Details of LU factorization", n, n, a, lda );
	/* Print pivot indices */
	print_int_vector( "Pivot indices", n, ipiv );
	exit( 0 );
} /* End of DGESV Example */

/* Auxiliary routine: printing a matrix */
void print_matrix( char* desc, int m, int n, double* a, int lda ) {
	int i, j;
	printf( "\n %s\n", desc );
	for( i = 0; i < m; i++ ) {
		for( j = 0; j < n; j++ ) printf( " %6.2f", a[i+j*lda] );
		printf( "\n" );
	}
}

Anonymous · ‎07-26-2016

Thank you Ying. These are my findings after actually implementing it using MKL-Only. Maybe others can benefit from this discussion as well:

Saving the dense matrix in MKL way and Eigen way is the same in terms of memory use. For example, the largest matrix that I can fill into my memory is roughly (12000x12000) matrix both using Eigen::MatrixXcd and MKL_Complex16 *ZMatrix.
By using zgesv() of MKL instead of ZMatrixLU() of Eigen, there is no gain in speed, but the memory requirement for zgesv() is less. For example, when I solve a system with 5796 unknowns using ZMatrixLU() or zgesv(), 23 seconds is required (both over 4 cores using OpenMP). However, for larger systems ZMatrixLU() runs into memory issues while zgesv() can handle systems up to 8100 unknowns.
I used the example in mkl/examples/exacmples_core_c.zip/lapackc/source/zgesv.c. However, instead of _dcomplex, I had to use MKL_Complex16. I think this example should be updated because I could not compiled it as is. Only when I introduced MKL_Complex16 it could be built. ( I used Intel C++ compiler rather than Microsoft's Visual C++ compiler as I usually get executable files which are twice as fast using Intel C++ compiler).

Hope this helps.

Mohammad

Ying_H_Intel · ‎07-26-2016

Hi Mohammad,

Thanks a lot for sharing your findings.

Regarding 3. yes, it seems better to use MKL_Complex16. some compiler may report warning about the data types.

the dcomplex was defined in the zgesv.c

struct _dcomplex { double re, im; };
typedef struct _dcomplex dcomplex;

zgesv.c(112) : warning C4133: 'function' : incompatible types - from 'dcomplex [16]' to 'MKL_Complex16 *'

zgesv.c(112) : warning C4133: 'function' : incompatible types - from 'dcomplex [8]' to 'MKL_Complex16 *'

Best Regards,

Ying