Hi guys,
I wrote some code that calls dgels, originally using LAPACKE. I then tried to do the same dgels with MKL. My understanding is that the exact same code base, simply recompiled/relinked against the MKL library, should just work; alternatively, LD_PRELOAD=libmkl_rt.so should also work on the test program built against LAPACKE. Both programs run OK, but the final results are different.
Can anyone tell me why?
Below are the test program source code and the relevant Makefile lines.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <errno.h>
#include <stdbool.h>
// Use LAPACKE
#include <lapacke/lapacke.h>
#ifndef MAX
#define MAX(a,b) ((a) > (b) ? (a) : (b))
#endif
#ifndef MIN
#define MIN(a,b) ((a) < (b) ? (a) : (b))
#endif
void dgels_arguments_print(int matrix_order, char trans, int m, int n, int nrhs, int lda, int ldb, FILE *f)
{
fprintf(f, "Dgels arguments\n");
if (matrix_order == 101)
{
fprintf(f, "Matrix order LAPACK_ROW_MAJOR\n");
}
else if(matrix_order == 102)
{
fprintf(f, "Matrix order LAPACK_COL_MAJOR\n");
}
else
{
fprintf(f, "Matrix order Unknown\n");
}
fprintf(f, "Transformation %c\n", trans);
fprintf(f, "Matrix A %d by %d\n", m, n);
fprintf(f, "Matrix B %d by %d\n", m, nrhs);
fprintf(f, "lda %d\n", lda);
fprintf(f, "ldb %d\n", ldb);
}
//
// Row major print
void print_matrix_row(double *array, int row, int column)
{
for (int i=0; i<row; i++)
{
for (int j=0; j<column; j++)
{
printf("%1f\t", array[i*column+j]);
}
printf("\n");
}
}
//
// Column major print
void print_matrix_col(double *array, int row, int column)
{
for (int i=0; i<row; i++)
{
for (int j=0; j<column; j++)
{
printf("%1f\t", array[i+row*j]);
}
printf("\n");
}
}
int main (int argc, char **argv)
{
printf("Test utility to do a simple LAPACK operation. A double-precision dense general\n");
printf("Test case: LAPACKE_DGELS\n");
// always solve an overdetermined linear system, so m >= n
lapack_int m = 10;
lapack_int n = 5;
lapack_int nrhs = 5;
lapack_int lda, ldb, returninfo;
char trans = 'N';//'T'
int matrix_order = LAPACK_COL_MAJOR;
bool matrix_print = false;
printf("%d %d %d\n",m,n,nrhs);
matrix_print = (m*n<100)? true: false;
// row major: lda = n, ldb = nrhs
// column major, lda = max(1,m), ldb = max(1,m,n)
// use column major by default, because LAPACK/magma_lapack use column major; this reduces complexity
lda = MAX(1, m);
ldb = MAX(lda, n);
dgels_arguments_print(matrix_order, trans, m, n, nrhs, lda, ldb, stderr);
fprintf(stderr, "Start allocating memory for matrix\n");
// Allocate matrix a and b memory
double *A = NULL;
double *B = NULL;
A = (double *)malloc(lda * n * sizeof(double));
B = (double *)malloc(ldb * nrhs * sizeof(double));
if ((A == NULL) || (B == NULL))
{
fprintf(stderr, "Unable to allocate memory!\n");
free(A);
free(B);
exit(1);
}
// Fill matrix A and B with values
for (int i=0; i< lda*n; i++)
{
// just example data
A[i] = i + 0.50;
}
for (int i=0; i< ldb*nrhs; i++)
{
// just example data
B[i] = 2*i + 0.10;
}
if (matrix_print)
{
fprintf(stderr, "Original B\n");
print_matrix_col(B, ldb, nrhs);
}
fprintf(stderr, "Start LAPACKE_DGELS operation\n");
returninfo = LAPACKE_dgels(matrix_order, trans, m, n, nrhs, A, lda, B, ldb);
fprintf(stderr, "Return value = %d\n", returninfo);
if (matrix_print)
{
fprintf(stderr, "Output results:\n");
fprintf(stderr, "Overwritten B\n");
print_matrix_col(B, ldb, nrhs);
fprintf(stderr, "\n");
print_matrix_col(B, n, nrhs);
}
free(A); A = NULL;
free(B); B = NULL;
return 0;
}
=========================================================
# Programs/binaries
CC = gcc
RM = rm -f
# Compiler and linker flags
OUTPUTFLAGS = -o $@
WFLAGS = -Wall
OPTFLAGS = -O3
DEBUGFLAGS = -g -DDEBUG $(OPTFLAGS)
RELEASEFLAGS = -DNDEBUG $(OPTFLAGS)
CFLAGS = -std=gnu99 $(WFLAGS)
default: debug
debug: simple_lapacke_netlib.o simple_lapacke_mkl.o
@echo Build is done.
simple_lapacke_netlib.o: simple_lapacke.c
$(CC) $(CFLAGS) $(DEBUGFLAGS) -o $@ $^ -llapacke -llapack -lm -lrt
simple_lapacke_mkl.o: simple_lapacke.c
$(CC) $(CFLAGS) $(DEBUGFLAGS) -o $@ $^ -L/opt/intel/mkl/lib/intel64 -lmkl_rt -lm -lrt
clean:
$(RM) *.o
@echo Build cleaning is done.
LAPACKE is part of MKL; thus, when you pose them as alternatives to choose from, you may confuse some of your readers.
I notice -llapacke -llapack in your link line, which suggests that you have a separate pair of libraries (apart from MKL), which came with your Linux distribution or were obtained by you in some other way. It would be useful to know where these originated and whether you built them from sources yourself.
Furthermore, please state which version of MKL you used, which compiler you used, the target architecture and OS, etc., provide the two solutions that you obtained, and point out the one that you think is correct.
Sorry for the confusion.
What I am doing is comparing between Netlib's LAPACKE and MKL LAPACKE. So I build test programs with both libraries separately but using the same code base.
I am using MKL 11.2 on CentOS.
The results I got from using Netlib LAPACKE are:
1.802063 -0.000000 2.305364 -0.000000 0.008896
0.268674 -0.000000 -1.253072 0.000000 1.538632
0.040839 -0.000000 0.072373 0.000000 -1.178290
-0.005953 0.000000 0.483017 -0.000000 1.795102
-0.105624 0.000000 0.392319 0.000000 -0.164339
and from using MKL LAPACKE are:
2.451472 -0.000000 1.619455 -0.000000 -0.105903
-1.040338 -0.000000 -0.113735 -0.000000 2.267483
0.758202 -0.000000 -0.231106 0.000000 -1.329784
-0.111278 0.000000 0.415596 0.000000 0.370730
-0.058058 0.000000 0.309790 0.000000 0.797474
I am not 100% sure which one is correct, but I wrote some GPU code using the CUDA library, and it indicates that the results from Netlib LAPACKE are correct.
Hope these help.
mecej4 wrote:
LapackE is part of MKL; thus, when you pose them as alternatives to choose from, you confuse some of your readers.
I notice -llapacke -llapack in your link line, which suggests that you have a separate pair of libraries (apart from MKL), which came with your Linux distribution or were obtained by you in some other way. It would be useful to know where these originated and whether you built them from sources yourself.
Furthermore, please state which version of MKL you used, which compiler you used, the target architecture and OS, etc., provide the two solutions that you obtained, and point out the one that you think is correct.
Please provide the actual source code. You have probably listed most of the code above, but we need the complete code. For example, I do not know what is in print_matrix_col().
I think that your made-up matrix A is rank-deficient, so there is no unique "correct" solution.
Thank you for the input. I put all the source code in the original post, and you should be able to see it now.
If you think my made-up matrix is rank-deficient, what would be the recommended way (or matrix) to compare performance? I hate to use random numbers; I need to verify the correctness of the two libraries.
mecej4 wrote:
Please provide the actual source code. You have probably listed most of the code above, but we need the complete code. For example, I do not know what is in print_matrix_col().
I think that your made-up matrix A is rank-deficient, so there is no unique "correct" solution.
A large collection of matrices with various properties, in various storage formats, is available at http://math.nist.gov/MatrixMarket/ . You can "shop" for matrix data there.