I need help. Wrong results when i use dgels, ubuntu 14.04+mkl 11.0

David_E_8 · ‎10-08-2015

Hello

I want to use the MKL library for solving a system of overdetermined linear equations.For this I use the dgels LAPACK function, which is provided by MKL library. In my particular problem (matrix of 4800 rows and 81 columns) the results are incorrect: Both, the QR factorization and the solution are wrong. But seems to work good for toy matrices (6x4 dgels matrix example in mkl documentation).

So I tested with random matrices of different dimensions and comparing the results with MATLAB (QR = qr (A, 0)). I have noticed that, in the QR matrix, errors suddenly become huge when the number of rows exceeds 512 (4KB).

I double check the input parameters to dgels function and look correct:

-QR_MKL(mxn), column major layout
-bHat_MKL(mxr), column major layout
-dgels("N",&m,&n,&r,QR_MKL,&m,bHat_MKL,&m,work,&lwork,&info);

I also checked on another machine with the same general Software Enviroment but the problem persists.
Software enviroment:

Ubuntu 14.04
Intel Composer XE 13.0
Eclipse Kepler SR1, CDT 8.2.1, Eclipse Platform 4.3.1

Build and link lines (Eclipse Console):

icpc -g -I/usr/include/x86_64-linux-gnu/c++/4.8 -fpic -MMD -MP -MF"src/DrawImgTools.d" -MT"src/DrawImgTools.d" -c -o "src/DrawImgTools.o" "../src/DrawImgTools.cpp"
icc -g -mkl=sequential -I/opt/intel/composer_xe_2013.0.079/mkl/include -DMKL_IP64 -fpic -MMD -MP -MF"src/PrintTools.d" -MT"src/PrintTools.d" -c -o "src/PrintTools.o" "../src/PrintTools.c"
icc -g -mkl=sequential -I/opt/intel/composer_xe_2013.0.079/mkl/include -DMKL_IP64 -fpic -MMD -MP -MF"src/TestTools.d" -MT"src/TestTools.d" -c -o "src/TestTools.o" "../src/TestTools.c"
icpc -L/opt/intel/composer_xe_2013.0.079/mkl/lib/intel64 -shared -o "libTools.so" ./src/DrawImgTools.o ./src/PrintTools.o ./src/TestTools.o -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -lX11 -limf

I could not find a any report about a similar bug. I am new to linux and the mkl library, so do not know if I've provided enough information, I hope you can help me

Thanks you for your time

David

mecej4 · ‎10-08-2015

David E. wrote:
I have noticed that, in the QR matrix, errors suddenly become huge when the number of rows exceeds 512 (4KB).

That may have happened because the wrong arguments were passed to DGELS, or there was something peculiar about the specific matrix. Since you used the LP64 library, 512 integers should correspond to 2 KB, not 4KB. Did you use 8-byte integers as arguments to DGELS, by chance?

I have tested a 1033 X 320 matrix, and the results agreed with those from Matlab.

Please provide example code to reproduce the problem, including any necessary data files.

Gennady_F_Intel · ‎10-08-2015

David, it seems you are using ILP64 interface: your command line contains the -DMKL_IP64 but instead of -DMKL_ILP64

You may try to use LP64 API because of your case is not so huge and not needed to use ILP64,

-- Gennady

mecej4 · ‎10-09-2015

Even if -DILP64 was meant and mistyped as -DIP64 (the latter would have no effect if there are no #ifdef IP64...#endif blocks in the code), there is still an inconsistency with the linker line, which contains -lmkl_intel_lp64.

David_E_8 · ‎10-09-2015

Thanks guys, for your interest and your answers.

First, you are right, now I understand that there is an inconsistency in my linked line. In fact I do not intend to use ilp46, my arrays are not so huge. I fix this, but I still have the same problem.
I have prepared codes and examples for you to try to reproduce my bug. I send the workspace, so you can also see my project settings. Also sent Matlab scripts to generate the testcases and a console output example.

With respect to input data to the function dgels, I use double, ie are 64 bits per element, regardless of use LP64 or ILP64. I mentioned this point earlier because this threshold matches with my system page size, which is 4Kb, and I find this interesting. I think maybe the problem is related to the way my system is managing memory, although I could be wrong.

David

mecej4 · ‎10-09-2015

The archive file that you attached contains 1059 files, and I do not intend to try to find out which pieces are essential and which ones pertain to Eclipse, Matlab, etc. Here is something that you can do in a few minutes to verify that GESV is working correctly, and that there is no basis to your inference regarding matrix size and system memory page size. The test matrix is 1033 X 320.

Download the matrix and RHS vector from the Matrix Market: ftp://math.nist.gov/pub/MatrixMarket2/Harwell-Boeing/lsq/illc1033.mtx.gz and ftp://math.nist.gov/pub/MatrixMarket2/Harwell-Boeing/lsq/illc1033_rhs1.mtx.gz . Decompress the files. Download the m-file for reading Matrix Market format files: http://math.nist.gov/MatrixMarket/mmio/matlab/mmread.m .
Compile and run the program below. It will read the Matrix Market files, call GESV to obtain the solution, and print the first 30 elements of the solution vector.
In Matlab, load the files and obtain the solution using the commands below. Compare the solution to that from Lapack.

The test program source:

      PROGRAM LA_SGELS_EXAMPLE

      USE F95_PRECISION, ONLY: WP => SP
      USE LAPACK95, ONLY: GELS
      IMPLICIT NONE
      INTEGER :: I, J, K, INFO, M, N, NNZ, NRHS
      REAL(WP), ALLOCATABLE :: A(:,:), B(:,:)
      CHARACTER*1 CH
      WRITE (*,*) 'LA_GELS Example Program Results'
      NRHS = 1
      OPEN(UNIT=21,FILE='illc1033.mtx',STATUS='OLD')
      read(21,*)
      read(21,*)m,n,nnz
      WRITE(*,*) m,N, NRHS
      ALLOCATE( A(M,N), B(M,NRHS))
      A=0
      read(21,*)(i,j,A(I,J),k=1,nnz)
      close(21)

      OPEN(UNIT=22,FILE='illc1033_rhs1.mtx',STATUS='OLD')
      read(22,'(A)')(CH,i=1,5)
      read(22,*)B(1:M,1)
      CLOSE(22)

      CALL GELS(  A, B )
      write(*,'(1p,5E16.7)')B(1:30,1)

      END PROGRAM LA_SGELS_EXAMPLE

The Matlab commands:

As=mmread('illc1033.mtx'); b=mmread('illc1033_rhs1.mtx');
A=spconvert(As); x=A\b; x(1:30)