There is nothing in the code

Radu_M_ · ‎06-19-2014

Hello,

While testing out MKL Lapack's dtpqrt and dtpmqrt routines, I've stumbled across a weird segfault. I replicated the error in this example (I should mention that I use Eigen just to make my life easier and that populateEigenMat is just creating a random matrix).

The problem is that for different value of the parameter m (the number of rows of the matrix B in Lapack's reference for dtpqrt and dtmqrt), the code either works (small values of m, and the result is correct) or it creates a segfault (larger values of m).

int main()
{

  int m =150;
  int n = 5;
  int nb = 1;
  int l = 0;
  int info;

  MatrixXd a(n,n), b(m,n), t(nb, n);
  populateEigenMat(a), populateEigenMat(b);
  a = a.eval().triangularView<Eigen::Upper>();
  
  info = LAPACKE_dtpqrt(LAPACK_COL_MAJOR, m, n, l, nb, a.data(), a.rows(), 
  		       b.data(), b.rows(), t.data(), t.rows());

  int k = n;
  
  MatrixXd cA(k,n), cB(m,n), c(k+m,n);
  populateEigenMat(cA), populateEigenMat(cB);
  c<<cA,cB;

  cout<<"Still ok!\n";
  // cout<<endl<<cB<<endl<<endl<<c.block(k,0,m,n)<<endl<<endl;
  info = LAPACKE_dtpmqrt(LAPACK_COL_MAJOR, 'L', 'N', m, n, k, l, nb, b.data(), b.rows(), t.data(), t.rows(),
  			 c.data(), c.rows(), c.data()+k, c.rows());

}

I'm using version of Intel Composer: composer_xe_2013_sp1.2.144 and the following links -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -std=c++11 -xAVX -DMKL_ILP64.

Could some please tell me where I've made a mistake?

Kind regards

mecej4 · ‎06-19-2014

There is nothing in the code shown above that makes the integer arguments in the call to dtpqrt() to be 8-byte integers, as required if you use the ILP64 libraries. If you do not include any of the MKL header files, the -DMKL_ILP64 compiler option has no effect.

Radu_M_ · ‎06-19-2014

I included the the MKL header. But I should mention that I get a segfault using the parameters shown in the code above. If i put m lower than 122 and I keep the same value for n then the code works.

mecej4 wrote:

There is nothing in the code shown above that makes the integer arguments in the call to dtpqrt() to be 8-byte integers, as required if you use the ILP64 libraries. If you do not include any of the MKL header files, the -DMKL_ILP64 compiler option has not effect.

mecej4 · ‎06-19-2014

I included the the MKL header.

There are nearly forty header files in the MKL directory. Which one(s) did you include?

Without a reproducer (complete code with instructions to build the example, and input data, if needed) it is unlikely that your problem can be diagnosed and solved.

Secondly, you are using a third party template package ("Eigen"?), so information on the version of that package is needed.

Radu_M_ · ‎06-19-2014

Here's the complete code (Ive modified it so that you won't have to use Eigen -i.e. it is self contained - and it still produces the same segfault when m is large (in this particular case m >= 125, for m it works okay):

#include <random>
#include <iostream>

#include "mkl.h"
using namespace std;


mt19937 rng(0);
normal_distribution<double> dist(0., 1.);
void populateMat(double* mat, int rows, int cols)
{
  for (auto i = 0; i < rows*cols; ++i){
    mat = dist(rng);
  }
}
int main()
{

  int m =125;
  int n = 5;
  int nb = 1;
  int l = 0;
  int info;

  double* a = new double[n*n];
  double* b = new double[m*n];
  double* t = new double[nb*n];
  populateMat(a, n, n), populateMat(b, m, n);
  
  info = LAPACKE_dtpqrt(LAPACK_COL_MAJOR, m, n, l, nb, a, n, b, m, t, nb);

  int k = n;

  double* c = new double[(k+m)*n]; 

  populateMat(c, k+m, n);

  cout<<"Still ok!\n";

  info = LAPACKE_dtpmqrt(LAPACK_COL_MAJOR, 'L', 'N', m, n, k, l, nb, b, m, t, nb, c, k+m, c+k, k+m);
  
  delete[] a;
  delete[] b;
  delete[] c;
  delete[] t;
  
}

To compile it I've used the following libraries/compiler flags/includes: -L $MKLROOT/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -std=c++11 -DMKL_ILP64 -I $MKLROOT/include.

Since I'm new to MKL, I've used MKL Library Link Advisor.

Thank you very much for your patience.

mecej4 · ‎06-19-2014

I have no experience with the Lapack routines dtpqrt and dtpmqrt, but I have one observation based on the documentation: matrix A is supposed to be upper triangular and matrix B is supposed to be pentagonal (see https://software.intel.com/en-us/node/469004). Since you fill the entire matrices A and B, you are assuming that the Lapack routines will ignore the non-zero values that you placed into those places where the routines expect zero. I would not make such assumptions without the presence of statements in the documentation that those values will not be accessed or will be overwritten by zeros before the decomposition algorithm is started.

Nevertheless, you have provided a short reproducer and so the Intel personnel will have something to work with when they turn to this problem report.

Radu_M_ · ‎06-19-2014

They perform the QR factorisation (and multiply Q) of matrices that have a very specific structure. These matrices are the building blocks of the QR factorisation of tall and skinny matrices and more generally in communication-avoiding QR decompositions. These routines are fairly recent.

I filled up the matrices so I can simplify this short example but I don't think that has an influence on this bug. (in fact when I found this error I had zeros in the correct places).

I don't think I've made any obvious mistake in that code, so I'll chalk it up to an MKL bug.

Thank you very much for the time you've spent on my problem.

Segfault in the dtpmqrt routine