What is happening with mkl_dbsrmm

asd__asdqwe · ‎09-26-2016

Hello,

I'm trying to call mkl_dbsrmm once instead of calling mkl_dbsrmv multiple times, but I'm getting different results, like:

X
1.000000e+00 2.000000e+00
2.000000e+00 3.000000e+00
1.000000e+00 2.000000e+00
2.000000e+00 3.000000e+00
Y = AX (bsrmm)
1.746125e-01 8.390122e-01
1.469818e+00 1.939678e+00
1.974694e-01 2.000000e+00
3.128180e+00 2.000000e+00
Y = AX (bsrmv x2)
1.746125e-01 3.594702e-01
1.469818e+00 2.228417e+00
1.974694e-01 5.919458e-01
3.128180e+00 2.125353e+00

Is there a regression somewhere inside mkl_dbsrmm or am I doing something wrong? I've been dealing with csrmm in the past with no such problem. Thanks in advance.

PS: compile line is, with MKL 2017

icpc bsrmm.cpp -L/opt/intel/mkl/lib/intel64 -lmkl_core -lmkl_rt -lmkl_intel_lp64 -lmkl_intel_thread -liomp5

Zhen_Z_Intel · ‎09-26-2016

Hi qweasd,

I think you misunderstand usage of bsrmm & bsrmv. You set matdescra parameter by zero-base but set indx, pntrb & pntre by one-base. Thus the result of bsrmm & bsrmv are totally different. Another point is that the bsrmm operate multiplication of Matrix & vector, you'd better define input value for bsrmm as X_v[M*BS]={{1,1,2,3},{2,2,3,3}}. And use this input in function as &(X_v[0]). I paste code below:

int ia[M + 1] = { 0, 2, 4 }; //zero-base
    int ja[NNZ]   = {0, 1, 0,1};
    int pointerB ={0,2};
    int pointerE={2,4};
    double a[NNZ * BS * BS] = { 0.505208, -0.015657, -0.015657, 0.530954, -0.310105, 0.0630386, 0.00541174, 0.180264, -0.310105, 0.00541174, 0.0630386, 0.180264 , 0.511346, 0.000116003, 0.000116003, 0.524502};
    double X[M* BS] = { {1, 2}, {1, 2}, {2, 3}, {2, 3} }; //input for bsrmm
    double X_v[M*BS]={{1,1,2,3},{2,2,3,3}}; //input for bsrmv
    double Y[M* BS]  = { {0, 0}, {0, 0}, {0, 0}, {0, 0} };
    double Temp[M * BS] = {{0, 0, 0, 0}, {0, 0, 0, 0} };
    int m = M;
    int n = N;
    int k = K;
    int bs = BS;
    double alpha = 1.0;
    double beta = 0.0;
    char matdescra[6] = { 'T', 'L', 'N', 'C', '0', '0' };
    mkl_dbsrmm("N", &m, &n, &k, &bs, &alpha, matdescra, a, ja, ia, &ia[1], &(X[0][0]), &m, &beta, &(Y[0][0]), &k);
    std::cout << std::scientific;
    std::cout << "X" << std::endl;
    for(int i = 0; i < M * BS; ++i) {
        for(int j = 0; j < N; ++j) {
            std::cout << X[M * BS * j + i] << "\t";
        }
        std::cout << std::endl;
    }
    std::cout << "Y = AX (bsrmm)" << std::endl;
    for(int i = 0; i < M * BS; ++i) {
        for(int j = 0; j < N; ++j) {
            std::cout << Y << "\t";
        }
        std::cout << std::endl;
    }
    std::cout << "Y = AX (bsrmv x2)" << std::endl;
    for(int i = 0; i < N; ++i)
    {
        mkl_dbsrmv("N", &m, &k, &bs, &alpha, matdescra, a, ja, pointerB, pointerE, &(X_v[0]), &beta, &(Temp[0]));
    }
    for(int j = 0; j <  M*BS; j++) {
        for(int i = 0; i < N; i++) {
            std::cout << Temp << "\t";
        }
        std::cout << std::endl;
    }

Best regards,
Fiona

asd__asdqwe · ‎09-27-2016

Thank you for your answer. Sorry, I set matdescra[4] to N instead of F. I have fixed the example now but the problem obviously remains. My BSR matrices use one-based indexing so I don't see how your code can help me. Plus, as stated in the example, I don't have such problem with mkl_dcsrmm which is also used for matrix * matrix products. Could you let me know if I'm calling mkl_dbsrmm right, or if there is a bug with the routine with one-based indexed matrices?

Thanks

asd__asdqwe · ‎09-27-2016

On a side note, if I change in your example { 'T', 'L', 'N', 'C', '0', '0' }; to { 'G', '0', '0', 'C', '0', '0' }; I get different results between, so there is another problem right there.

Zhen_Z_Intel · ‎09-27-2016

Hi qweasd,

Sorry for my written mistake, the input of bsrmv should be

double X_v[M*BS]={{1,1,2,2},{2,2,3,3}};

In your program, there are 3 places lead errors:

1. 'F' could only used for one-based indexing, normally used for Fortran. However in c code, normally used with zero-based indexing. You'd better use G,0,0,C, or change all array's value & start from 1.
2. In bsrmm function, A is 4*4, B is 4*2 (B[0]={1,2}); the result C should be 4*2 as well, the last two value is 2 because you set one-based indexing that first two values of A, B has not been calculated.
3. In bsrmv function, A is 4*4, B is 2*4 (B[0]={1,2,1,2}) . How could you expect the result of this calculation equals to previous one? The result of bsrmv in your program is wrong. The fist column of result is the same that is because the equation is C=alpha*A*B+C. So, in your program, the bsrmv function is action doing work of Y=alpha*A*B+bsrmm_result.
4. seems you didn't use sparse matrix, length of a is 16, all values are non-zero. Thus, the functionality should be same as dgemm. I checked with Dgemm, the return is correct by using following code(fix value setting of X_v).

Please use following code, the result should be as same as Dgemm:

int ia[M + 1] = { 0, 2, 4}; //zero-base for C program
    int ja[NNZ]   = {0,1, 0,1};
    int pointerB ={0,2};
    int pointerE={2,4};
    double a[NNZ * BS * BS] = { 0.505208, -0.015657, -0.015657, 0.530954, -0.310105, 0.0630386, 0.00541174, 0.180264, -0.310105, 0.00541174, 0.0630386, 0.180264 , 0.511346, 0.000116003, 0.000116003, 0.524502};
    double X[M* BS] = { {1, 2}, {1, 2}, {2, 3}, {2, 3} }; //input for srmm
    double X_v[M*BS]={{1,1,2,2},{2,2,3,3}}; //input for srmv
    double Y[M* BS]  = { {0, 0}, {0, 0}, {0, 0}, {0, 0} };
    double Temp[M * BS] = {{0, 0, 0, 0}, {0, 0, 0, 0} };
    int m = M;
    int n = N;
    int k = K;
    int bs = BS;
    double alpha = 1.0;
    double beta = 0.0;
    char matdescra[6] = { 'G', '0', '0', 'C', '0', '0' };
    mkl_dbsrmm("N", &m, &n, &k, &bs, &alpha, matdescra, a, ja, ia, &ia[1], &(X[0][0]), &m, &beta, &(Y[0][0]), &k);
    std::cout << std::scientific;
    std::cout << "X" << std::endl;
    for(int i = 0; i < M * BS; ++i) {
        for(int j = 0; j < N; ++j) {
            std::cout << X << "\t";
        }
        std::cout << std::endl;
    }
    std::cout << "Y = AX (bsrmm)" << std::endl;
    for(int i = 0; i < M * BS; ++i) {
        for(int j = 0; j < N; ++j) {
            std::cout << Y << "\t";
        }
        std::cout << std::endl;
    }
    std::cout << "Y = AX (bsrmv x2)" << std::endl;
    for(int i = 0; i < N; ++i)
    {
        mkl_dbsrmv("N", &m, &k, &bs, &alpha, matdescra, a, ja, pointerB, pointerE, &(X_v[0]), &beta, &(Temp[0]));
    }
    for(int j = 0; j <  M*BS; j++) {
        for(int i = 0; i < N; i++) {
            std::cout << Temp << "\t";
        }
        std::cout << std::endl;
    }
    std::cout << "Results are same!" << std::endl;

Or you could test with Dgemm function, you will find there's no problem with bsrmm.

Best regards,
Fiona

asd__asdqwe · ‎09-28-2016

Wow, that’s kind of funny and sad at the same time how this conversation is going nowhere.

Let me start over. In my application, I have BSR matrix A (of course it has hundred thousands of rows/columns so I cannot use gemm…). I need to compute the product of A and multiple vectors (stored column-wise as a matrix B)

For instance, if m = k = 2, lb = 2, and n = 2, with

B = [1, 2;

3, 4;

5, 6;

7, 8];

stored in memory as B = {1, 3, 5, 7, 2, 4, 6, 8 }; (this is how B must be stored if you need to solve A^-1 B with PARDISO)

I want to compute C = A * B, stored in memory as C = { c_11, c_21, c_31, c_41, c_12, c_22, c_32, c_42 };

I cannot change how B is stored in memory because it is how PARDISO takes it as an argument and I don’t want to change the layout of B everytime I call bsrmm and then PARDISO.

In this thread answered by a colleague of yours https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/365430#comment-1724657, he suggests to use a one-based CSR matrix for the routine csrmm given my layout of B and C, now what I need to do with A to get the desired results as a BSR matrix? I can do a loop over (B + j * lb * m), j = 0..n-1 and call bsrmv n times, but I would rather use bsrmm one time. Are you telling me that the memory layout of the input matrices B are different between bsrmm and csrmm/PARDISO?

I’m attaching another example, the same matrix A is stored as a CSR and a BSR matrix. The calls to csrmm and bsrmv are the same, but not with bsrmm. I cannot change how B and C are stored, otherwise, how can I call PARDISO afterwards? I can change the call to bsrmm however (for example use a zero-based BSR matrix for A).

Thank you in advance for your help.

Zhen_Z_Intel · ‎09-28-2016

Hi qweasd:

I think there are some errors in your code:

1. you print value of y like these: y[j * m * bs + i], is that mean the length of y is 22 ? However, you defined the length of y is 10 as: double y[k*m] (k=5, m=2). Please pay attention to the length of array.

2. You get correct value of csrmv & bsrmv, but it doesn't mean you calculated correctly, the length of y is incorrect. And Number of block rows of the matrix A is also incorrect.

3. By using srmv/crmv/srmm ,the matrix A is normally sparse matrix. Well, if you plan to calculate like a normally matrix in this program, you can. But, you should define the Number of block rows of the matrix A as 1, it means length of block's row not length of A's row. In this example code, the block is 1*1, you have 4 blocks for matrix A that block1 is 10, block2 is 2, block3 is 2, block4 is 10.
If you set Number of block rows of the matrix A, that means each block is a 2*2 matrix.

4. You should set Number of columns of the matrix C as k, not m. The number of columns of C is 5, not 2.

Please read developer guide to help you understand the usage completely. I attached the code below:

    int m_block=1
    int m_x=2;
    mkl_dbsrmm("N", &m_block, &k, &m_block, &bs, &one, matdescra, a, ja, ia, &ia[1], x, &m_x, &two, y, &m_x);
    std::cout << "Y = AX + 2Y (bsrmm F):" << std::endl;
    for(int i = 0; i <m_x; ++i) {
        for(int j = 0; j <k; ++j) {
            std::cout << y[j*m_x+i] << "\t";
        }
        std::cout << std::endl;
    }

regards,
Fiona

asd__asdqwe · ‎09-29-2016

Alright, I now get the correct results. The documentation here https://software.intel.com/en-us/node/520833 is quite misleading:

ldb: Specifies the leading dimension (in blocks) of b as declared in the calling (sub)program.

ldc: Specifies the leading dimension (in blocks) of c as declared in the calling (sub)program.

If I want my first example to return the correct result, I have to set ldb and ldc to M * BS, and not M. I think you should remove "(in blocks)" from the documentation.