How to use the function ‘cblas_dgemm_compute’ of mkl?

lianchen · 09-26-2022

Are there any examples showing how to use these functions: cblas_dgemm_pack_get_size(), cblas_dgemm_pack(), and cblas_dgemm_compute()? I would like to implement a specialized GEMM with a packed matrix B. Thanks.

This is my code. Am I using them correctly?

int main(int argc, const char* argv[])
{
    // matrix parameters
    int M, N, K;
    int LDA, LDB, LDC;
    printf("[INPUT] input M N K\n");
    if(scanf("%d %d %d", &M, &N, &K) == 3){
        printf("[TRUE] true parameters for scanf\n");
    }
    else{
        printf("[FALSE] false parameters for scanf\n");
        exit(EXIT_FAILURE);
    }

    // matrix buffer, column major
    LDA = M, LDB = K, LDC = M;
    double *A = NULL,
        *B = NULL, *B_PACK = NULL,
        *C1 = NULL, *C2 = NULL;
    double alpha = 0.000001, beta = 0.000001;

    A = (double *) malloc (sizeof(double) * M * K);
    B = (double *) malloc (sizeof(double) * K * N);
    C1 = (double *) malloc (sizeof(double) * M * N);
    C2 = (double *) malloc (sizeof(double) * M * N);

    gen_matrix(A, M, K), gen_matrix(B, K, N), gen_matrix(C1, M, N); // initialize matrices A, B, C1
    matrix_copy(C1, M, N, C2); // copy the values from C1 to C2

    B_PACK = (double *) malloc (cblas_dgemm_pack_get_size(CblasBMatrix, M, N, K));
    cblas_dgemm_pack(CblasColMajor, CblasBMatrix, CblasNoTrans, M, N, K, alpha, B, LDB, B_PACK);

    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, M, N, K, alpha, A, LDA, B, LDB, beta, C1, LDC);
    cblas_dgemm_compute(CblasColMajor, CblasNoTrans, CblasNoTrans, M, N, K, A, LDA, B_PACK , LDB, beta, C2, LDC);

    // max_abs_diff returns the maximum absolute difference over
    // corresponding elements of matrices C1 and C2.
    double diff = max_abs_diff(M, N, C1, LDC, C2, LDC);
    printf("diff = %lf\n", diff);

    return 0;
}

VidyalathaB_Intel · 09-27-2022

Hi,

Thanks for reaching out to us.

>>Are there any examples showing how to use those functions:

Yes, we do have examples that show the usage of the functions you mentioned. You can also refer to the MKL manual, which gives the location of the examples: https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/blas-and-sparse-blas-routines/blas-like-extensions/cblas-gemm-pack-get-size-cblas-gemm-pack-get-size.html

You can find the examples under the installed directory of MKL

In Windows, you can find it under this location

>> C:\Program Files (x86)\Intel\oneAPI\mkl\latest\examples\examples_core_c\c\blas\source\cblas_dgemm_computex.c

In Linux & MacOS, you can find it under this location

>> /opt/intel/oneapi/mkl/2022.1.0/examples/c/blas/source/cblas_dgemm_computex.c

Please refer to the examples and let us know if you have any issues.

Regards,

Vidya.

VidyalathaB_Intel · 10-03-2022

Hi @lianchen ,

As we haven't heard back from you, could you please provide us with an update regarding the issue?

Regards,

Vidya.

lianchen · 10-09-2022

Thanks. I have learned from the examples how to use those functions correctly. But I have another question: the function cblas_dgemm_pack_get_size() returns a really big number (in bytes) when M/N/K equal 256/256/256, which means it needs a big buffer.

VidyalathaB_Intel · 10-10-2022

Hi,

>>I have learned from the examples to use those functions correctly

Thanks for getting back to us and glad to know that it helped.

>>the function cblas_dgemm_pack_get_size() returns a really big number (in bytes) when M/N/K equal 256/256/256, which means it needs a big buffer.

Could you please let us know which interface (LP64 or ILP64) you used during compilation?

Please let us know if you have encountered any issues/errors while working with the cblas_dgemm_pack_get_size() routine by providing us with the sample reproducer code and command you have used to compile it so that we can test the same from our end.

Regards,

Vidya.

lianchen · 10-20-2022

My source code.

int main(int argc, const char* argv[])
{
    // matrix parameters
    int M, N, K;
    int LDA, LDB, LDC;
    printf("[INPUT] input M N K\n");
    if(scanf("%d %d %d", &M, &N, &K) == 3){
        printf("[TRUE] true parameters for scanf\n");
    }
    else{
        printf("[FALSE] false parameters for scanf\n");
        exit(EXIT_FAILURE);
    }
    
    // matrix buffer, column major
    LDA = M, LDB = K, LDC = M;
    double *A = NULL,
        *B = NULL, *B_PACK = NULL,
        *C = NULL, *C1 = NULL;
    double alpha = 0.111111, beta = 0.111111;

    A = (double *) malloc (sizeof(double) * M * K);
    B = (double *) malloc (sizeof(double) * K * N);
    C = (double *) malloc (sizeof(double) * M * N);
    C1 = (double *) malloc (sizeof(double) * M * N);
    
    // randomized matrix elements
    int seed[] = {0, 0, 0, 1};
    LAPACKE_dlarnv(1, seed, M * K, A);
    LAPACKE_dlarnv(1, seed, K * N, B);
    LAPACKE_dlarnv(1, seed, M * N, C);

    memcpy(C1, C, sizeof(double) * M * N);
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, M, N, K, alpha, A, LDA, B, LDB, beta, C, LDC);           // cblas_dgemm

    B_PACK = mkl_malloc(cblas_dgemm_pack_get_size(CblasBMatrix, M, N, K), 64);
    cblas_dgemm_pack(CblasColMajor, CblasBMatrix, CblasNoTrans, M, N, K, alpha, B, LDB, B_PACK);
    cblas_dgemm_compute(CblasColMajor, CblasNoTrans, CblasPacked, M, N, K, A, LDA, B_PACK, LDB, beta, C1, LDC);     // cblas_dgemm_compute
    
    double diff = max_abs_diff(M, N, C, LDC, C1, LDC);  // returns the maximum absolute difference over corresponding elements of matrices C and C1.
    printf("diff = %.9lf\n", diff);

    // cblas_dgemm_pack_get_size() returns size_t, so print it with %zu
    printf("size of B_PACK = %zu\n", cblas_dgemm_pack_get_size(CblasBMatrix, M, N, K));

    return 0;
}

My command.

[xx@cn0 gemm]$ make test_gemm_mkl.x
gcc -O2 -fopenmp -fPIC -o test_gemm_blas.o -c test_gemm_blas.c -I/home/xx/lib/intel/oneapi/mkl/2022.1.0/include -I../../include
gcc -O2 -fopenmp -fPIC -o utils.o          -c ../utils/utils.c -I/home/xx/lib/intel/oneapi/mkl/2022.1.0/include -I../../include
gcc test_gemm_blas.o utils.o -L/home/xx/lib/intel/oneapi/mkl/2022.1.0/lib/intel64/ -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -ldl  -o test_gemm_mkl.x -lm -fopenmp -fPIC
[xx@cn0 gemm]$ ./test_gemm_mkl.x
[INPUT] input M N K
256 256 256
[TRUE] true parameters for scanf
diff = 0.000000000
size of B_PACK = 7766912

The function cblas_dgemm_pack_get_size() returns such a big number (7 766 912 bytes) that I have to allocate a really big buffer to store the packed matrix B when M/N/K equal 256/256/256. I expected the return value to be somewhat more than 524 288 (256 * 256 * 8), but not as much as 7 766 912. Thank you for your generous help.

VidyalathaB_Intel · 10-16-2022

Hi,

As we haven't heard back from you, could you please provide us with an update regarding the issue?

Regards,

Vidya.

VidyalathaB_Intel · 10-21-2022

Hi,

Could you please try changing mkl_intel_lp64 to mkl_intel_ilp64 and check the results? Also, could you please try a smaller matrix size and see whether it still requests a big buffer to store the matrix?

Please check and get back to us so that we can proceed further.

Regards,

Vidya.

lianchen · 10-22-2022

[xx@cn0 gemm]$ make test_gemm_mkl.x
gcc -O2 -fopenmp -fPIC -o test_gemm_blas.o -c test_gemm_blas.c -I/home/xx/lib/plasma/include -I/home/xx/lib/intel/oneapi/mkl/2022.1.0/include -I../../include  
gcc -O2 -fopenmp -fPIC -o utils.o -c ../utils/utils.c -I/home/xx/lib/plasma/include -I/home/xx/lib/intel/oneapi/mkl/2022.1.0/include -I../../include 
gcc test_gemm_blas.o  utils.o -L/home/xx/lib/intel/oneapi/mkl/2022.1.0/lib/intel64/ -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm -ldl -o test_gemm_mkl.x -lm -fopenmp -fPIC 
[xx@cn0 gemm]$ ./test_gemm_mkl.x 
[INPUT] input M N K
256 256 256
[TRUE] true parameters for scanf
diff = 0.000000000
size of B_PACK = 7766912
[xx@cn0 gemm]$ ./test_gemm_mkl.x 
[INPUT] input M N K
32 32 32
[TRUE] true parameters for scanf
diff = 0.000000000
size of B_PACK = 7111552

Yes, I have changed mkl_intel_lp64 to mkl_intel_ilp64 and got the same results. When M/N/K equal 32/32/32, the function cblas_dgemm_pack_get_size() still returns a big number: 7 111 552 (in bytes).

lianchen · 10-24-2022

Hi,

I have made some changes and got similar results. I'm looking forward to your suggestions. Could you please provide me with an update regarding the issue?

Regards,

Lianchen

VidyalathaB_Intel · 10-24-2022

Hi @lianchen ,

I apologize for the delay.

>>I have to allocate a really big buffer...

To obtain the best performance, the buffer needs to be aligned to 4 MB, so the API requests a size large enough to allow for that. This is therefore not an issue with the cblas_dgemm_pack_get_size() function.

Please do let us know if you have any other issues.

EDIT: The large size is due to large-page alignment for better performance and is expected.

Regards,

Vidya.

lianchen · 10-26-2022

I really appreciate your generous help and patient advice.

Regards,

lianchen.

VidyalathaB_Intel · 10-26-2022

Hi @lianchen ,

Glad to know that it helps.

Could you please confirm if we can close this thread from our end since the issue is resolved?

Regards,

Vidya.

lianchen · 11-02-2022

Sure. Thank you very much for your helpful advice.

VidyalathaB_Intel · 11-02-2022

Hi lianchen,

Thanks for the confirmation.

Please post a new question if you need any additional assistance from Intel as this thread will no longer be monitored.

Have a Great Day!

Regards,

Vidya.