Intel® oneAPI Math Kernel Library

How to use the function ‘cblas_dgemm_compute’ of mkl?

lianchen
Beginner

Are there any examples showing how to use these functions: cblas_dgemm_pack_get_size(), cblas_dgemm_pack(), and cblas_dgemm_compute()? I would like to implement a specialized GEMM with a packed matrix B. Thanks.

 

This is my code. Am I using them correctly?

int main(int argc, const char* argv[])
{
    // matrix parameters
    int M, N, K;
    int LDA, LDB, LDC;
    printf("[INPUT] input M N K\n");
    if(scanf("%d %d %d", &M, &N, &K) == 3){
        printf("[TRUE] true parameters for scanf\n");
    }
    else{
        printf("[FALSE] false parameters for scanf\n");
        exit(EXIT_FAILURE);
    }
   
    // matrix buffer, column major
    LDA = M, LDB = K, LDC = M;
    double *A = NULL,
                  *B = NULL, *B_PACK = NULL,
                  *C1 = NULL, *C2 = NULL;
    double alpha = 0.000001, beta = 0.000001;

 

    A = (double *) malloc (sizeof(double) * M * K);
    B = (double *) malloc (sizeof(double) * K * N);
    C1 = (double *) malloc (sizeof(double) * M * N);
    C2 = (double *) malloc (sizeof(double) * M * N);
    gen_matrix(A, M, K), gen_matrix(B, K, N), gen_matrix(C1, M, N);    // initialize matrices A, B, C1
    matrix_copy(C1, M, N, C2);  // copy the value from C1 to C2
 
    B_PACK = (double *) malloc (cblas_dgemm_pack_get_size(CblasBMatrix, M, N, K));
    cblas_dgemm_pack(CblasColMajor, CblasBMatrix, CblasNoTrans, M, N, K, alpha,   B, LDB, B_PACK);

 

    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, M, N, K, alpha, A, LDA, B, LDB, beta, C1, LDC);
    cblas_dgemm_compute(CblasColMajor, CblasNoTrans, CblasNoTrans, M, N, K, A, LDA, B_PACK , LDB, beta, C2, LDC);

 

    double diff = max_abs_diff(M, N, C1, LDC, C2, LDC);
    printf("diff = %lf\n", diff);  // maximum absolute difference over
                                   // corresponding elements of matrices C1 and C2

 

    return 0;
}
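
The helper max_abs_diff() called above is not shown in the post. A minimal sketch of what such a helper might look like, assuming column-major storage with leading dimensions; the signature is inferred from the call site and is an assumption, not the poster's actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest |X(i,j) - Y(i,j)| over two m-by-n
 * column-major matrices with leading dimensions ldx and ldy.
 * The name and argument order mirror the call in the post; the
 * original implementation is not shown there. */
double max_abs_diff(int m, int n, const double *X, int ldx,
                    const double *Y, int ldy)
{
    assert(ldx >= m && ldy >= m);
    double worst = 0.0;
    for (int j = 0; j < n; ++j) {        /* outer loop over columns */
        for (int i = 0; i < m; ++i) {    /* inner loop walks one column */
            double d = X[i + (size_t)j * ldx] - Y[i + (size_t)j * ldy];
            if (d < 0.0)
                d = -d;                  /* absolute value without <math.h> */
            if (d > worst)
                worst = d;
        }
    }
    return worst;
}
```

A result of exactly 0 means the two matrices match element for element; small nonzero values are normal when two code paths sum in a different order.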
14 Replies
VidyalathaB_Intel
Moderator

Hi,

 

Thanks for reaching out to us.

>>Are there any examples showing how to use those functions:

Yes, we have examples that show how to use the functions you mentioned. You can also refer to the MKL manual, which gives the location of the examples: https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/blas-and-sparse-blas-routines/blas-like-extensions/cblas-gemm-pack-get-size-cblas-gemm-pack-get-size.html

You can find the examples under the MKL installation directory.

On Windows, the example is at this location:

>> C:\Program Files (x86)\Intel\oneAPI\mkl\latest\examples\examples_core_c\c\blas\source\cblas_dgemm_computex.c

On Linux and macOS, it is at this location:

>> /opt/intel/oneapi/mkl/2022.1.0/examples/c/blas/source/cblas_dgemm_computex.c

 

Please refer to the examples and let us know if you have any issues.
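
For orientation, here is a plain-C model of what the pack/compute pair computes. It is only a sketch under stated assumptions: the function names are hypothetical, the "packed" buffer is simply a scaled dense copy of B (MKL's internal packed layout is opaque and is not reproduced here), and column-major storage without transposition is assumed. It illustrates why alpha is passed to the pack step and not to the compute step: the scaling is folded into the packed matrix.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the pack step: dest := alpha * B for a k-by-n column-major B.
 * A real packed buffer has an internal layout; this is a plain dense copy. */
void model_pack_b(int k, int n, double alpha,
                  const double *B, int ldb, double *dest)
{
    assert(ldb >= k);
    for (int j = 0; j < n; ++j)
        for (int p = 0; p < k; ++p)
            dest[p + (size_t)j * k] = alpha * B[p + (size_t)j * ldb];
}

/* Model of the compute step: C := A * packedB + beta * C.
 * There is no alpha argument; it was applied during packing. */
void model_compute(int m, int n, int k,
                   const double *A, int lda,
                   const double *packedB, double beta,
                   double *C, int ldc)
{
    assert(lda >= m && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * lda] * packedB[p + (size_t)j * k];
            C[i + (size_t)j * ldc] = acc + beta * C[i + (size_t)j * ldc];
        }
    }
}
```

In the MKL calls, the corresponding steps are cblas_dgemm_pack() and cblas_dgemm_compute(), with CblasPacked passed in place of the transpose flag for the packed operand so that the library treats that argument as a packed buffer.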

 

Regards,

Vidya.

 

VidyalathaB_Intel
Moderator

Hi @lianchen ,

 

As we haven't heard back from you, could you please provide us with an update regarding the issue?

 

Regards,

Vidya.

 

lianchen
Beginner

Thanks. I have learned from the examples how to use those functions correctly. But I have another question: the function cblas_dgemm_pack_get_size() returns a really big number (in bytes) when M/N/K equal 256/256/256, which means it needs a big buffer.

VidyalathaB_Intel
Moderator

Hi,


>>I have learned from the examples to use those functions correctly

Thanks for getting back to us and glad to know that it helped.


>>the function cblas_dgemm_pack_get_size() returns a really big number (in bytes) when M/N/K equal 256/256/256, which means it needs a big buffer.

Could you please let us know which interface (LP64 or ILP64) you used during compilation?


If you have encountered any issues or errors with the cblas_dgemm_pack_get_size() routine, please share a sample reproducer and the command you used to compile it so that we can test it on our end.


Regards,

Vidya.


lianchen
Beginner

My source code.

 

int main(int argc, const char* argv[])
{
    // matrix parameters
    int M, N, K;
    int LDA, LDB, LDC;
    printf("[INPUT] input M N K\n");
    if(scanf("%d %d %d", &M, &N, &K) == 3){
        printf("[TRUE] true parameters for scanf\n");
    }
    else{
        printf("[FALSE] false parameters for scanf\n");
        exit(EXIT_FAILURE);
    }
    
    // matrix buffer, column major
    LDA = M, LDB = K, LDC = M;
    double *A = NULL,
        *B = NULL, *B_PACK = NULL,
        *C = NULL, *C1 = NULL;
    double alpha = 0.111111, beta = 0.111111;

    A = (double *) malloc (sizeof(double) * M * K);
    B = (double *) malloc (sizeof(double) * K * N);
    C = (double *) malloc (sizeof(double) * M * N);
    C1 = (double *) malloc (sizeof(double) * M * N);
    
    // randomized matrix elements
    int seed[] = {0, 0, 0, 1};
    LAPACKE_dlarnv(1, seed, M * K, A);
    LAPACKE_dlarnv(1, seed, K * N, B);
    LAPACKE_dlarnv(1, seed, M * N, C);

    memcpy(C1, C, sizeof(double) * M * N);
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, M, N, K, alpha, A, LDA, B, LDB, beta, C, LDC);           // cblas_dgemm

    B_PACK = mkl_malloc(cblas_dgemm_pack_get_size(CblasBMatrix, M, N, K), 64);
    cblas_dgemm_pack(CblasColMajor, CblasBMatrix, CblasNoTrans, M, N, K, alpha, B, LDB, B_PACK);
    cblas_dgemm_compute(CblasColMajor, CblasNoTrans, CblasPacked, M, N, K, A, LDA, B_PACK, LDB, beta, C1, LDC);     // cblas_dgemm_compute
    
    double diff = max_abs_diff(M, N, C, LDC, C1, LDC);  // returns the maximum absolute difference over corresponding elements of matrices C and C1.
    printf("diff = %.9lf\n", diff);

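    printf("size of B_PACK = %zu\n", cblas_dgemm_pack_get_size(CblasBMatrix, M, N, K));  // the return type is size_t, so print with %zu

    mkl_free(B_PACK);                     // release the packed buffer
    free(A); free(B); free(C); free(C1);  // release the plain matrices
    return 0;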
}

 

 

My command.

 

[xx@cn0 gemm]$ make test_gemm_mkl.x
gcc -O2 -fopenmp -fPIC -o test_gemm_blas.o -c test_gemm_blas.c -I/home/xx/lib/intel/oneapi/mkl/2022.1.0/include -I../../include
gcc -O2 -fopenmp -fPIC -o utils.o          -c ../utils/utils.c -I/home/xx/lib/intel/oneapi/mkl/2022.1.0/include -I../../include
gcc test_gemm_blas.o utils.o -L/home/xx/lib/intel/oneapi/mkl/2022.1.0/lib/intel64/ -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -ldl  -o test_gemm_mkl.x -lm -fopenmp -fPIC
[xx@cn0 gemm]$ ./test_gemm_mkl.x
[INPUT] input M N K
256 256 256
[TRUE] true parameters for scanf
diff = 0.000000000
size of B_PACK = 7766912

 

 

The function cblas_dgemm_pack_get_size() returns such a big number (7,766,912 bytes) that I have to allocate a really big buffer to store the packed matrix B when M/N/K equal 256/256/256. I expected the return value to be somewhat more than 524,288 bytes (256 * 256 * 8), but not nearly as much as 7,766,912. Thanks for your generous help.

VidyalathaB_Intel
Moderator

Hi,


As we haven't heard back from you, could you please provide us with an update regarding the issue?


Regards,

Vidya.


VidyalathaB_Intel
Moderator

Hi,


Could you please try changing mkl_intel_lp64 to mkl_intel_ilp64 and check the results? Could you also try a smaller matrix size and see whether a big buffer is still requested to store the matrix?

Please check and get back to us so that we can proceed further.


Regards,

Vidya.


lianchen
Beginner


[xx@cn0 gemm]$ make test_gemm_mkl.x
gcc -O2 -fopenmp -fPIC -o test_gemm_blas.o -c test_gemm_blas.c -I/home/xx/lib/plasma/include -I/home/xx/lib/intel/oneapi/mkl/2022.1.0/include -I../../include  
gcc -O2 -fopenmp -fPIC -o utils.o -c ../utils/utils.c -I/home/xx/lib/plasma/include -I/home/xx/lib/intel/oneapi/mkl/2022.1.0/include -I../../include 
gcc test_gemm_blas.o  utils.o -L/home/xx/lib/intel/oneapi/mkl/2022.1.0/lib/intel64/ -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm -ldl -o test_gemm_mkl.x -lm -fopenmp -fPIC 
[xx@cn0 gemm]$ ./test_gemm_mkl.x 
[INPUT] input M N K
256 256 256
[TRUE] true parameters for scanf
diff = 0.000000000
size of B_PACK = 7766912
[xx@cn0 gemm]$ ./test_gemm_mkl.x 
[INPUT] input M N K
32 32 32
[TRUE] true parameters for scanf
diff = 0.000000000
size of B_PACK = 7111552

 

Yes, I have changed mkl_intel_lp64 to mkl_intel_ilp64 and got the same results. When M/N/K equal 32/32/32, the function cblas_dgemm_pack_get_size() still returns a big number: 7,111,552 bytes.
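
The reported sizes can be put side by side with the raw size of B. A small sketch of that arithmetic follows; the helper names are hypothetical, and the reported values are the ones printed in the runs above:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a dense rows-by-cols matrix of doubles, with no padding. */
size_t raw_matrix_bytes(size_t rows, size_t cols)
{
    return rows * cols * sizeof(double);
}

/* Extra bytes that a reported pack size asks for beyond the raw matrix. */
size_t pack_overhead_bytes(size_t reported, size_t rows, size_t cols)
{
    size_t raw = raw_matrix_bytes(rows, cols);
    assert(reported >= raw);
    return reported - raw;
}
```

Assuming an 8-byte double, the raw B is 524288 bytes for K = N = 256 and 8192 bytes for K = N = 32, so the overheads are 7766912 - 524288 = 7242624 bytes and 7111552 - 8192 = 7103360 bytes. The raw size grows 64-fold between the two runs while the reported size grows by less than 10%, which suggests that most of the request is a roughly fixed amount, not something proportional to the matrix.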

 

lianchen
Beginner

Hi,

 

I have made the suggested changes and got similar results. I'm looking forward to your suggestions. Could you please provide me with an update regarding the issue?

 

Regards,

Lianchen

VidyalathaB_Intel
Moderator

Hi @lianchen ,

 

I apologize for the delay.

 

>>I have to allocate a really big buffer... 

To obtain the best performance, the buffer needs to be aligned to 4 MB, so the API requests a size large enough to allow for that. This is therefore not an issue with the cblas_dgemm_pack_get_size() function.

Please do let us know if you have any other issues.

 

EDIT: The large size is due to large-page alignment for better performance and is expected.
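
One common way an alignment requirement inflates a size request is that the caller is asked for enough extra bytes to carve an aligned region out of a buffer that may start at any address. The sketch below illustrates only that general idea, with hypothetical helper names and the 4 MB alignment mentioned above taken as 4 MiB; it is not MKL's actual formula and is not expected to reproduce the exact values reported in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Round value up to the next multiple of alignment (must be a power of two). */
size_t align_up(size_t value, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Worst-case number of bytes to request so that an aligned block of
 * `payload` bytes fits inside a buffer whose start address is arbitrary. */
size_t padded_request(size_t payload, size_t alignment)
{
    assert(alignment != 0);
    return payload + alignment - 1;
}
```

For a 524288-byte matrix and a 4 MiB alignment, padded_request() gives 4718591 bytes, so in this model the alignment term, not the matrix itself, dominates the request.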

 

Regards,

Vidya.

 

lianchen
Beginner

I really appreciate your generous help and patient advice.

 

Regards,

lianchen.

VidyalathaB_Intel
Moderator

Hi @lianchen ,

 

Glad to know that it helps.

Could you please confirm if we can close this thread from our end since the issue is resolved?

 

Regards,

Vidya.

 

lianchen
Beginner

Sure. Thank you very much for your helpful advice.

VidyalathaB_Intel
Moderator

Hi lianchen,


Thanks for the confirmation.

Please post a new question if you need any additional assistance from Intel as this thread will no longer be monitored.


Have a Great Day!


Regards,

Vidya.

