Are there any examples showing how to use these functions: cblas_dgemm_pack_get_size(), cblas_dgemm_pack(), cblas_dgemm_compute()? I would like to implement a specialized GEMM with a packed matrix B. Thanks.
This is my code. Am I using them correctly?
Hi,
Thanks for reaching out to us.
>>Are there any examples showing how to use those functions:
Yes, we have examples that show how to use the functions you mentioned. You can also refer to the MKL manual, which gives the location of the examples: https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/blas-and-sparse-blas-routines/blas-like-extensions/cblas-gemm-pack-get-size-cblas-gemm-pack-get-size.html
The examples are in the MKL installation directory.
On Windows, you can find the example at this location:
>> C:\Program Files (x86)\Intel\oneAPI\mkl\latest\examples\examples_core_c\c\blas\source\cblas_dgemm_computex.c
On Linux and macOS, you can find it at this location:
>> /opt/intel/oneapi/mkl/2022.1.0/examples/c/blas/source/cblas_dgemm_computex.c
Please refer to the examples and let us know if you have any issues.
Regards,
Vidya.
Hi @lianchen ,
As we haven't heard back from you, could you please provide us with an update regarding the issue?
Regards,
Vidya.
Thanks. I have learned from the examples how to use those functions correctly. But I have another question: the function cblas_dgemm_pack_get_size() returns a really big number (in bytes) when M/N/K equals 256/256/256, which means it needs a big buffer.
Hi,
>>I have learned from the examples to use those functions correctly
Thanks for getting back to us and glad to know that it helped.
>>the function cblas_dgemm_pack_get_size() returns a really big number(in byte) when m\n\k equals to 256\256\256, which means it will need a big buffer.
Could you please let us know which interface (LP64 or ILP64) you used during compilation?
Please let us know if you have encountered any issues/errors while working with the cblas_dgemm_pack_get_size() routine by providing us with the sample reproducer code and command you have used to compile it so that we can test the same from our end.
Regards,
Vidya.
My source code.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mkl.h>
// max_abs_diff() is defined in ../utils/utils.c; its declaration comes from a project header not shown here.
int main(int argc, const char* argv[])
{
    // matrix parameters
    int M, N, K;
    int LDA, LDB, LDC;
    printf("[INPUT] input M N K\n");
    if (scanf("%d %d %d", &M, &N, &K) == 3) {
        printf("[TRUE] true parameters for scanf\n");
    }
    else {
        printf("[FALSE] false parameters for scanf\n");
        exit(EXIT_FAILURE);
    }
    // matrix buffers, column major
    LDA = M, LDB = K, LDC = M;
    double *A = NULL,
           *B = NULL, *B_PACK = NULL,
           *C = NULL, *C1 = NULL;
    double alpha = 0.111111, beta = 0.111111;
    A = (double *) malloc (sizeof(double) * M * K);
    B = (double *) malloc (sizeof(double) * K * N);
    C = (double *) malloc (sizeof(double) * M * N);
    C1 = (double *) malloc (sizeof(double) * M * N);
    // randomized matrix elements
    int seed[] = {0, 0, 0, 1};
    LAPACKE_dlarnv(1, seed, M * K, A);
    LAPACKE_dlarnv(1, seed, K * N, B);
    LAPACKE_dlarnv(1, seed, M * N, C);
    memcpy(C1, C, sizeof(double) * M * N);
    // reference result with the regular GEMM
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, M, N, K, alpha, A, LDA, B, LDB, beta, C, LDC); // cblas_dgemm
    // query the packed-buffer size once (returned as size_t, in bytes) and reuse it
    size_t pack_size = cblas_dgemm_pack_get_size(CblasBMatrix, M, N, K);
    B_PACK = (double *) mkl_malloc(pack_size, 64);
    // alpha is applied while packing, so cblas_dgemm_compute() takes no alpha argument
    cblas_dgemm_pack(CblasColMajor, CblasBMatrix, CblasNoTrans, M, N, K, alpha, B, LDB, B_PACK);
    cblas_dgemm_compute(CblasColMajor, CblasNoTrans, CblasPacked, M, N, K, A, LDA, B_PACK, LDB, beta, C1, LDC); // cblas_dgemm_compute
    double diff = max_abs_diff(M, N, C, LDC, C1, LDC); // returns the maximum absolute difference over corresponding elements of matrices C and C1.
    printf("diff = %.9lf\n", diff);
    printf("size of B_PACK = %zu\n", pack_size); // %zu matches size_t
    // release all buffers
    mkl_free(B_PACK);
    free(A); free(B); free(C); free(C1);
    return 0;
}
My command.
[xx@cn0 gemm]$ make test_gemm_mkl.x
gcc -O2 -fopenmp -fPIC -o test_gemm_blas.o -c test_gemm_blas.c -I/home/xx/lib/intel/oneapi/mkl/2022.1.0/include -I../../include
gcc -O2 -fopenmp -fPIC -o utils.o -c ../utils/utils.c -I/home/xx/lib/intel/oneapi/mkl/2022.1.0/include -I../../include
gcc test_gemm_blas.o utils.o -L/home/xx/lib/intel/oneapi/mkl/2022.1.0/lib/intel64/ -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm -ldl -o test_gemm_mkl.x -lm -fopenmp -fPIC
[xx@cn0 gemm]$ ./test_gemm_mkl.x
[INPUT] input M N K
256 256 256
[TRUE] true parameters for scanf
diff = 0.000000000
size of B_PACK = 7766912
The function cblas_dgemm_pack_get_size() returns such a big number (7 766 912 bytes) that I have to allocate a really big buffer to store the packed matrix B when M/N/K equals 256/256/256. I expected the return value to be more than 524 288 (256 * 256 * 8) bytes, but not by this much (7 766 912). Thank you for your generous help.
Hi,
As we haven't heard back from you, could you please provide us with an update regarding the issue?
Regards,
Vidya.
Hi,
Could you please try changing mkl_intel_lp64 to mkl_intel_ilp64 and check the results? Also, could you please check with a smaller matrix size and see whether it still requests a big buffer to store the matrix?
Please check and get back to us so that we can proceed further.
Regards,
Vidya.
[xx@cn0 gemm]$ make test_gemm_mkl.x
gcc -O2 -fopenmp -fPIC -o test_gemm_blas.o -c test_gemm_blas.c -I/home/xx/lib/plasma/include -I/home/xx/lib/intel/oneapi/mkl/2022.1.0/include -I../../include
gcc -O2 -fopenmp -fPIC -o utils.o -c ../utils/utils.c -I/home/xx/lib/plasma/include -I/home/xx/lib/intel/oneapi/mkl/2022.1.0/include -I../../include
gcc test_gemm_blas.o utils.o -L/home/xx/lib/intel/oneapi/mkl/2022.1.0/lib/intel64/ -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm -ldl -o test_gemm_mkl.x -lm -fopenmp -fPIC
[xx@cn0 gemm]$ ./test_gemm_mkl.x
[INPUT] input M N K
256 256 256
[TRUE] true parameters for scanf
diff = 0.000000000
size of B_PACK = 7766912
[xx@cn0 gemm]$ ./test_gemm_mkl.x
[INPUT] input M N K
32 32 32
[TRUE] true parameters for scanf
diff = 0.000000000
size of B_PACK = 7111552
Yes, I have changed mkl_intel_lp64 to mkl_intel_ilp64 and got the same results. When M/N/K equals 32/32/32, the function cblas_dgemm_pack_get_size() still returns a big number: 7 111 552 bytes.
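To put the two reported sizes side by side, here is a small sketch of the arithmetic. The helper names raw_matrix_bytes and pack_overhead_bytes are hypothetical (they are not MKL functions), and the sketch assumes 8-byte doubles; it only compares the unpadded size of a K-by-N matrix with the value printed by the runs above.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed to store a K-by-N matrix of doubles
 * with no padding at all. */
size_t raw_matrix_bytes(size_t k, size_t n)
{
    return k * n * sizeof(double);
}

/* Hypothetical helper: how many bytes of the reported pack size go beyond
 * the raw matrix payload. `reported` is the value printed by the test
 * program and must be at least the raw size. */
size_t pack_overhead_bytes(size_t reported, size_t k, size_t n)
{
    return reported - raw_matrix_bytes(k, n);
}
```

With the numbers from the runs above, pack_overhead_bytes(7766912, 256, 256) gives 7 242 624 and pack_overhead_bytes(7111552, 32, 32) gives 7 103 360, so the extra space is roughly 7 MB in both cases and looks like a mostly fixed cost rather than something that scales with the matrix size.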
Hi,
I have made some changes and gotten some similar results. I'm looking forward to your suggestion. Could you please provide me with an update regarding the issue.
Regards,
Lianchen
Hi @lianchen ,
I apologize for the delay.
>>I have to allocate a really big buffer...
To obtain the best performance, the buffer needs to be aligned to 4 MB, so the API requests a size large enough to allow for that alignment. This is therefore not an issue with the cblas_dgemm_pack_get_size() function.
Please let us know if you have any other issues.
EDIT: The large size is due to large-page alignment for better performance and is expected.
Regards,
Vidya.
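As a rough illustration of why an alignment requirement inflates a size request, here is a generic sketch of rounding an address up to a 4 MB boundary. This is not MKL's internal implementation; align_up, padded_request and LARGE_PAGE_ALIGN are hypothetical names, and the only value taken from the reply above is the 4 MB figure.

```c
#include <stddef.h>
#include <stdint.h>

/* 4 MB alignment, the figure mentioned in the reply above (illustrative constant). */
#define LARGE_PAGE_ALIGN ((uintptr_t)4 * 1024 * 1024)

/* Round addr up to the next multiple of align.
 * align must be a power of two for the bit mask to be valid. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Worst-case number of bytes to request so that an aligned region of
 * `payload` bytes fits inside a buffer that may start at any address. */
size_t padded_request(size_t payload, size_t align)
{
    return payload + align - 1;
}
```

For a 32-by-32 matrix of doubles, padded_request(8192, LARGE_PAGE_ALIGN) is 4 202 495 bytes, already far above the 8 192-byte payload. The sizes reported in this thread are larger still, so the library evidently reserves more than this simple bound accounts for; the sketch only shows the general mechanism.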
I really appreciate your generous help and patient advice.
Regards,
lianchen.
Hi @lianchen ,
Glad to know that it helps.
Could you please confirm if we can close this thread from our end since the issue is resolved?
Regards,
Vidya.
Sure. Thank you very much for your helpful advice.
Hi lianchen,
Thanks for the confirmation.
Please post a new question if you need any additional assistance from Intel as this thread will no longer be monitored.
Have a Great Day!
Regards,
Vidya.