Hi
When evaluating the performance of Intel oneMKL DGEMM on an Intel Xeon Phi (Knights Landing, KNL) processor with square matrices (m = n = k), I observed very different results depending on the matrix size. For sizes of 8000 and 32000, the achieved performance was approximately 2086.05 GFLOPS and 1169.80 GFLOPS, respectively, a substantial drop for the larger size. Why?
Hi,
Thanks for posting in Intel Communities.
Could you please let us know the OS details, hardware details, and Intel MKL version you are using?
Also, could you please provide the matrix input and sample reproducer code, along with the steps you followed, so that we can reproduce the issue at our end?
Could you also let us know the performance you expected and the performance you are getting, so that we can investigate further?
Thanks & Regards,
Varsha
Hi
Machine: Intel Xeon Phi CPU 7250
OS: CentOS Linux release 8.5.2111 (kernel 4.18.0-348.7.1.el8_5.x86_64)
Intel oneAPI: oneMKL version 2021.3.0
The sample code is attached.
Set size=8000 or size=32000, then compile and run:
icc mkl_dgemm.c -O3 -mkl=parallel -qopenmp -lmemkind -xMIC-AVX512 -o mkl_dgemm.out -DMSIZE=$size -DNSIZE=$size -DKSIZE=$size
./mkl_dgemm.out
Thanks
Hi,
Thanks for providing the details.
When we tried to compile the code you provided, we got errors because we don't have the "hbwmalloc.h" header and its functions. Could you please provide the complete files and the steps you follow to reproduce the issue at our end?
Also, we recommend using the latest version, Intel oneMKL 2023.2, for better performance. In addition, the Intel Classic Compilers will be deprecated in upcoming releases, so we suggest moving to the Intel LLVM-based compilers (icpx) to compile your code.
You can use the Intel oneMKL Link Line Advisor to get the correct compile and link options for the icpx compiler.
Thanks & Regards,
Varsha
Dear,
Then please comment out this line:
//#include<hbwmalloc.h>
Replace the MCDRAM version of the allocation code
=====================================================
/*
// hbw_posix_memalign takes the address of the pointer (void **) and
// allocates the aligned buffer itself, so no separate hbw_malloc is needed.
hbw_posix_memalign((void **) &A, 64, sizeof(double)*m*k);
hbw_posix_memalign((void **) &B, 64, sizeof(double)*k*n);
hbw_posix_memalign((void **) &C, 64, sizeof(double)*m*n);
*/
=====================================================
with plain malloc:
A = (double *) malloc(sizeof(double)*m*k);
B = (double *) malloc(sizeof(double)*k*n);
C = (double *) malloc(sizeof(double)*m*n);
Also replace
=====================================================
/* MCDRAM version
hbw_free(A);
hbw_free(B);
hbw_free(C);
*/
=====================================================
with:
free(A);
free(B);
free(C);
Thanks
Hi,
Thanks for your reply.
Sorry for the inconvenience. Support for Intel® Xeon Phi™ Processor x200 “Knights Landing (KNL)” and Intel® Xeon Phi™ Processors “Knights Mill (KNM)” is deprecated. For more details, please refer to the link below:
We always recommend using the latest oneAPI products on a system that meets the supported system requirements for optimal performance.
Could you please try on a supported system with the latest Intel oneMKL (2023.2) and let us know if you have any issues?
Thanks & Regards,
Varsha