Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
6977 Discussions

MKL cblas_dgemm huge performance gap on Intel oneAPI and Parallel Studio

Rizwan1
Beginner
2,747 Views

Hi,

 

Previously I tested MKL cblas_dgemm for m=n=k=10000 and measured approximately 2 TFlops, but now the same test gives me only around 0.8 TFlops.

 

Case 1:

CentOS 7.2.* and Intel Parallel Studio XE

MKL cblas_dgemm

m=n=k=10000

Performance: 1900+ GFlops

 

Case 2:

CentOS 8.5.* and Intel OneAPI

MKL cblas_dgemm

m=n=k=1000

Performance: 750+ GFlops

 

Why is there such a huge gap? Something is wrong.

What is the reason?

Please assist and guide 
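For readers comparing the two cases, a minimal sketch of how such GFlops figures are usually derived may help. It assumes the common convention that a dgemm call costs about 2*m*n*k floating-point operations; the function names are hypothetical, and the thread does not show how the original benchmark computed its numbers.

```python
def dgemm_flops(m, n, k):
    """Approximate floating-point operation count of one dgemm call."""
    # Each of the m*n output elements needs k multiplies and k adds.
    return 2.0 * m * n * k


def gflops(m, n, k, seconds):
    """Achieved rate in GFlops for one dgemm call that took `seconds`."""
    return dgemm_flops(m, n, k) / seconds / 1e9


def seconds_at(m, n, k, rate_gflops):
    """Wall time one call would need at a given sustained rate."""
    return dgemm_flops(m, n, k) / (rate_gflops * 1e9)


if __name__ == "__main__":
    size = 10000
    # Rates taken from Case 1 and Case 2 above.
    for label, rate in (("Case 1", 1900.0), ("Case 2", 750.0)):
        t = seconds_at(size, size, size, rate)
        print(f"{label}: {t:.2f} s per call at {rate:.0f} GFlops")
```

For m=n=k=10000 this works out to roughly 1.05 s per call at 1900 GFlops and 2.67 s at 750 GFlops, so the gap should be easy to see in plain wall-clock timings as well.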

 

0 Kudos
33 Replies
Gennady_F_Intel
Moderator
795 Views

According to this log, you once again linked against MKL 2022.0 (see line #3 of the log file), but we asked you to link against MKL 2017 and show a similar log.

0 Kudos
Rizwan1
Beginner
790 Views

Dear,

 

I don't have that previous version of MKL now.

0 Kudos
Rizwan1
Beginner
788 Views

Could you please share your test machine environment variables?

 

 

0 Kudos
Rizwan1
Beginner
778 Views

Please let me know how I can download a previous version of the Intel oneAPI Base Kit and HPC Kit.

0 Kudos
Gennady_F_Intel
Moderator
777 Views

There is no specific environment setting on this machine. The attached _2017u2.zip contains a version statically linked against MKL 2017 u2. The password is the same as for the previous zip.

Run this code on your systems: MKL_VERBOSE=1 ./a.out 10000
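If you want to capture the verbose output from a script instead of the shell, something like the following sketch could be used. It only sets MKL_VERBOSE=1 in the child's environment and keeps the lines that start with MKL_VERBOSE; the helper name `run_with_mkl_verbose` is hypothetical, and the default command is simply the one from the post above.

```python
import os
import subprocess
import sys


def run_with_mkl_verbose(cmd):
    """Run `cmd` with MKL_VERBOSE=1 and return the MKL_VERBOSE lines it prints."""
    env = dict(os.environ, MKL_VERBOSE="1")
    result = subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so no verbose line is missed
        text=True,
    )
    return [line for line in result.stdout.splitlines()
            if line.startswith("MKL_VERBOSE")]


if __name__ == "__main__":
    # Assumed default: the benchmark binary and size from the post above.
    cmd = sys.argv[1:] or ["./a.out", "10000"]
    for line in run_with_mkl_verbose(cmd):
        print(line)
```

The first MKL_VERBOSE line is the header that names the MKL version and the instruction set it dispatched to, which is the part that matters for this thread.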

0 Kudos
Rizwan1
Beginner
765 Views

Every time I try to run this, I get a "Permission denied" message.
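One common cause of "Permission denied" on a binary unpacked from an archive is a missing execute bit, although the thread does not confirm that this was the problem here (a filesystem mounted with noexec would give the same message). A minimal sketch, with the hypothetical helper `make_executable`, that does the equivalent of `chmod u+x`:

```python
import os
import stat


def make_executable(path):
    """Add the owner execute bit to `path` (the equivalent of `chmod u+x`)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
    # Report whether the file can now be executed by the current user.
    return os.access(path, os.X_OK)


if __name__ == "__main__":
    # Assumed file name: the binary from the moderator's post.
    print(make_executable("./a.out"))
```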

0 Kudos
Gennady_F_Intel
Moderator
760 Views

OK, I found a Xeon Phi (KNL) machine and ran the dgemm code on this system with verbose mode enabled:

MKL 2017 u2:

MKL_VERBOSE Intel(R) MKL 2017.0 Update 2 Product build 20170126 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) for Intel(R) Many Integrated Core Architecture (Intel(R) MIC Architecture) enabled processors, Lnx 1.30GHz lp64 intel_thread NMICDev:0

 

MKL 2022.0:

MKL_VERBOSE oneMKL 2022.0 Product build 20211112 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 1.30GHz lp64 intel_thread

 

You can see that when dgemm is linked against the latest MKL (the second log above), the AVX2 code path is used. That is exactly what was announced in the MKL 2020 Release Notes, see Deprecation:

 

When running the MKL 2017 build, we can see that the AVX-512 code path is used.

That is the reason for the performance difference you reported.

-Gennady
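To make the comparison of the two headers above less error-prone, here is a small sketch that pulls the MKL version and the dispatched instruction set out of an MKL_VERBOSE header line. The regular expressions are assumptions based only on the two header lines quoted in this post, not on a documented format, and `parse_verbose_header` is a hypothetical name.

```python
import re

# The two header lines quoted in the post above.
HEADER_2017 = (
    "MKL_VERBOSE Intel(R) MKL 2017.0 Update 2 Product build 20170126 for "
    "Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 "
    "(Intel(R) AVX-512) for Intel(R) Many Integrated Core Architecture "
    "(Intel(R) MIC Architecture) enabled processors, Lnx 1.30GHz lp64 "
    "intel_thread NMICDev:0"
)
HEADER_2022 = (
    "MKL_VERBOSE oneMKL 2022.0 Product build 20211112 for Intel(R) 64 "
    "architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) "
    "enabled processors, Lnx 1.30GHz lp64 intel_thread"
)

# Assumed patterns, derived from the two sample lines only.
VERSION_RE = re.compile(r"MKL_VERBOSE (?:Intel\(R\) MKL|oneMKL) (\S+(?: Update \d+)?)")
ISA_RE = re.compile(r"\(Intel\(R\) (AVX[^)]*)\)")


def parse_verbose_header(line):
    """Return (version, isa) from a header line; either may be None."""
    version = VERSION_RE.search(line)
    isa = ISA_RE.search(line)
    return (
        version.group(1) if version else None,
        isa.group(1) if isa else None,
    )


if __name__ == "__main__":
    for header in (HEADER_2017, HEADER_2022):
        print(parse_verbose_header(header))
```

On the two sample headers this yields AVX-512 for the 2017 Update 2 build and AVX2 for the 2022.0 build, which is the difference described above.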

0 Kudos
Rizwan1
Beginner
754 Views

Could you please share the link? I did not see any such note in the 2020 Release Notes.

0 Kudos
Rizwan1
Beginner
734 Views

Thanks for sharing

 

One thing I cannot understand: if Intel does not support AVX-512 in this version, how is MKL performance maintained? Is no alternative provided?

 

Please also answer the following question:

 

What is the best performance of the Intel MP LINPACK benchmark on the Intel Xeon Phi 7250?

0 Kudos
Rizwan1
Beginner
673 Views
Dear,
 
Could you please share the environment settings printed with the environment variable KMP_SETTINGS=true, so that I can see what is missing on my end? Sharing this would be a great help.
 
Thanks
0 Kudos
Gennady_F_Intel
Moderator
727 Views

Intel continues to support the AVX-512 ISA on all current types of CPU.

About Linpack’s performance benchmarks: these results were published some time ago on the MKL page, when the Xeon Phi was not deprecated. You may try searching this forum for similar discussions.

 

The thread is closing and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

 

0 Kudos