Regarding sgemm benchmarks for MIC devices

Christopher_M_5

Hi Intel forums,

I've had difficulty reproducing the performance reported on the following page:

https://www-ssl.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-sgemm-dgemm.html

Using the MKL sgemm routine on my 3120-series Xeon Phi, I haven't come close to the 1.7 TFLOP/s claimed on that page. The best performance I achieve is ~0.7 TFLOP/s. Presumably this is because I don't fully understand the threading and vectorization APIs and am not using them optimally. Does anyone know where to find the source and environment details used for Intel's official benchmark? Comparing "correct" usage with my code would help me understand the tools better.

Thanks,

Chris

JJK

Try experimenting with the KMP_AFFINITY environment variable. If I set

export KMP_AFFINITY=balanced

then I achieve 1730 SP GFLOP/s and 840 DP GFLOP/s on my 5110P (using the sample dgemm.c code from Intel's website).

With any other setting of KMP_AFFINITY, performance drops to 360 DP GFLOP/s or less.
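
To compare settings side by side, something like the following could be used. This is only a rough sketch: run_with_affinity is a helper name made up for illustration, and ./sgemm_bench is a hypothetical stand-in for whatever benchmark binary you built (for example from the dgemm.c sample). The values compact, scatter and balanced are standard KMP_AFFINITY types in the Intel OpenMP runtime.

```shell
#!/bin/sh
# Run a command with KMP_AFFINITY set only for that one invocation,
# so the setting does not leak into the calling shell.
# Usage: run_with_affinity <affinity-type> <command> [args...]
# (run_with_affinity is a hypothetical helper, not an Intel tool.)
run_with_affinity() {
    _aff=$1
    shift
    env KMP_AFFINITY="$_aff" "$@"
}

# Hypothetical benchmark binary; override with BENCH=/path/to/your/binary.
BENCH=${BENCH:-./sgemm_bench}

# Try each affinity type in turn, skipping the run if the binary is missing.
for affinity in compact scatter balanced; do
    if [ -x "$BENCH" ]; then
        echo "KMP_AFFINITY=$affinity"
        run_with_affinity "$affinity" "$BENCH"
    fi
done
```

Running each setting in its own invocation keeps the comparisons independent of whatever was exported earlier in the session.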
