Regarding sgemm benchmarks for MIC devices

Christopher_M_5 — Fri, 21 Aug 2015 20:04:36 GMT

Hi Intel forums,

I've had difficulty reproducing the performance reported on the following page:

https://www-ssl.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-sgemm-dgemm.html

Using the MKL sgemm routine on my 3120 series Xeon Phi, I haven't come close to the 1.7 TFLOP/s level claimed above. The best performance I achieve is ~0.7 TFLOP/s. Presumably, this is because I don't fully understand the threading and vectorization APIs and I'm not using them optimally. Does anyone know where to find the source and environment details used for Intel's official benchmark? Comparing that "correct" usage with my own code would help me understand the tools better.

Thanks,

Chris
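
When comparing numbers like these, it helps to pin down how the GFLOP/s figure is derived from a timed run and how it relates to the card's theoretical peak. The following is a minimal sketch, not the method used by Intel's benchmark. The function names are made up for illustration, and the hardware figures in the comments (57 cores at 1.1 GHz, 32 single-precision flops per cycle per core) are assumptions about a 3120 series card that should be checked against the spec sheet for your device.

```python
def gemm_gflops(m, n, k, seconds):
    """Achieved GFLOP/s for one C = A * B with A of size m x k and B of size k x n."""
    # Conventional GEMM flop count: one multiply and one add per inner-product term.
    return 2.0 * m * n * k / seconds / 1e9


def peak_gflops(cores, ghz, flops_per_cycle):
    """Theoretical peak GFLOP/s: cores x clock (GHz) x flops per cycle per core."""
    return cores * ghz * flops_per_cycle


def efficiency(achieved_gflops, peak):
    """Fraction of theoretical peak actually reached."""
    return achieved_gflops / peak


if __name__ == "__main__":
    # Assumed figures for a 3120 series card: 57 cores, 1.1 GHz,
    # 16 SP lanes x 2 flops (fused multiply-add) = 32 flops per cycle.
    peak = peak_gflops(57, 1.1, 32)
    print("assumed SP peak (GFLOP/s):", peak)
    print("fraction of peak at 700 GFLOP/s:", efficiency(700.0, peak))
```

Under those assumed figures, a measured ~0.7 TFLOP/s is well under half of the single-precision peak, which points at thread placement or problem size rather than at the sgemm kernel itself.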

try playing with the env var

JJK — Mon, 24 Aug 2015 22:17:00 GMT

Try playing with the KMP_AFFINITY environment variable. If I set

export KMP_AFFINITY=balanced

then I achieve 1730 SP GFLOPS and 840 DP GFLOPS on my 5110P (using the sample dgemm.c code from Intel's website).

With any other KMP_AFFINITY setting, performance drops to 360 DP GFLOPS or less.
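
To compare affinity settings systematically, one option is to run the same benchmark once per KMP_AFFINITY value and collect the output. The sketch below is only an illustration: the helper names are hypothetical, and the demo command simply echoes the variable back so the script runs anywhere. Replace it with your own benchmark executable. It assumes the benchmark reads KMP_AFFINITY from the environment it is launched in. How the variable reaches the card in offload mode is not handled here.

```python
import os
import subprocess
import sys


def env_with_affinity(value, base_env=None):
    """Return a copy of the environment with KMP_AFFINITY set to `value`."""
    env = dict(os.environ if base_env is None else base_env)
    env["KMP_AFFINITY"] = value
    return env


def run_sweep(cmd, settings=("balanced", "scatter", "compact")):
    """Run `cmd` once per affinity setting and return {setting: stdout}."""
    results = {}
    for value in settings:
        proc = subprocess.run(
            cmd,
            env=env_with_affinity(value),
            capture_output=True,
            text=True,
            check=True,
        )
        results[value] = proc.stdout.strip()
    return results


if __name__ == "__main__":
    # Stand-in command: prints the KMP_AFFINITY value it was started with.
    # Swap in your own benchmark binary here.
    demo_cmd = [sys.executable, "-c",
                "import os; print(os.environ['KMP_AFFINITY'])"]
    for setting, output in run_sweep(demo_cmd).items():
        print(setting, "->", output)
```

If the benchmark prints its own GFLOPS figure, the collected output gives a side-by-side view of each setting.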
