Hi Intel forums,
I've had difficulty reproducing the performance reported on the following page:
https://www-ssl.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-sgemm-dgemm.html
Using the MKL sgemm routine on my 3120-series Xeon Phi, I haven't come close to the 1.7 TFLOP/s claimed there. The best performance I achieve is ~0.7 TFLOP/s. Presumably this is because I don't fully understand the threading and vectorization APIs and am not using them optimally. Does anyone know where to find the source and environment details used for Intel's official benchmark? I'd like to compare that "correct" usage with my code to better understand the tools.
Thanks,
Chris
Try playing with the environment variable KMP_AFFINITY. If I set
export KMP_AFFINITY=balanced
then I achieve 1730 GFLOP/s in single precision and 840 GFLOP/s in double precision on my 5110P (using the sample dgemm.c code from Intel's website).
With any other setting of KMP_AFFINITY, performance drops to 360 DP GFLOP/s or less.
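To put these numbers in context, here is a rough sketch of how a GEMM timing turns into GFLOP/s and how that compares with the card's theoretical peak. The function names are my own, and the 5110P figures (60 cores, 1.053 GHz, 512-bit vectors with one fused multiply-add per lane per cycle) are assumptions taken from the published spec sheet, not something measured in this thread.

```python
def gemm_flops(m, n, k):
    """Flop count for C = A*B with A (m x k) and B (k x n):
    one multiply and one add per inner-product term."""
    return 2 * m * n * k


def achieved_gflops(m, n, k, seconds):
    """GFLOP/s for a single GEMM call that took `seconds` of wall time."""
    return gemm_flops(m, n, k) / seconds / 1e9


def peak_gflops(cores, clock_ghz, simd_lanes):
    """Theoretical peak, assuming one fused multiply-add (2 flops)
    per SIMD lane per cycle on every core."""
    return cores * clock_ghz * simd_lanes * 2


if __name__ == "__main__":
    # Assumed 5110P specs: 60 cores at 1.053 GHz, 512-bit vectors
    # (16 single-precision lanes, 8 double-precision lanes).
    sp_peak = peak_gflops(60, 1.053, 16)
    dp_peak = peak_gflops(60, 1.053, 8)

    # Figures reported above for KMP_AFFINITY=balanced and for other settings.
    print(f"SP balanced: {1730 / sp_peak:.0%} of assumed peak")
    print(f"DP balanced: {840 / dp_peak:.0%} of assumed peak")
    print(f"DP other:    {360 / dp_peak:.0%} of assumed peak")
```

If you time your own sgemm call, plugging the matrix dimensions and elapsed time into achieved_gflops gives a number you can compare directly against the figures above. Swapping in the core count and clock of your own card gives the ceiling to compare against.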
