Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Performance of offloaded MKL FFTs on the MIC, anyone?

Fernando_G_2
Beginner

My initial experiments offloading MKL FFTs to the MIC (using C on Linux) give me approximately 9.3 GFLOPS, judging by the [MIC Time] numbers reported when I set the environment variable OFFLOAD_REPORT to 1 (or 2). This is about 0.46% of the advertised peak performance of 2 TFLOPS. In fact, it is much less than that if I take into account the time for the data movement inside the offload section ([CPU Time] in the report).
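For anyone wanting to reproduce this kind of estimate, here is a minimal sketch of the arithmetic in C. The post does not say how the flops were counted, so the sketch assumes the common convention of 5 * N * log2(N) flops per complex FFT of power-of-two length N. The function names are hypothetical helpers, not part of MKL, and the elapsed time would come from whatever timer or report you trust.

```c
/* Hypothetical helpers, not part of MKL. */

/* Conventional flop estimate for one complex FFT of length n:
 * 5 * n * log2(n). Assumes n is a power of two. */
double fft_flops(unsigned long n)
{
    unsigned int log2n = 0;
    while ((n >> log2n) > 1UL) {
        ++log2n;                      /* integer log2 for power-of-two n */
    }
    return 5.0 * (double)n * (double)log2n;
}

/* Achieved GFLOPS for `batch` transforms of length n in `seconds`. */
double fft_gflops(unsigned long n, unsigned long batch, double seconds)
{
    return (double)batch * fft_flops(n) / seconds / 1e9;
}

/* Achieved rate as a percentage of a quoted peak (both in GFLOPS). */
double percent_of_peak(double gflops, double peak_gflops)
{
    return 100.0 * gflops / peak_gflops;
}
```

With the numbers above, percent_of_peak(9.3, 2000.0) comes out to roughly 0.465. Passing the [CPU Time] figure instead of [MIC Time] as `seconds` would fold the data movement into the rate.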

Am I missing something?

I am curious to know if my numbers are way off or consistent with other benchmarks (I could not find any).

I would appreciate it if someone could point me to related information, or let me know whether anyone has had a different (or similar) experience.

The bottom line is that I hope there is something I can do to drastically improve the performance, but I have run out of ideas. Any help will be appreciated.

Thanks!

Fernando

Zhang_Z_Intel
Employee

What type of FFT are you doing, and at what data sizes? Are you offloading individual FFTs one by one, or batched FFTs?

See this knowledge base article for getting good FFT performance on MIC: https://software.intel.com/en-us/articles/tuning-the-intel-mkl-dft-functions-performance-on-intel-xeon-phi-coprocessors

And where did you get the information that the peak performance of the Phi is 2 TFLOPS? As far as I know, the high-end Intel Xeon Phi 7120 with 61 cores has a peak performance of only about 1.2 TFLOPS. FFT is typically a memory-bound computation, so it is not reasonable to expect FFT performance to be close to the theoretical peak. If you run something like the Linpack benchmark or matrix-matrix multiplication, you will be able to get much closer to the peak.
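As a rough check on the 1.2 TFLOPS figure, a theoretical double-precision peak can be sketched as cores x clock x flops per cycle per core. The clock rate (1.238 GHz) and the 16 double-precision flops per cycle per core (an 8-wide vector unit with fused multiply-add) used below are assumptions not stated in this thread, and the function name is a hypothetical helper.

```c
/* Hypothetical helper: theoretical peak in GFLOPS.
 * cores           - number of cores (61 for the 7120, per the reply above)
 * ghz             - clock rate in GHz (assumed: 1.238)
 * flops_per_cycle - double-precision flops per cycle per core
 *                   (assumed: 16 = 8 vector lanes * 2 for FMA) */
double peak_gflops(int cores, double ghz, int flops_per_cycle)
{
    return (double)cores * ghz * (double)flops_per_cycle;
}
```

Under those assumptions, peak_gflops(61, 1.238, 16) gives roughly 1208 GFLOPS, so 9.3 GFLOPS would be a bit under 0.8% of peak rather than 0.46%.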
