Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

3D FFT in MKL with data larger than cache

Dylan_B_
Beginner
734 Views

Hi, 

I am working on a 3D numerical integrator for a non-linear PDE using the parallel FFT library included in MKL. 

My arrays contain 2^30 data points, far more than fits in cache. As a result, roughly 50% of cache references are misses, and a large share of the execution time is spent purely on memory access.
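
For scale, here is a rough sketch of the working-set arithmetic. The element type is not stated in the post, so the 16 bytes per point (double-precision complex) is an assumption, and the helper name `footprint_gib` is made up for illustration.

```python
# Rough working-set estimate for the 3D FFT array described above.
# ASSUMPTION: each data point is a double-precision complex number
# (16 bytes); the post does not state the element type.
N_POINTS = 2**30
BYTES_PER_ELEMENT = 16  # assumed, not taken from the post


def footprint_gib(n_points, bytes_per_element):
    """Return the array size in GiB (hypothetical helper)."""
    return n_points * bytes_per_element / 2**30


# An in-place transform touches one array; out-of-place needs a second
# array of the same size for the output.
in_place_gib = footprint_gib(N_POINTS, BYTES_PER_ELEMENT)
out_of_place_gib = 2 * in_place_gib

print(f"in-place working set:     {in_place_gib:.1f} GiB")
print(f"out-of-place working set: {out_of_place_gib:.1f} GiB")
```

Under that assumption the data set is orders of magnitude larger than a typical last-level cache of tens of MiB, so most passes over the array have to stream from main memory.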

Is there a clever way to deal with this? Is a 50% cache-miss rate expected for an array this large?

Any help would be much appreciated.

Thanks,

Dylan

3 Replies
Evgueni_P_Intel
Employee

Hi Dylan,

A cache-miss rate of 50% is OK for large out-of-place FFTs. Have you tried in-place 3D transforms?

For most data points of a large 3D transform, the miss-hit pattern is MHHHMH for in-place transforms and MMHHMH for out-of-place transforms, which gives cache-miss rates of 33% and 50% respectively. Real figures may be higher, but switching to in-place transforms may improve performance.
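
The 33% and 50% figures follow directly from counting the misses in each six-access pattern. A minimal sketch of that arithmetic (the function name `miss_rate` is made up for illustration; the pattern strings are the ones quoted above):

```python
def miss_rate(pattern):
    """Fraction of accesses marked 'M' (miss) in a hit/miss pattern string."""
    return pattern.count("M") / len(pattern)


# Patterns quoted in the reply: 'M' = cache miss, 'H' = cache hit.
in_place = miss_rate("MHHHMH")      # 2 misses out of 6 accesses
out_of_place = miss_rate("MMHHMH")  # 3 misses out of 6 accesses

print(f"in-place:     {in_place:.0%}")
print(f"out-of-place: {out_of_place:.0%}")
```

These are idealized per-point rates; as noted, measured figures may be higher.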

Evgueni.

Dylan_B_
Beginner

Hi Evgueni,

Thanks for the prompt reply.

I tried in-place transforms, and they improved the cache-miss rate by approximately 5% compared to out-of-place transforms. I still find the performance underwhelming compared to a solver I wrote in the past using FFTW3, and I am completely stumped on how, or whether, I can increase it further.

I have also noticed that some runs show a cache-miss rate as high as 65%, without any change to the parameters in my source code.

 

Thanks for your help,

Dylan

 

Evgueni_P_Intel
Employee

FFT performance may depend on the layout of the dataset in memory, the threading runtime settings, and other factors.

To speed up the investigation, please post a reproducer here or send it privately.

 
