Using Intel MKL FFT on a cluster

Telemachus · 05-04-2021

Hi there. Sorry for reposting this, I was advised there is this specific section for MKL.

I have some questions about linking against the MKL FFT libraries on a cluster. I have a code that I wrote using the open-source FFTW, but on the cluster I am trying to use the MKL FFT instead. I've been looking at the examples in the folder dftf and have tried to adapt them to my code.

I would like to clarify some things.
First, is what the MKL libraries call "hand" the same thing as what classic FFTW calls a plan?

As I am going to make intensive use of this FFT, I committed the transform settings in one subroutine of the code and shared the "hand" pointers through a common block with the routine where I actually execute the DftiComputeForward and DftiComputeBackward calls. Is this correct?

Now, the thing is that the FFT calculations don't seem to run in the time I obtain when I execute the same transforms with the open-source FFTW on my desktop computer. I should say that on my desktop I don't have the Intel compiler; I use gfortran, so the comparison is between different systems and different compilers. It is hard to say on an absolute scale which works better, but what I do see is that with FFTW on my computer the run time scales as expected, as N log_2(N), and that is not what I obtain when I run the code on the cluster.

I know that the Intel compiler should work better than the open-source versions, so I would like to know what could be the root of this loss of performance. It is possible that I am not using the MKL routines properly; I'm just learning how to use them. It is also possible that I am not compiling the code the way I should to get the optimized performance I expected.

Since I didn't know how to compile the code, I looked at how the examples provided by Intel are compiled by the makefile that comes with them. I also tried adding my code to the list file provided with the examples and then building with that makefile. That worked, but I wonder whether something in the way these codes are compiled is causing a loss of performance.

Now I'm compiling using something like:

mpiifort -module _results/intel_lp64_parallel_iomp5_libintel64 -I/opt/ohpc/pub/compiler/intel/compilers_and_libraries_2020.2.254/linux/mkl/2021.1.1/include -fpp -qopenmp \
mycode.f90 \
_results/intel_lp64_parallel_iomp5_libintel64/mkl_dfti.o \
/opt/ohpc/pub/compiler/intel/compilers_and_libraries_2020.2.254/linux/mkl/2021.1.1/lib/intel64/libmkl_intel_lp64.a -Wl,--start-group /opt/ohpc/pub/compiler/intel/compilers_and_libraries_2020.2.254/linux/mkl/2021.1.1/lib/intel64/libmkl_intel_thread.a /opt/ohpc/pub/compiler/intel/compilers_and_libraries_2020.2.254/linux/mkl/2021.1.1/lib/intel64/libmkl_core.a -Wl,--end-group \
-L/opt/ohpc/pub/compiler/intel/compilers_and_libraries_2020.2.254/linux/mkl/2021.1.1/../compiler/lib/intel64 -liomp5 -lpthread -lm -ldl -o _results/intel_lp64_parallel_iomp5_libintel64/mycode.x

Is there a way to do it more concisely? I am also using MPI, I don't know if that matters.

Thanks.

Telemachus · 05-04-2021

MKL version: intel/mkl/2021.1.1
OS version: NAME="CentOS Linux"
VERSION="7 (Core)"
Compiler version: intel/compiler-rt/2021.1.1

mpi/2021.1.1

I'll prepare a small test code to post it here in a few days. I have to do a few things today, so I'll leave it for tomorrow.

MRajesh_intel · 05-04-2021

Hi,

Thanks for posting your query. You may also refer to the Link Line Advisor tool to see which libraries are recommended for a particular use case.

Link: https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl/link-line-advisor.html
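
As a rough sketch of a more concise build for the configuration posted above (LP64 interface, OpenMP threading, iomp5), the command can be assembled from MKLROOT rather than from hard-coded paths. This assumes MKLROOT is exported by the Intel environment on the cluster (for example through a module); the fallback path, the dry-run behaviour and the RUN switch are illustrative only, and dynamic linking is used here where the original command linked the static .a files.

```shell
#!/bin/sh
# Sketch only: build the MKL compile/link command from MKLROOT.
# MKLROOT is normally exported by the Intel environment; the fallback
# below is a placeholder so that the dry run still prints something.
MKLROOT="${MKLROOT:-/path/to/mkl}"

# Include directory (also holds the DFTI module source, mkl_dfti.f90).
MKL_INC="-I${MKLROOT}/include"

# Dynamic link line: LP64 interface, Intel OpenMP threading layer, core.
MKL_LIBS="-L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl"

# mkl_dfti.f90 is listed before mycode.f90 so the MKL_DFTI module exists
# by the time mycode.f90 is compiled.
CMD="mpiifort -fpp -qopenmp ${MKL_INC} ${MKLROOT}/include/mkl_dfti.f90 mycode.f90 ${MKL_LIBS} -o mycode.x"

# Dry run by default: print the command, execute it only when RUN=1.
# (Unquoted expansion relies on MKLROOT containing no spaces.)
echo "$CMD"
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```

Running the script as is only prints the command; RUN=1 executes it. The Link Line Advisor remains the place to check the exact line for a given MKL version and for static linking.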

Regards

Rajesh.

MRajesh_intel · 05-10-2021

Hi,

Could you please share an update regarding the issue? If your issue is not yet resolved, please share a minimal reproducer.

Regards

Rajesh.

Telemachus · 05-11-2021

Hi. I have partially fixed the problem. I wasn't using the cluster's job manager correctly, which caused most of the issue. However, I am still compiling the code with MKL using the directories that appeared when I ran the makefile for the examples provided by Intel in the folder dftf, as I explained before. I am compiling it with the command:

mpiifort -module _results/intel_lp64_parallel_iomp5_libintel64 -I/opt/ohpc/pub/compiler/intel/compilers_and_libraries_2020.2.254/linux/mkl/2021.1.1/include -fpp -qopenmp \
mycode.f90 \
_results/intel_lp64_parallel_iomp5_libintel64/mkl_dfti.o \
/opt/ohpc/pub/compiler/intel/compilers_and_libraries_2020.2.254/linux/mkl/2021.1.1/lib/intel64/libmkl_intel_lp64.a -Wl,--start-group /opt/ohpc/pub/compiler/intel/compilers_and_libraries_2020.2.254/linux/mkl/2021.1.1/lib/intel64/libmkl_intel_thread.a /opt/ohpc/pub/compiler/intel/compilers_and_libraries_2020.2.254/linux/mkl/2021.1.1/lib/intel64/libmkl_core.a -Wl,--end-group \
-L/opt/ohpc/pub/compiler/intel/compilers_and_libraries_2020.2.254/linux/mkl/2021.1.1/../compiler/lib/intel64 -liomp5 -lpthread -lm -ldl -o _results/intel_lp64_parallel_iomp5_libintel64/mycode.x

I'm not sure whether this is best practice, whether I should use other compilation flags, or whether there is a better way of doing it. If compiling this way carries no penalty, it doesn't bother me, but I'm still not sure I am getting the best possible performance. In particular, I still see some better performance by running on my local desktop (better parallel performance), but as I explained before, that is a different system, with a different CPU and the GNU compiler, so it is difficult to draw a parallel between the two.

Thanks.

MRajesh_intel · 05-12-2021

Hi,

>>"I still see some better performance by running on my local desktop (better parallel performance)"

Could you please elaborate on this? Also, please provide details of the CPU and the local desktop that you are using.

>> "I'll prepare a small test code to post it here in a few days. "

If possible, could you also share the minimal reproducer?

Regards

Rajesh

Telemachus · 05-13-2021

Hi. I think everything is working fine now; I wasn't using the job manager properly. Thanks.
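
The thread does not say what exactly was wrong with the job submission. Purely as an illustration of one common pitfall with a hybrid MPI + OpenMP build like the one above, a job script can fix the thread count per MPI rank so that ranks and threads together do not exceed the cores of a node. The node layout below is hypothetical, and scheduler-specific directives are left out because the job manager is not named in the thread.

```shell
#!/bin/sh
# Hypothetical node layout; replace with what the job actually requests.
CORES_PER_NODE=16
RANKS_PER_NODE=4

# Give each MPI rank an equal share of the cores to avoid oversubscription.
THREADS_PER_RANK=$((CORES_PER_NODE / RANKS_PER_NODE))

# OMP_NUM_THREADS sets the thread count for OpenMP regions in the user code;
# MKL_NUM_THREADS sets it for MKL and takes precedence inside MKL calls.
export OMP_NUM_THREADS="$THREADS_PER_RANK"
export MKL_NUM_THREADS="$THREADS_PER_RANK"

echo "ranks/node=${RANKS_PER_NODE} threads/rank=${THREADS_PER_RANK}"
```

With these settings in place before the MPI launcher is called, each rank uses at most its share of the cores; whether this matches the problem in this thread is an assumption.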

MRajesh_intel · 05-16-2021

Hi,

Thanks for the confirmation!

As this issue has been resolved, we will no longer respond to this thread. If you require any additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

Have a good day.

Regards

Rajesh.