I am developing a time-stepping code that calls fft routines in every step. While writing and testing the code, I used the -lfftw3 flag to link to the fftw3 library. Now that the code is functional, I tried to link to the MKL version of this library instead, as I think it may be faster. However, the result is completely different. With the -lfftw3 flag, the output seems to make sense, but not with the -mkl option. I am hoping that somebody can explain the difference. This is of great importance to me, as I often use fftw3, lapack and similar libraries, and it seems that MKL gives the best performance.
Operating system: ubuntu 13.10, but it happens on our cluster, too.
Hardware: Lenovo laptop with Intel(R) Core(TM) i7-4600U CPU, but it happens on our cluster, too.
Ifort version: Version 126.96.36.199 Build 20110811 (called through mpif90 with OMPI_FC=ifort)
> mpif90 -o test.x LES_cont.f90 -llapack -lfftw3
> mpirun -np 1 ./test.x
( ... computation ...)
> mpif90 -mkl -o test.x LES_cont.f90
> mpirun -np 1 ./test.x
Between the two runs I change only the compiler/linker options as shown, nothing else. Thanks in advance for your help!
This is a fairly large research code under development, and I would rather not post it in a public place. If there is a way to get it to you directly, I will. Otherwise, I could extract a minimal code that demonstrates the problem, but I have little time for that now; it would have to wait a week or two.
On my laptop, I have MKL as well as the fftw3 package provided by the Ubuntu software centre. I believe the latter includes the library file /usr/lib/x86_64-linux-gnu/libfftw3.so. If I link to this file directly, as in
mpif90 -o test.x <input files> /usr/lib/x86_64-linux-gnu/libfftw3.so
the output is correct. It is also correct if I compile by
mpif90 -mkl -o test.x <input files> -lfftw3
but with only the -mkl flag the output is wrong.
I tried to find out exactly which library is used by means of the -dryrun flag, but could not figure it out. With the -mkl flag I do see the following in the output:
< -lmkl_intel_lp64 \
< -lmkl_intel_thread \
< -lmkl_core \
< -liomp5 \
< --end-group \
which is absent without the -mkl flag.
You can send me the code by clicking "send the author a message".
There are also some examples that demonstrate how to use the MKL FFTW3 interface. The source code for the examples, and the makefiles used to run them, are located in the .\examples\fftw3xf subdirectory of the Intel MKL directory. You may adapt one of these examples to your usage model (including your input data) and then send it to us the same way.
I have received your test case and was able to reproduce the problem. I have asked our developer to investigate it further. Because of the holiday season, it may take some time. I will update you when there is any news.
We found the problem. I am sharing it here so that other developers can see it, and I will send the fixed code to you by private message.
You create the FFTW plan (for a 1D c2c FFT) with the same array as input and output, which means the transform is planned to be in-place.
However, at the execution stage you provide different arrays (i.e. you perform an out-of-place transform). That is incorrect according to the official FFTW3 documentation. FFTW3 nevertheless handles this situation, whereas MKL does not.
Corresponding piece of FFTW3 documentation:
4.6 New-array Execute Functions
Normally, one executes a plan for the arrays with which the plan was created, by calling fftw_execute(plan) as described in Using Plans. However, it is possible for sophisticated users to apply a given plan to a different array using the “new-array execute” functions detailed below, provided that the following conditions are met:
o) The array size, strides, etcetera are the same (since those are set by the plan).
o) The input and output arrays are the same (in-place) or different (out-of-place) if the plan was originally created to be in-place or out-of-place, respectively.
. . .
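To illustrate the rule quoted above, here is a minimal sketch of the mismatch and of two ways to avoid it. It is not the original research code: it assumes the legacy FFTW3 Fortran interface (include 'fftw3.f' and the dfftw_* routines), and the program name, the array names a and b, and the length n are hypothetical.

```fortran
program plan_mismatch_sketch
  implicit none
  include 'fftw3.f'                 ! FFTW_FORWARD, FFTW_ESTIMATE (legacy interface)

  integer, parameter :: n = 8       ! hypothetical transform length
  integer(kind=8) :: plan_inplace, plan_outofplace
  complex(kind=kind(0.0d0)) :: a(n), b(n)
  integer :: i

  ! Fill the input with simple test data.
  do i = 1, n
     a(i) = cmplx(real(i, kind(0.0d0)), 0.0d0, kind=kind(0.0d0))
  end do

  ! In-place plan: the same array is passed as input and output.
  ! FFTW_ESTIMATE does not overwrite the arrays during planning.
  call dfftw_plan_dft_1d(plan_inplace, n, a, a, FFTW_FORWARD, FFTW_ESTIMATE)

  ! Out-of-place plan: distinct input and output arrays.
  call dfftw_plan_dft_1d(plan_outofplace, n, a, b, FFTW_FORWARD, FFTW_ESTIMATE)

  ! The pattern described in this thread, which violates the rule above
  ! (an in-place plan executed on two different arrays):
  !   call dfftw_execute_dft(plan_inplace, a, b)

  ! Option 1: execute the out-of-place plan on distinct arrays.
  call dfftw_execute_dft(plan_outofplace, a, b)

  ! b(1) is the zero-frequency term, i.e. the sum of the inputs.
  print *, 'b(1) = ', b(1)

  ! Option 2: execute the in-place plan on a single array (a is overwritten).
  call dfftw_execute_dft(plan_inplace, a, a)

  call dfftw_destroy_plan(plan_inplace)
  call dfftw_destroy_plan(plan_outofplace)
end program plan_mismatch_sketch
```

Either option keeps the execute call consistent with how the plan was created, so the same source should behave identically whether it is linked with -lfftw3 or built with -mkl. Which option suits the real code depends on whether the input array must be preserved.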
Thank you very much for the help, it would have taken me a long time to trace this bug down. Please send me the fixed code, or simply the changed portion, which I assume comprises only a few lines.