Missing rfftw function

Miska · ‎04-07-2013

Hi !

I am steadily progressing in my quest to convert from a gcc to an icc (and MKL) based build of my code. The reason for the shift is to be able to run my code on the Xeon Phi.

I have managed to get the thing to compile. However, when I run the code, I get an error:

fftw_die: rfftw() is not implemented because MKL DFTI doesn't support half-complex data layout

Looks like I have hit a function that is not supported in MKL. I have an option to avoid the branch calling those functions (I basically pre-generate the needed stuff and reload it when using the Phi), so I can already run a limited version of the code, but that is not a solution in the long term. So what are my options?

- Option 1: Get FFTW (I am using v.2, because I need the MPI-parallelized functions) to compile on the Phi. I tried to force FFTW to cross-compile, but that didn't work (I am not a cross-compiling/configure guru though). Since FFTW is pretty clever at optimizing itself at compile time, I suppose that getting it to compile for the Phi is not trivial. Has anybody succeeded? Any "magic lines" that would work to get a running FFTW v2 on the Phi? At this point, I just need functioning rfftw functions; the others would be taken from MKL. Performance is not really an issue, as the rfftw functions are only called at an initialization stage of the code.

- Option 2: Replace the rfftw functions with supported functions. Unfortunately, the part of the code using rfftw is in C++ (I don't do C++ at all, so I would have to learn at least the basics), and I have no idea how the code internals work (it wasn't written by me, I just use it as a library; of course I understand the final result it produces, just not its internal workings). Fortunately (?), the rfftw functions don't seem to be used widely in that library. When I grep for rfftw, only a few hits appear. This makes me think that replacing these functions with something else would not be such a daunting task. The grep hits are:

in file fftw.cc:

[...]

p = rfftw_create_plan(N, FFTW_REAL_TO_COMPLEX, FFTW_MEASURE | FFTW_USE_WISDOM); \

[...]
p = rfftw2d_create_plan(N, M, FFTW_REAL_TO_COMPLEX, FFTW_MEASURE | FFTW_USE_WISDOM);

And in another file, called fftw.h:

[...]

rfftw_destroy_plan(p);
[...]

rfftw_plan p;

[...]

inline void fft::run(fftw_real *in, int pp) {
rfftw_one(p, in, out + pp*N);
}

[...]

/*! fftw plan generation */
void make_plan();

int
N, /*!< number of rows */
M, /*!< number of columns */
out_length; /*!< fft2d output length */
rfftwnd_plan p; /*!< fftw plan for fft2d computation */
fftw_complex *out; /*!< the output of the fft2d of type fftw_complex which is a structure type containing two doubles, the real part (real) and the imaginary part (im). */
};

/* computes a 2D fft of N*M elements */
inline void fft2d::run(fftw_real *in) {
run(in, 0);
}
/* computes the ppth 2D fft of N*M elements*/
inline void fft2d::run(fftw_real *in, int pp) {
rfftwnd_one_real_to_complex(p, in, out + pp*out_length);
}

And that's about it. I am not really worried about the plan creation and destruction functions, as those are basically ignored by MKL (right?). So I would just need to change a couple of functions (and first figure out what they do).

So, for now, I am rooting for Option 1, as that would involve little work (I hope). But option 2 doesn't sound completely undoable.

What do you experts think ?

Thanks in advance !

Miska

Dmitry_B_Intel · ‎04-08-2013

Hi Miska,

The rfftw functionality of the FFTW2 interface is indeed not implemented in MKL. Let me treat your report as a request to implement this part of the interface.

Of the options you have listed, option 1 seems to need no programming at all: just tell configure that you are cross-compiling and add the -mmic option to CFLAGS, and that is it. However, it may produce code that is too slow. For large transforms, you may then need to replace the FFTW_MEASURE flag with FFTW_ESTIMATE, especially if the FFT computation is not in a performance-critical part of your application.

I can suggest a few more options: (a) compute real-to-complex fft using FFTW2 complex-to-complex functions, (b) use FFTW3 interface for this computation (this interface is built-in in MKL and it supports real-to-complex transforms).
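
Option (a) can be sketched as follows. This is a minimal, self-contained illustration rather than actual FFTW2 or MKL code: the helper naive_c2c_forward is a hypothetical stand-in for whatever complex-to-complex transform the application really calls, and r2c_via_c2c is likewise a name made up for this example. The idea is to copy the real input into a complex array with zero imaginary parts, run the complex transform, and keep only the first n/2+1 outputs, since the remaining ones are complex conjugates of those.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical stand-in for a complex-to-complex forward transform.
// Naive O(n^2) DFT using the forward sign convention exp(-2*pi*i*j*k/n).
std::vector<cplx> naive_c2c_forward(const std::vector<cplx>& in) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle =
                -2.0 * pi * static_cast<double>(j * k) / static_cast<double>(n);
            sum += in[j] * cplx(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Real-to-complex transform built on top of a complex-to-complex one.
// Returns the n/2+1 non-redundant outputs as interleaved complex values.
std::vector<cplx> r2c_via_c2c(const std::vector<double>& real_in) {
    assert(!real_in.empty());
    const std::size_t n = real_in.size();

    // Promote the real samples to complex numbers with zero imaginary part.
    std::vector<cplx> tmp(n);
    for (std::size_t j = 0; j < n; ++j) {
        tmp[j] = cplx(real_in[j], 0.0);
    }

    // Full complex transform; outputs k and n-k are complex conjugates.
    std::vector<cplx> full = naive_c2c_forward(tmp);

    // Keep only the first n/2+1 outputs.
    full.resize(n / 2 + 1);
    return full;
}
```

For the input {1, 2, 3, 4} this yields the three values 10, -2+2i and -2. The cost is roughly twice the work and memory of a dedicated real transform, which should not matter if the call only happens at initialization.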

Thanks
Dima

Miska · ‎04-08-2013

Hello,

Thanks very much for your input. A few points:

- Yes, it would be useful to have full support for all the FFTW2 libraries. I know it's a legacy interface, but FFTW3 does not implement everything (in particular, the MPI-based FFTs are missing in FFTW3).

- I cannot get FFTW2 to compile, despite spending a bit of time on it. Googling around suggested the following:

export CC=/opt/intel/composer_xe_2013.3.163/bin/intel64/icc
export CXX=/opt/intel/composer_xe_2013.3.163/bin/intel64/icpc
export FC=/opt/intel/composer_xe_2013.3.163/bin/intel64/ifort
export CFLAGS='-mmic -O3'
export CXXFLAGS='-mmic -O3'
export F90FLAGS='-mmic -O3'

./configure --prefix=/home/miska/fftw-2.1.5_icc_mic --enable-type-prefix --enable-mpi --host=blackfin

bash-4.1$ ./configure --prefix=/home/miska/fftw-2.1.5_icc_mic --enable-type-prefix --enable-mpi --host=blackfin
configure: WARNING: If you wanted to set the --build type, don't use --host.
If a cross compiler is detected then cross compile mode will be used.
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for blackfin-strip... no
checking for strip... strip
checking type prefix for installed files... d
checking for vendor's cc to be used instead of gcc... checking for cc... /opt/intel/composer_xe_2013.3.163/bin/intel64/icc
checking for blackfin-gcc... (cached) /opt/intel/composer_xe_2013.3.163/bin/intel64/icc
checking for C compiler default output... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... yes
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether /opt/intel/composer_xe_2013.3.163/bin/intel64/icc accepts -g... yes
checking for /opt/intel/composer_xe_2013.3.163/bin/intel64/icc option to accept ANSI C... none needed
checking for style of include used by make... GNU
checking dependency style of /opt/intel/composer_xe_2013.3.163/bin/intel64/icc... gcc3
checking whether we are using gcc 2.90 or later... yes
checking for a BSD-compatible install... /usr/bin/install -c
checking whether make sets $(MAKE)... (cached) yes
checking for blackfin-ranlib... no
checking for ranlib... ranlib
checking whether ln -s works... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... Invalid configuration `blackfin': machine `blackfin' not recognized
configure: error: /bin/sh ./config.sub blackfin failed

I found the --host=blackfin option on the web, in a document reporting success in compiling FFTW 3.3 (see Schultz et al., "Early Experiences Porting Scientific Applications to the Many Integrated Core (MIC) Platform"). However, for me, trying other host types also fails at the configure stage, and mmic is not directly supported (as an option).

I think compiling FFTW2 directly should be possible, one probably just needs to find a way to bypass the configure checks - and probably the compile time self optimizations.

- Finally: is it possible to use the FFTW2 and FFTW3 interfaces at the same time?
Any input is welcome :-)

Thanks !

Miska

Dmitry_B_Intel · ‎04-09-2013

Miska,

The FFTW2 wrappers that MKL provides allow you to compute a real-to-complex transform using the rfftwnd (nd = n-dimensional) set of functions with rank=1. A one-dimensional r2c transform computed with the rfftw function employs a different data layout (called 'halfcomplex', see details in the FFTW2 documentation, section 3.1). Given that rfftwnd with rank=1 is already supported, do you really need support for the halfcomplex format?
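
To illustrate the difference between the two layouts: per the FFTW2 documentation, the halfcomplex array for a length-n transform stores r0, r1, ..., r(n/2), i((n+1)/2-1), ..., i2, i1, whereas the rfftwnd functions return n/2+1 interleaved complex values. A minimal sketch of repacking the latter into the former, assuming the spectrum is held in a std::vector of std::complex<double> (the function name to_halfcomplex is hypothetical, not part of FFTW or MKL):

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Repack the n/2+1 complex outputs of a length-n real-to-complex transform
// into halfcomplex order: r0, r1, ..., r(n/2), i((n+1)/2-1), ..., i2, i1.
// The name and the std::complex container are assumptions for this sketch.
std::vector<double> to_halfcomplex(
    const std::vector<std::complex<double>>& spec, std::size_t n) {
    assert(n > 0 && spec.size() == n / 2 + 1);
    std::vector<double> hc(n);

    // Real parts go first, in increasing frequency order.
    for (std::size_t k = 0; k <= n / 2; ++k) {
        hc[k] = spec[k].real();
    }

    // Imaginary parts are stored from the end of the array backwards.
    // i0 (and i(n/2) for even n) are always zero and are not stored.
    for (std::size_t k = 1; k < (n + 1) / 2; ++k) {
        hc[n - k] = spec[k].imag();
    }
    return hc;
}
```

With n = 4 and the spectrum {10, -2+2i, -2}, the result is {10, -2, -2, 2}. Code that only consumes the spectrum can often be switched to the interleaved layout directly; a repacking step like this is only needed where existing code indexes the halfcomplex array.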

When configuring FFTW2 for the build, you should simply specify a recognizable --host. For me this worked: [ ./configure --enable-type-prefix --enable-mpi --enable-threads --with-openmp --host=x86_64-linux CC=icc CFLAGS="-mmic -O3 -openmp" F77=ifort FFLAGS="-mmic -O3 -openmp" MPICC=mpiicc ]. Since the Intel Xeon Phi consists of many small cores, you may get better results by running in MPI-hybrid mode than with 1 process per core. That is why I suggest using OpenMP when configuring FFTW2 for Intel Xeon Phi. Configuring --with-openmp is not enough on its own, of course, but it is necessary for threaded FFTW2 on the Phi.

Please also note that FFTW3.3.x does support MPI, and MKL provides fftw3-mpi wrappers as well.

About combining the fftw2 and fftw3 APIs (application program interfaces): the two APIs are mostly different, but they share some names (example: fftw_destroy_plan), which may cause problems when one application uses both APIs. The MKL wrappers take care to make the shared names behave correctly, but this won't help if the application uses genuine fftw2.

Thanks
Dima

Miska · ‎04-10-2013

Hello Dima,

Thank you very much, this is very helpful. Your compile options seem to work. I will try this first.

I am still discovering the Phi, and mixing MPI and OpenMP is on my list to get better performance, so I'll try that too...

Miska

Miska · ‎04-10-2013

Oh, and about the FFTW2 support. I understand that I could simply change the code to be MKL compatible, since all the necessary transforms are there. However, I think it would be much easier (especially in the case of legacy code) if one could simply replace FFTW2 with MKL without having to touch the code at all. As it is now, this is not possible, since a few functions are formally missing.

So yes, I would like to have a full FFTW2 implementation in MKL, even though I understand one can do everything one needs (in principle) with the current MKL.

I hope this makes sense.

Miska