Evarist, Ying

Even_A_ · ‎06-30-2014

Hello,

I have been assigned the task of converting a matlab script to C++, and am currently working on the FFT part.

Given this input:

octave:67> r57
r57 =

   0.00000   0.20000   0.30000   0.30000   0.40000
   0.30000   0.30000   0.40000   0.40000   0.50000
   0.10000   0.30000   0.20000   0.40000   0.10000
   0.50000   0.50000   0.40000   0.30000   0.20000
   0.20000   0.20000   0.20000   0.20000   0.20000
   0.30000   0.20000   0.30000   0.20000   0.30000
   0.50000   0.50000   0.50000   0.50000   0.50000

I get this output:

octave:68> fft2(r57)
ans =

   10.90000 + 0.00000i   -0.46180 - 0.00000i   -0.23820 - 0.00000i   -0.23820 + 0.00000i   -0.46180 + 0.00000i
    0.79650 + 0.27359i   -0.55720 + 0.94400i   -0.89353 + 0.39737i   -0.10671 - 0.19919i   -0.34353 - 0.30982i
   -0.13330 + 1.20183i    0.50836 + 0.04556i    0.18356 + 0.35590i   -0.57328 - 0.00290i   -0.49515 + 0.11341i
   -1.91320 - 0.77347i   -0.54307 - 0.27387i    0.09683 - 0.23803i   -0.56867 - 0.10558i   -0.20760 - 0.41938i
   -1.91320 + 0.77347i   -0.20760 + 0.41938i   -0.56867 + 0.10558i    0.09683 + 0.23803i   -0.54307 + 0.27387i
   -0.13330 - 1.20183i   -0.49515 - 0.11341i   -0.57328 + 0.00290i    0.18356 - 0.35590i    0.50836 - 0.04556i
    0.79650 - 0.27359i   -0.34353 + 0.30982i   -0.10671 + 0.19919i   -0.89353 - 0.39737i   -0.55720 - 0.94400i

And now I am confused. The first column is as expected, with the second half computable from the 1st half. However, this is not the case for the remaining columns. The C++ code follows, but it doesn't work, among other things it assumes that the second half of the columns can be computed from the first half.

Not sure how to proceed, Is the octave (matlab clone) output correct/expected/common ? If not, if there is a problem with the test data, what would be more reasonable to test with ?

Any advice on how to handle this situation would be much appreciated. Any comments on the code also, I am not really sure that it is correct

Kind Regards,

MKL_LONG               mklNumSamples   [2] = { 5 /* columns */, 7 /* rows */};
MKL_LONG               mklInputStrides [3] = { 0, mklNumSamples[1] + 2, 1};
MKL_LONG               mklOutputStrides[3] = { 0, mklNumSamples[1] + 2, 1};
float                  mklBuffer       [(5+2)*(7+2)] = { 0 };

DFTI_DESCRIPTOR_HANDLE dftiHandle = DFTI_DESCRIPTOR_HANDLE();
DftiCreateDescriptor(&dftiHandle, DFTI_SINGLE, DFTI_REAL, 2, mklNumSamples);

DftiSetValue(dftiHandle, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);

DftiSetValue(dftiHandle, DFTI_INPUT_STRIDES, mklInputStrides);

DftiSetValue(dftiHandle, DFTI_OUTPUT_STRIDES, mklOutputStrides);

DftiCommitDescriptor(dftiHandle);

{
        /* Input is f, accessed as: f.size(), f[index] */

        size_t index[2] = { 0, 0 };
        for(index[0] = 0; index[0] < f.size()[0]; ++index[0]) /* column */
        {
            for(index[1] = 0; index[1] < f.size()[1]; ++index[1]) /* row */
            {
                mklBuffer[ index[0] * (f.size()[1]+2) + index[1]] = f[index]; /* fetches, f with column = index[0] and row = index[1]
            }
        }
    }

DftiComputeForward(dftiHandle, &mkliBuffer[0]);

{
        size_t index[2] = {0, 0 };
        for(index[0] = 0; index[0] < f.size()[0]; ++index[0])
        {
            for(index[1] = 0; index[1] < (f.size()[1] + 1)/2; ++index[1])
            {
                const PUVec2st index0(index[0], index[1]);
                const PUVec2st index1(index[0], f.size()[1] - index[1]);

                const float real =
                    mklBuffer[ index[0] * (f.size()[1] + 2) + index[1]*2+0];
                const float imag =
                    mklBuffer[ index[0] * (f.size()[1] + 2) + index[1]*2+1];

                if (index[1] == 0)
                {
                    result[index0] = std::complex<float>(real, imag);
                }
                else
                {
                    result[index0] = std::complex<float>(real, imag);
                    result[index1] = std::complex<float>(real, -imag);
                }
            }
        }
    }

Evarist_F_Intel · ‎06-30-2014

Hi Even,

I believe octave output is correct. The second half of output columns should be computable from their first halves. The property of real-to-complex transform for 2D is the following: out(i1,i2) = conj(out(N-i1,N-i2)). Octave output satisfies this property. Please look at MKL DFTI docs for more information.

In C++ code there is also a problem with output strides (the are specified in output element's type, i.e. complex in your case). It should be something like

MKL_LONG               mklNumSamples   [2] = { 5 /* columns */, 7 /* rows */};
MKL_LONG               mklInputStrides [3] = { 0,  mklNumSamples[1] + 1,    1};
MKL_LONG               mklOutputStrides[3] = { 0, (mklNumSamples[1] + 1)/2, 1};
float                  mklBuffer       [5*(7+1)] = { 0 };

Please look at basic_sp_real_dft_2d.c example from MKL (or online) for more details.

Ying_H_Intel · ‎06-30-2014

Thanks Evarist much.

Right, the output should follow the out(i1,i2) = conj(out(N-i1,N-i2)).

and with the basic_sp_real_dft_2d.c, the parameter should be

int N1 = 7, N2 = 5;

float x_real[7*5]

output : x_cmplx = (MKL_Complex8*)malloc(N1*(N2/2+1);*sizeof(MKL_Complex8));

and input strides = {0, 5, 1}
output strides = {0, 3, 1}

Then fill the output with the out(i1,i2) = conj(out(N-i1,N-i2)), you will see same result of Octave.

Best Regards,

Ying

P.S the result of MKL code:

Forward-Backward single-precision 2D real out-of-place FFT
Configuration parameters:
DFTI_PRECISION = DFTI_SINGLE
DFTI_FORWARD_DOMAIN = DFTI_REAL
DFTI_DIMENSION = 2
DFTI_LENGTHS = {7, 5}
DFTI_PLACEMENT = DFTI_NOT_INPLACE
DFTI_CONJUGATE_EVEN_STORAGE = DFTI_COMPLEX_COMPLEX
Create DFTI descriptor
Set configuration: out-of-place
Set configuration: CCE storage
Set input strides = {0, 5, 1}
Set output strides = {0, 3, 1}
Commit the descriptor
Allocate data arrays
Initialize data for r2c transform
Compute real-to-complex transform
Verify the result
(10.900001, 0.000000) (-0.461803, -0.000000) (-0.238197, -0.000000)
(0.796495, 0.273589) (-0.557197, 0.943995) (-0.893533, 0.397367)
(-0.133297, 1.201828) (0.508356, 0.045556) (0.183562, 0.355897)
(-1.913198, -0.773471) (-0.543072, -0.273873) (0.096825, -0.238034)
(-1.913198, 0.773471) (-0.207605, 0.419384) (-0.568669, 0.105578)
(-0.133297, -1.201828) (-0.495148, -0.113406) (-0.573280, 0.002897)
(0.796495, -0.273589) (-0.343530, 0.309822) (-0.106708, 0.199195)
Free DFTI descriptor
Free data arrays
TEST PASSED
Press any key to continue . . .

Even_A_ · ‎07-02-2014

Evarist, Ying

Thanks alot that helped.

Now octave and myself are in argement for the fft2 code, however there is still something wrt the inverse transform that I don't understand.

The matlab script I am working on, sets the lower half of each column to 0, and then computes the inverse transform, however the inverse code does not look at all data, essentially it only considers the left-half of the data before passing it to mkl.

For this data:

octave:167> fft_s_X
fft_s_X =

Columns 1 and 2:

   9.000000000000000 + 0.000000000000000i -1.118033988749895 - 0.363271264002680i
   5.000000000000000 - 4.000000000000000i -0.363271264002680 + 1.118033988749895i
   0.000000000000000 + 0.000000000000000i   0.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i   0.000000000000000 + 0.000000000000000i

Columns 3 and 4:

-2.118033988749895 - 1.538841768587627i   1.118033988749895 + 1.538841768587627i
-1.538841768587627 + 2.118033988749895i   1.538841768587627 - 1.118033988749895i
   0.000000000000000 + 0.000000000000000i   0.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i   0.000000000000000 + 0.000000000000000i

Column 5:

   0.118033988749895 + 0.363271264002680i
   0.363271264002680 - 0.118033988749895i
   0.000000000000000 + 0.000000000000000i
   0.000000000000000 + 0.000000000000000i

octave:170> real(ifft2(fft_s_X))
ans =

Columns 1 through 4:

   0.6000000000000000   0.6961158231412373   0.7175570504584946   0.7324429495415055
   0.4500000000000000   0.9500000000000000   0.4499999999999999   0.9500000000000000
   0.1000000000000000   0.5038841768587627 -0.0175570504584947   0.4675570504584947
   0.2500000000000000   0.2500000000000000   0.2500000000000000   0.2500000000000000

Column 5:

   0.7538841768587627
   0.4500000000000000
-0.0538841768587627
   0.2500000000000000

I get this result:

   0.436182, 1.09425, 0.967557, 0.946116, 1.3059
   0.202786, 1.37361, 0.65    , 1.15    , 0.873607
-0.183395, 0.32936, -0.267557, 0.253884, -0.382292
   0.05    , 0.05   , 0.05    , 0.05    , 0.05

enum { numColumns = 5 };
enum { numRows    = 4 };

MKL_LONG               mklNumSamples[2]  = { numRows, numColumns };
MKL_LONG               mklInputStrides [3]   = { 0, mklNumColumns/2 + 1, 1};
MKL_LONG               mklOutputStrides[3] = { 0, mklNumColumns          , 1};

DFTI_DESCRIPTOR_HANDLE dftiHandle = DFTI_DESCRIPTOR_HANDLE();

DftiCreateDescriptor(&dftiHandle, DFTI_SINGLE, DFTI_REAL, 2, mklNumSamples);
DftiSetValue              (dftiHandle,     DFTI_PLACEMENT, DFTI_NOT_INPLACE);
DftiSetValue             (dftiHandle,      DFTI_CONJUGATE_EVEN_STORAGE,  DFTI_COMPLEX_COMPLEX);
DftiSetValue             (dftiHandle,      DFTI_INPUT_STRIDES,      mklInputStrides);
DftiSetValue             (dftiHandle,      DFTI_OUTPUT_STRIDES,  mklOutputStrides);
DftiCommitDescriptor(dftiHandle);

float input  [numRows * (numColumns / 2 + 1) * 2] = { 0 } ;
float output[numColumns * numRows] = { 0 };

{
        const float divisor = 1.0 / (numRows * numColumns)


        //
        // This can't possibly work, we only look at the left half of the input
       //
        for(int row = 0; row < numRows; ++row)
        {
            for(int column = 0; column < numColumns/2 + 1, ++column(
            {
                  const std::complex<float> c = get_input(column, row);                    // not shown, fetches the complex number at the given cell
                  input[(row * (numColumns / 2 + 1) + column)*2+0] = std::real(c) * divisor;
                  input[(row * (numColumns / 2 + 1) + column)*2+1] = std::imag(c) * divisor;
            }
        }
    }

DftiComputeBackward(dftiHandle, input, output)

         // This should be ok
        for(int row = 0; row < numRows; ++row)
        {
            for(int column = 0; column < numColumns, ++column)
            {
                   const float value = output[row * numColumns + column]
                   set_output(value, column, row);   // not shown
            }
       }

Even_A_ · ‎07-02-2014

Ok,

I had a look at

http://stackoverflow.com/questions/7811197/how-to-use-inverse-fft-on-amplitude-frequency-response

Which says I should increase the input, makes somewhat sense but I can't get it to work :-(

enum { numColumns = 5 };

enum { numRows = 4 };

MKL_LONG mklNumSamples [2] = { numRows, numColumns*2 - 1 };

...

I can post the complete code if required. Would it be better to emulate an ifft2 by a complex <-> complex fft2 ? Will performance be worse ? Which settings do I need for this.

Regards,

Even

Ying_H_Intel · ‎07-03-2014

Hi Even,

What is the required input format of ifft? the up-half data or left-half data?

In MKL example, we use left-half data of x_cmplx. and the complete code is as below.

There are two ways as you see,

1. the data is x_cmplx=x_cmplx[N1-i][N2-j]T , so if you have up-half data, you may need to convert them to a left_half data according to the formula manually (preferable way)

2. Perform C2C transform instead of R2C which will produce the whole output array (not preferable due to performance reasons).

Best Regards,

Ying

/*****************************************************************************
! Copyright(C) 2011-2014 Intel Corporation. All Rights Reserved.
!
! The source code, information  and  material ("Material") contained herein is
! owned  by Intel Corporation or its suppliers or licensors, and title to such
! Material remains  with Intel Corporation  or its suppliers or licensors. The
! Material  contains proprietary information  of  Intel or  its  suppliers and
! licensors. The  Material is protected by worldwide copyright laws and treaty
! provisions. No  part  of  the  Material  may  be  used,  copied, reproduced,
! modified, published, uploaded, posted, transmitted, distributed or disclosed
! in any way  without Intel's  prior  express written  permission. No  license
! under  any patent, copyright  or  other intellectual property rights  in the
! Material  is  granted  to  or  conferred  upon  you,  either  expressly,  by
! implication, inducement,  estoppel or  otherwise.  Any  license  under  such
! intellectual  property  rights must  be express  and  approved  by  Intel in
! writing.
!
! *Third Party trademarks are the property of their respective owners.
!
! Unless otherwise  agreed  by Intel  in writing, you may not remove  or alter
! this  notice or  any other notice embedded  in Materials by Intel or Intel's
! suppliers or licensors in any way.
!
!*****************************************************************************
! Content:
! A simple example of single-precision real-to-complex out-of-place 2D
! FFT using MKL DFTI
!
!****************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>
#include "mkl_dfti.h"

static void init_r(float *x, int N1, int N2, int H1, int H2);
static void init_c(MKL_Complex8 *x, int N1, int N2, int H1, int H2);
static int verify_c(MKL_Complex8 *x, int N1, int N2, int H1, int H2);
static int verify_r(float *x, int N1, int N2, int H1, int H2);

/* Define the format to printf MKL_LONG values */
#if !defined(MKL_ILP64)
#define LI "%li"
#else
#define LI "%lli"
#endif

int main(void)
{
    /* Size of 2D transform */
    int N1 = 7, N2 = 5;

    /* Arbitrary harmonic used to verify FFT */
    int H1 = -1, H2 = 2;

    /* Execution status */
    MKL_LONG status = 0;

    /* Pointers to input and output data */
    float x_real[7*5] = {0.00000,   0.20000,   0.30000 ,  0.30000 ,  0.40000,
   0.30000,   0.30000,   0.40000 ,  0.40000 ,  0.50000,
   0.10000,   0.30000,   0.20000 ,  0.40000,   0.10000,
   0.50000,   0.50000,   0.40000 ,  0.30000,   0.20000,
   0.20000 ,  0.20000,   0.20000,   0.20000,   0.20000,
   0.30000,   0.20000,   0.30000 ,  0.20000 ,  0.30000,
   0.50000,   0.50000 ,  0.50000 ,  0.50000,   0.50000 };	
		
	MKL_Complex8 *x_cmplx = 0;

    DFTI_DESCRIPTOR_HANDLE hand = 0;

    char version[DFTI_VERSION_LENGTH];

    DftiGetValue(0, DFTI_VERSION, version);
    printf("%s\n", version);

    printf("Example basic_sp_real_dft_2d\n");
    printf("Forward-Backward single-precision 2D real out-of-place FFT\n");
    printf("Configuration parameters:\n");
    printf(" DFTI_PRECISION                = DFTI_SINGLE\n");
    printf(" DFTI_FORWARD_DOMAIN           = DFTI_REAL\n");
    printf(" DFTI_DIMENSION                = 2\n");
    printf(" DFTI_LENGTHS                  = {%i, %i}\n", N1, N2);
    printf(" DFTI_PLACEMENT                = DFTI_NOT_INPLACE\n");
    printf(" DFTI_CONJUGATE_EVEN_STORAGE   = DFTI_COMPLEX_COMPLEX\n");
    /*printf(" DFTI_PACKED_FORMAT            = DFTI_CCE_FORMAT\n");*/

    printf("Create DFTI descriptor\n");
    {
        MKL_LONG N[2];  N[0] = N1;  N[1] = N2;
        status = DftiCreateDescriptor(&hand, DFTI_SINGLE, DFTI_REAL, 2, N);
        if (0 != status) goto failed;
    }

    printf("Set configuration: out-of-place\n");
    status = DftiSetValue(hand, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
    if (0 != status) goto failed;

    printf("Set configuration: CCE storage\n");
    status = DftiSetValue(hand, DFTI_CONJUGATE_EVEN_STORAGE,
                          DFTI_COMPLEX_COMPLEX);
    if (0 != status) goto failed;

    /* This is not needed for DFTI_COMPLEX_COMPLEX storage */
    /* status = DftiSetValue(hand, DFTI_PACKED_FORMAT, DFTI_CCE_FORMAT); */
    /* if (0 != status) goto failed; */
	int Neven= (N2/2+1);
    printf("Set input  strides = ");
    {
        MKL_LONG rs[3]; rs[0] = 0; rs[1] = N2; rs[2] = 1;
        printf("{"LI", "LI", "LI"}\n", rs[0],rs[1],rs[2]);
        status = DftiSetValue(hand, DFTI_INPUT_STRIDES, rs);
        if (0 != status) goto failed;
    }

    printf("Set output strides = ");
    {
        MKL_LONG cs[3]; cs[0] = 0; cs[1] = Neven; cs[2] = 1;
        printf("{"LI", "LI", "LI"}\n", cs[0],cs[1],cs[2]);
        status = DftiSetValue(hand, DFTI_OUTPUT_STRIDES, cs);
        if (0 != status) goto failed;
    }

    printf("Commit the descriptor\n");
    status = DftiCommitDescriptor(hand);
    if (0 != status) goto failed;

   printf("Allocate data arrays\n");
  //  x_real = (float *)malloc(N1*N2*sizeof(float));
   // if (0 == x_real) goto failed;
   //int Neven= (N2/2+1);
   
    x_cmplx = (MKL_Complex8*)malloc(N1*Neven*sizeof(MKL_Complex8));
    if (0 == x_cmplx) goto failed;

    printf("Initialize data for r2c transform\n");
    //init_r(x_real, N1, N2, H1, H2);

    printf("Compute real-to-complex transform\n");
    status = DftiComputeForward(hand, x_real, x_cmplx);
    if (0 != status) goto failed;
 
    printf("Verify the result\n");
	for (int i=0; i<N1; i++){
		for (int j=0; j<Neven; j++)
			printf ( "(%f, %f)\t ", x_cmplx[i*(N2/2+1)+j].real,x_cmplx[i*(N2/2+1)+j].imag);
	printf("\n");
	}

  /*  status = verify_c(x_cmplx, N1, N2, H1, H2);
    if (0 != status) goto failed;
  */
   printf("Reconfigure DFTI descriptor for backward transform\n");

 
  // Set Scale number for Backward transform
    printf("Set scale ");
    {
    float Scale = 1.0/(float)(N1*N2);
    printf(" \n\n DFTI_BACKWARD_SCALE  = 1/(N1*N2) \n");

    status = DftiSetValue(hand, DFTI_BACKWARD_SCALE, Scale);
    if (0 != status) goto failed;
	}

    printf("Set input  strides = ");
    {
        MKL_LONG cs[3]; cs[0] = 0; cs[1] = N2/2+1; cs[2] = 1;
        printf("{"LI", "LI", "LI"}\n", cs[0],cs[1],cs[2]);
        status = DftiSetValue(hand, DFTI_INPUT_STRIDES, cs);
        if (0 != status) goto failed;
    }

    printf("Set output strides = ");
    {
        MKL_LONG rs[3]; rs[0] = 0; rs[1] = N2; rs[2] = 1;
        printf("{"LI", "LI", "LI"}\n", rs[0],rs[1],rs[2]);
        status = DftiSetValue(hand, DFTI_OUTPUT_STRIDES, rs);
        if (0 != status) goto failed;
    }

    printf("Commit the descriptor\n");
    status = DftiCommitDescriptor(hand);
    if (0 != status) goto failed;

   // printf("Initialize data for c2r transform\n");
   // init_c(x_cmplx, N1, N2, H1, H2);

    printf("Compute backward transform\n");
    status = DftiComputeBackward(hand, x_cmplx, x_real);
    if (0 != status) goto failed;

    printf("Verify the result of the backward transform\n");
	    printf("Verify the result\n");
	for (int i=0; i<N1; i++){
		for (int j=0; j<N2; j++)
			printf ( "%f\t", x_real[i*N2+j]);
	printf("\n");
	}

 /*   status = verify_r(x_real, N1, N2, H1, H2);
    if (0 != status) goto failed;
*/
 cleanup:

    printf("Free DFTI descriptor\n");
    DftiFreeDescriptor(&hand);

    printf("Free data arrays\n");
 //   free(x_real);
    free(x_cmplx);

    printf("TEST %s\n",0==status ? "PASSED" : "FAILED");
    return status;

 failed:
    printf(" ERROR, status = "LI"\n", status);
    status = 1;
    goto cleanup;
}

/* Compute (K*L)%M accurately */
static float moda(int K, int L, int M)
{
    return (float)(((long long)K * L) % M);
}

/* Initialize array x(N) to produce unit peaks at x(H) and x(N-H) */
static void init_r(float *x, int N1, int N2, int H1, int H2)
{
    float TWOPI = 6.2831853071795864769f, phase, factor;
    int n1, n2, S1, S2, index;

    /* Generalized strides for row-major addressing of x */
    S2 = 1;
    S1 = N2;

    factor = ((N1-H1%N1)==0 && (N2-H2%N2)==0) ? 1.0f : 2.0f;
    for (n1 = 0; n1 < N1; n1++)
    {
        for (n2 = 0; n2 < N2; n2++)
        {
            phase  = moda(n1,H1,N1) / N1;
            phase += moda(n2,H2,N2) / N2;
            index = n1*S1 + n2*S2;
            x[index] = factor * cosf( TWOPI * phase ) / (N1*N2);
        }
    }
}

/* Verify that x has unit peak at H */
static int verify_c(MKL_Complex8 *x, int N1, int N2, int H1, int H2)
{
    float err, errthr, maxerr;
    int n1, n2, S1, S2, index;

    /* Generalized strides for row-major addressing of x */
    S2 = 1;
    S1 = N2/2+1;

    /*
     * Note, this simple error bound doesn't take into account error of
     * input data
     */
    errthr = 2.5f * logf( (float)N1*N2 ) / logf(2.0f) * FLT_EPSILON;
    printf(" Check if err is below errthr %.3lg\n", errthr);

    maxerr = 0;
    for (n1 = 0; n1 < N1; n1++)
    {
        for (n2 = 0; n2 < N2/2+1; n2++)
        {
            float re_exp = 0.0f, im_exp = 0.0f, re_got, im_got;

            if ((( n1-H1)%N1==0 && ( n2-H2)%N2==0) ||
                ((-n1-H1)%N1==0 && (-n2-H2)%N2==0))
            {
                re_exp = 1;
            }

            index = n1*S1 + n2*S2;
            re_got = x[index].real;
            im_got = x[index].imag;
            err  = fabsf(re_got - re_exp) + fabsf(im_got - im_exp);
            if (err > maxerr) maxerr = err;
            if (!(err < errthr))
            {
                printf(" x[%i][%i]: ",n1,n2);
                printf(" expected (%.7g,%.7g), ",re_exp,im_exp);
                printf(" got (%.7g,%.7g), ",re_got,im_got);
                printf(" err %.3lg\n", err);
                printf(" Verification FAILED\n");
                return 1;
            }
        }
    }
    printf(" Verified,  maximum error was %.3lg\n", maxerr);
    return 0;
}

/* Initialize array x(N) to produce unit peak at x(H) */
static void init_c(MKL_Complex8 *x, int N1, int N2, int H1, int H2)
{
    float TWOPI = 6.2831853071795864769f, phase;
    int n1, n2, S1, S2, index;

    /* Generalized strides for row-major addressing of x */
    S2 = 1;
    S1 = N2/2+1;

    for (n1 = 0; n1 < N1; n1++)
    {
        for (n2 = 0; n2 < N2/2+1; n2++)
        {
            phase  = moda(n1,H1,N1) / N1;
            phase += moda(n2,H2,N2) / N2;
            index = n1*S1 + n2*S2;
            x[index].real =  cosf( TWOPI * phase ) / (N1*N2);
            x[index].imag = -sinf( TWOPI * phase ) / (N1*N2);
        }
    }
}

/* Verify that x has unit peak at H */
static int verify_r(float *x, int N1, int N2, int H1, int H2)
{
    float err, errthr, maxerr;
    int n1, n2, S1, S2, index;

    /* Generalized strides for row-major addressing of x */
    S2 = 1;
    S1 = N2;

    /*
     * Note, this simple error bound doesn't take into account error of
     * input data
     */
    errthr = 2.5f * logf( (float)N1*N2 ) / logf(2.0f) * FLT_EPSILON;
    printf(" Check if err is below errthr %.3lg\n", errthr);

    maxerr = 0;
    for (n1 = 0; n1 < N1; n1++)
    {
        for (n2 = 0; n2 < N2; n2++)
        {
            float re_exp = 0.0f, re_got;

            if ((n1-H1)%N1==0 && (n2-H2)%N2==0)
            {
                re_exp = 1;
            }

            index = n1*S1 + n2*S2;
            re_got = x[index];
            err  = fabsf(re_got - re_exp);
            if (err > maxerr) maxerr = err;
            if (!(err < errthr))
            {
                printf(" x[%i][%i]: ",n1,n2);
                printf(" expected %.7g, ",re_exp);
                printf(" got %.7g, ",re_got);
                printf(" err %.3lg\n", err);
                printf(" Verification FAILED\n");
                return 1;
            }
        }
    }
    printf(" Verified,  maximum error was %.3lg\n", maxerr);
    return 0;
}

Even_A_ · ‎07-03-2014

Ying,

My data (fft_s_X above) is complex with the lower part zero. i.e I need to convert it to conjugate even format for the DftiBackward transform to work. If I understand the example code you give, it assumes the input to DftiBackward is already conjugate even.

I am not sure how to convert my input data to conjugate even format. Apparently it is now sufficient to simply claim that my input data (fft_s_X above) is the left part of a larger grid. If you could expand on which steps are needed it would be much appreciated. I am afraid that my signal processing skills are rather limited.

I would also like to try out the alternative approach of using a complex -> complex fft transform since it might be handy eventually. I'll play with the Dfti* settings but guidance would be helpful as this is rather new territory for me.

//
// Try to convert complex data to conjugate even, but it does not work
// In principle, nothing is known about the input except that it covers
// a (4x5) grid.
//

enum { numColumns = 5 }
enum { numRows = 4 }

// Multiply numColumns and subtract 1
MKL_LONG mklNumSamples   [2] = { numRows, 2*numColumns - 1 };

// Won't compile but shows the buffer size
float input[mklNumSamples[1] * (mklNumSamples[0]/2 + 1) * 2];

        const float multiplier = 1.0 / (mklNumSamples[0] * mklNumSamples[1]);

        for(int row = 0; row < numRows; ++row) // range [0..3]
        {
           for(int column = 0; column < numColumns; ++column) // range [0..4], left half
           {
              std::complex<float> cmplx = get_value(column, row);
              float r = std::real(cmplx);
              float i = std::imag(cmplx);
              input[(row * (mklNumSamples[0]/2+1) + column) * 2 + 0] = r * multiplier;
              input[(row * (mklNumSamples[0]/2+1) + column) * 2 + 1] = i * multiplier;
           }
        }

Regards,

Even

Even_A_ · ‎07-03-2014

Ok, got the complex -> complex fft working, using plain buffers, no conjugate even format in this case, simply using DFTI_COMPLEX instead of DFTI_REAL worked fine.

Even_A_ · ‎07-03-2014

Hello,

Thought I should post a status update in case anyone else has the same problems.

I have typed this in directly, rewriting from our codebase. As such it might not compile, or even give correct results. However the code it is based on seems to work, even if it has not been extensively tested.

1D FFT, float -> complex :

// C++

// NB it seems to produce correct results, but I am not sure if this is correct for both odd and even sizes

enum { numRows = 5 };

enum { numComplexRows = (numRows + 2)/2 };

enum { DftiBufferSize = numRows + 2 }; // Must hold both input and output

//

DFTI_DESCRIPTOR_HANDLE dftiHandle = DFTI_DESCRIPTOR_HANDLE();

DftiCreateDescriptor( &dftiHandle, DFTI_SINGLE, DFTI_REAL, 1, numRows);

DftiCommitDescriptor(dftiHandle);

//In : float data, Out: complex data

float inOut[dftiBufferSize] = { 0 };

for(int row = 0; row < numRows; ++rows)

{

const float value = get_data(row); // fetch input from somewhere

inOut[row] = value;

}

DftiComputeForward(dftiHandle, inOut);

// Set 1st half of data

for(int row = 0; row < numComplexRows; ++row)

{

const float real = inOut[row*2 + 0];

const float imag = inOut[row*2 + 1];

set_data(row, std::complex<float>(real, imag)); // Store results somewhere

}

// Set 2nd half of data

for(int row = 1; row < numComplexRows; ++row) // Start at 1

{

const float real = inOut[row*2 + 0];

const float imag = inOut[row*2 + 1];

set_data(numRows - row, std::complex<float>(real, -imag)); // mirror index, negate imaginary part

}

1D inverse FFT, complex -> float / complex

//C++

enum { numRows = 5 };

enum { DftiBufferSize = numRows *2 };

DFTI_DESCRIPTOR_HANDLE dftiHandle = DFTI_DESCRIPTOR_HANDLE();

DftiCreateDescriptor( &dftiHandle, DFTI_SINGLE, DFTI_COMPLEX, 1, numRows);

DftiSetValue(dftiHandle, DFTI_BACKWARD_SCALE, 1.0f/numRows);

DftiCommitDescriptor(dftiHandle);

//In : complex data Out: float data

float inOut[DftiBufferSize] = { 0 };

for(int row = 0; row < numRows; ++rows)

{

std::complex<float> c = get_data(row);

inOut[2*row + 0] = std::real(c);

inOut[2*row + 1] = std::imag(c);

}

DftiComputeBackward(dftiHandle, inOut);

for(int row = 0; row < numRows; ++row)

{

const float real = inOut[2*row + 0];

const float imag = inOut[2*row + 1];

set_data(real, row); // Store real results

set_data(std::complex<float>(real, image), row); // or complex

}

2D FFT real -> complex

//C++

// Use conjugate even representation for complex results

enum { numCols = 5 };

enum { numComplexCols = (numCols/2) + 1 };

enum { numRows = 4} ;

//

MKL_LONG                             dftiNumSamples[2] = { numRows, numCols };

MKL_LONG                             dftiInputStrides[3] = { 0, numCols, 1};

MKL_LONG                             dftiOutputStrides[3] = { 0, numComplexCols, 1 };

//

DFTI_DESCRIPTOR_HANDLE dftiHandle = DFTI_DESCRIPTOR_HANDLE();

DftiCreateDescriptor(&dftiHandle, DFTI_SINGLE, DFTI_REAL, 2, dftiNumSamples);

DftiSetValue(dftiHandle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);

DftiSetValue(dftiHandle, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);

DftiSetValue(dftiHandle, DFTI_INPUT_STRIDES, dftiInputStrides);

DftiSetValue(dftiHandle, DFTI_OUTPUT_STRIDES, dftiOutputStrides);

DftiCommitDescriptor(dftiHandle);

//

// set input

float input[numRows * numCols] = { 0 };

float output[numRows * numComplexCols * 2] = { 0 };

for(int row = 0; row < numRows; ++row)

{

for(int col = 0; col < numCols; ++cols)

   {

       const float value = get_data(col, row);

       input[row * numCols + col] = value;

}

}

DftiComputeForward(dftiHandle, input, output);

//

// Fetch output

for(int row = 0; row < numRows; ++row)

{

// fetch left half.

for(int col = 0; col < numComplexCols; ++col)

{

    const float real   = output[(row * numComplexCols + col)*2 + 0];

    const float imag = output[(row * numComplexCols + col)*2 + 1];

    const std::complex<float> c(real, imag);

    set_data(c, col, row);

}

// Fetch right half.

const int shift = 1 + ((numCols + 1) % 2);

for(int col = shift; col < numComplexCols; ++col)

{

    const int mirrorCol    = (numComplexCols - col) % numComplexCols; // Mirrow column

    const int mirrowRow = (numRows             - row) % numRows;           // Mirrow row

    const float real = output[(mirrorRow * numComplexCols + mirrowCol)*2 +0];

    const float imag = output[(mirrorRow * numComplexCols + mirrowCol)*2 +1];

    const std::complex<float> c(real, -imag);                                        // negate imaginary part

    set_data(c, col + numComplexCols - shift, row);                             // Set right half

}

}

2D FFT : complex -> complex

//C++

// This uses the simplest representation, both input and output are plain 2D arrays

enum { numCols = 5 };

enum { numRows = 4} ;

//

MKL_LONG                             dftiNumSamples[2] = { numRows, numCols };

MKL_LONG                             dftiInputStrides[3] = { 0, numCols, 1};

MKL_LONG                             dftiOutputStrides[3] = { 0, numCols, 1 };

//

DFTI_DESCRIPTOR_HANDLE dftiHandle = DFTI_DESCRIPTOR_HANDLE();

DftiCreateDescriptor(&dftiHandle, DFTI_SINGLE, DFTI_COMPLEX, 2, dftiNumSamples);

DftiSetValue(dftiHandle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);

DftiSetValue(dftiHandle, DFTI_INPUT_STRIDES, dftiInputStrides);

DftiSetValue(dftiHandle, DFTI_OUTPUT_STRIDES, dftiOutputStrides);

DftiCommitDescriptor(dftiHandle);

//

// set input

float input[numRows * numCols * 2] = { 0 };

float output[numRows * numCols * 2] = { 0 };

for(int row = 0; row < numRows; ++row)

{

for(int col = 0; col < numCols; ++cols)

   {

       const std::complex<float> c_value = get_data(col, row);

       input[(row * numCols + col)*2 + 0] = std::real(c_value);

       input[(row * numCols + col)*2 + 1] = std::imag(c_value);

}

}

DftiComputeForward(dftiHandle, input, output);

//

// Fetch output

for(int row = 0; row < numRows; ++row)

{

for(int col = 0; col < numCols; ++col)

{

    const float real   = output[(row * numCols + col)*2 + 0];

    const float imag = output[(row * numCols + col)*2 + 1];

    const std::complex<float> c(real, imag);

    set_data(c, col, row);

}

2D Inverse FFT

// complex -> float can be handled by DftiComputeBackward,

// I never figured that one out, so I emulate by

// [ From http://www.dsprelated.com/showmessage/25408/1.php ]

// A = fft (imag(x) + i * real (x));

// B = imag (A) + i * real (A);

// result = B / length (b);