Solved: Intel MKL DftiComputeForward how to get full transform matrix from CCE format in C

bouzas__dimitrios · ‎06-21-2017

I'm trying to implement a 2 dimensional fourier transform via use of MKL FFT functions.

I'm interested in transforming from the space domain (i.e., my input signal is a 2D MxN matrix of `double`s) to the frequency domain (i.e., a 2D MxN output matrix of complexes with double accuracy, `MKL_Complex16`) and then back to the space domain after some filtering.

Based on the examples provided by intel's MKL implementation (i.e., basic_dp_real_dft_2d.c etc.) I've created the following matlab-ish function:

    bool fft2(double *in, int m, int n, MKL_Complex16 *out) {
      bool ret(false);
      DFTI_DESCRIPTOR_HANDLE hand(NULL);
      MKL_LONG dim[2] = {m, n};
      if(!DftiCreateDescriptor(&hand, DFTI_DOUBLE, DFTI_REAL, 2, dim)) {
        if(!DftiSetValue(hand, DFTI_PLACEMENT, DFTI_NOT_INPLACE)) {
          if(!DftiSetValue(hand, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX)) {
            MKL_LONG rs[3] = {0, n, 1};
            if(!DftiSetValue(hand, DFTI_INPUT_STRIDES, rs)) {
              MKL_LONG cs[3] = {0, n / 2 + 1, 1};
              if(!DftiSetValue(hand, DFTI_OUTPUT_STRIDES, cs)) {
                if(!DftiCommitDescriptor(hand)) {
                  ret = !DftiComputeForward(hand, in, out));
                }
              }
            }
          }
        }
      }
      DftiFreeDescriptor(&hand);
      return ret;
    }

Due to the fact that I want to do some DSP stuff (e.g., Gaussian filtering) and thus I have to do matrix multiplications. I want the full transformation matrix instead of the CCE format in C matrix that DftiComputeForward outputs.

**How can I reconstruct the full transformation matrix of an arbitrary sized 2d signal (i.e., matrix) from the CCE format in C matrix that I get as output from DftiComputeForward function?**

For example if I have the following 2D real signal:

0.1, 0.2, 0.3
0.4, 0.5, 0.6
0.7, 0.8, 0.9

It's full transformation matrix would be:

4.5 + 0j, -0.45 + 0.259808j, -0.45 - 0.259808j
-1.35 + 0.779423j, 0 - 0j, 0 - 0j
-1.35 - 0.779423j, 0 + 0j, 0 + 0j

However the result from `DftiComputeForward` in CCE is:

4.5 + 0j, -0.45 + 0.259808j, -1.35 + 0.779423j,
0 - 0j, -1.35 - 0.779423j, 0 + 0j,
0 + 0j, 0 + 0j, 0 + 0j

Jing_Xu · ‎06-22-2017

Because your input is 3x3 matrix, the CCE format output has 6 values, to be:

4.5, 0.0; -0.45, 0.259808
-1.35, 0.779423; 0.0, 0.0
-1.35, -0.779423; 0.0, 0.0

Thus, rewrite them into complex format will be:

V11 V12

V21 V22

V31 V32

Since input is a 3x3 matrix, the result output should also be 3x3. Thus, the result matrix above should be in the format of:

V11 V12 ??

V21 V22 ??

V31 V32 ??

Restore it from CCE format would be:

V11 V12 conj(V12)

V21 V22 conj(V32)

V31 V32 conj(V22)

View solution in original post

Jing_Xu · ‎06-21-2017

Please check https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/288202.

bouzas__dimitrios · ‎06-22-2017

Jing X. Thanx for the reply, but I've seen this thread already, actually I've spent several days thinking what I'm doing wrong with respect with the guidelines given there. Applying however the suggested solution, didn't manage to get the correct results. For example in the following demo, I've created a function called convertFromCCEFormat() which takes as input the output array in CCE format in C of the DftiComputeForward function and based on the best reply provided by Dmitry Baksheev I'm trying to "reconstruct" the full matrix. To put it more simply, feeding in DftiComputeForward the 2D 3x3 real valued matrix:

0.1, 0.2, 0.3
0.4, 0.5, 0.6
0.7, 0.8, 0.9

I get as result 5 element (i.e., 3x3 / 2 + 1) the complex valued vector:

4.5, 0.0

-0.45, 0.259808

-1.35, 0.779423

0.0, 0.0

-1.35, -0.779423

The correct result of the transformation would be the complex valued 3x3 matrix:

4.5 + 0j, -0.45 + 0.259808j, -0.45 - 0.259808j
-1.35 + 0.779423j, 0 - 0j, 0 - 0j
-1.35 - 0.779423j, 0 + 0j, 0 + 0j

My question is how would I transform the out of DftiComputeForward which is in CCE format in C to the matrix above?

To my understanding the thread suggests that the CCE format stores only the one half of the transformation. Due to the fact that fft matrix is conjugate symmetric you could reconstruct the original transformation matrix based on this symmetry. While the row subscript in the output runs normally in [0 - M) (where M is the number of rows of the initial signal 2D matrix) the columns subscript runs in [0 - N/2] (where N is the number of columns of the initial signal 2D matrix). Thus, for column subscripts greater than N / 2 we have to take the conjugate mirror element (i.e., y(i, N - j), where i and j are arbitrary row and column subscripts respectively). That's what I'm doing in my convertFromCCEFormat() function (at least that's what I think I'm doing) but it doesn't seem to give me correct results.

[DEMO]

#include "mkl_service.h"
#include "mkl_dfti.h"
#include <cstdio>

template<typename T>
T&
rowMajorAccess(T *a, int m, int n, int i, int j) {
  return a[i * n + j];
}

void
print(double const &val) {
  printf("%4.4lf ", val);
}

void
print(MKL_Complex16 const &val) {
  printf("(%4.4lf, %4.4lf) ", val.real, val.imag);
}

template<typename T>
void
printMatrix(T *a, int m, int n) {
  for (int i(0); i < m; ++i) {
    for (int j(0); j < n; ++j) {
      print(a[i * n + j]);
    }
    printf("\n");
  }
}

template<typename T>
T*
initMatrix(int m, int n) {
  T *out = (T*)mkl_malloc(m * n * sizeof(T), 128);
  for (int i(0), n_elem(m * n); i < n_elem; ++i) out = {};
  return out;
}

MKL_Complex16
conj(MKL_Complex16 const &in) {
  MKL_Complex16 out;
  out.real = in.real;
  out.imag = -in.imag;
  return out;
}

bool fft2(double *in, int m, int n, MKL_Complex16 *out) {
  bool ret(false);
  DFTI_DESCRIPTOR_HANDLE hand(NULL);
  MKL_LONG dim[2] = { m, n };
  if (!DftiCreateDescriptor(&hand, DFTI_DOUBLE, DFTI_REAL, 2, dim)) {
    if (!DftiSetValue(hand, DFTI_PLACEMENT, DFTI_NOT_INPLACE)) {
      if (!DftiSetValue(hand, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX)) {
        MKL_LONG rs[3] = { 0, n, 1 };
        if (!DftiSetValue(hand, DFTI_INPUT_STRIDES, rs)) {
          MKL_LONG cs[3] = { 0, n / 2 + 1, 1 };
          if (!DftiSetValue(hand, DFTI_OUTPUT_STRIDES, cs)) {
            if (!DftiCommitDescriptor(hand)) {
              ret = !DftiComputeForward(hand, in, out);
            }
          }
        }
      }
    }
  }
  DftiFreeDescriptor(&hand);
  return ret;
}

MKL_Complex16*
convertFromCCEFormat(MKL_Complex16 *cce, int m, int n) {
  MKL_Complex16 *out = initMatrix<MKL_Complex16>(m, n);
  int half = n / 2 + 1;
  for (int i(0); i < m; ++i) {
    for (int j(0); j < half; ++j) {
      rowMajorAccess(out, m, n, i, j) = rowMajorAccess(cce, m, n, i, j);
    }
    for (int j(half); j < n; ++j) {
      rowMajorAccess(out, m, n, i, j) = conj(rowMajorAccess(cce, m, n, i, n - j));
    }
  }
  return out;
}

int main() {
  int const m = 3;
  int const n = 3;
  double *signal = initMatrix<double>(m, n);
  signal[0] = 0.1; signal[1] = 0.2; signal[2] = 0.3;
  signal[3] = 0.4; signal[4] = 0.5; signal[5] = 0.6;
  signal[6] = 0.7; signal[7] = 0.8; signal[8] = 0.9;
  printf("Initial Signal:\n");
  printMatrix(signal, m, n);

  MKL_Complex16 *cce = initMatrix<MKL_Complex16>(m, n / 2 + 1);
  fft2(signal, m, n, cce);

  printf("CCE Format for C\n");
  for (int i(0), n_elem(m * (n / 2 + 1)); i < n_elem; ++i) {
    printf("  (%g, %g)\n", cce.real, cce.imag);
  }

  MKL_Complex16 *freq = convertFromCCEFormat(cce, m, n);
  
  printf("Frequency Domain:\n");
  printMatrix(freq, m, n);

  mkl_free(freq);
  mkl_free(cce);
  mkl_free(signal);
}

Jing_Xu · ‎06-22-2017

Because your input is 3x3 matrix, the CCE format output has 6 values, to be:

4.5, 0.0; -0.45, 0.259808
-1.35, 0.779423; 0.0, 0.0
-1.35, -0.779423; 0.0, 0.0

Thus, rewrite them into complex format will be:

V11 V12

V21 V22

V31 V32

Since input is a 3x3 matrix, the result output should also be 3x3. Thus, the result matrix above should be in the format of:

V11 V12 ??

V21 V22 ??

V31 V32 ??

Restore it from CCE format would be:

V11 V12 conj(V12)

V21 V22 conj(V32)

V31 V32 conj(V22)

bouzas__dimitrios · ‎06-23-2017

Jing X. (Intel) thanx for the reply, that really cleared things up. For future readers having the same problem, I'm posting a CCE format in C to full matrix transformation:

//! \brief
//! - Decompresses an M x N matrix stored in complex conjugate even sequence (alas CCE)
//!   format in C.
//!
//! \param[in] cce   : Pointer to M x N matrix stored in CCE format in C (i.e, its size is M x (N / 2 + 1)).
//! \param[in] m     : Number of rows in output matrix.
//! \param[in] n     : Number of columns in output matrix.
//! \param[in] align : Memory alignment of output matrix.
//!
//! \return M x N uncompressed matrix.
//! \brief
//! - Decompresses an M x N matrix stored in complex conjugate even sequence (alas CCE)
//!   format in C.
//!
//! \param[in] cce   : Pointer to M x N matrix stored in CCE format in C (i.e, its size is M x (N / 2 + 1)).
//! \param[in] m     : Number of rows in resulting matrix.
//! \param[in] n     : Number of columns in resulting matrix.
//! \param[in] align : Memory alignment of output matrix.
//!
//! \return M x N uncompressed matrix.
template<typename T>
T*
convertFromCCEFormat(T const *cce, unsigned int const m, unsigned int const n, unsigned int const align = 64) {
  
  T *out = (T*) mkl_malloc(m * n * sizeof(T), align);
  
  unsigned int half = n / 2 + 1;
  
  for(unsigned int i(0); i < m; ++i) {

    for(unsigned int j(0); j < half; ++j) {

      out[i * n + j] = cce[i * half + j];
    
    }
    
    for(unsigned int j(half); j < n; ++j) {
      
      unsigned int idx_out = i * n + j;
      unsigned int idx_cce = i * half + n - j;

      // conjugate
      out[idx_out].real =  cce[idx_cce].real;
      out[idx_out].imag = -cce[idx_cce].imag;
    
    }

  }

  return out;
}