Hi Evgueni,

Felix__K_ · ‎08-09-2015

Hello,

I need to compute the 1d convolution. Intel MKL offers two basic strategies to do this.

1) Explicit implementation of the convolution theorem by the user i.e. perform DFT's on the input data and on the kernel. Multiply the results in the Fourier domain element wise. Next perform an inverse DFT to get the desired result.

2) Compute the convolution directly by using VSL math function in FFT mode.

To my knowlegde the computation using a VSL function is not internally parallelized by OpenMP. Hence it does not benefit from multicore architectures right ? But the FFT functions within MKL are indeed internally parallelized by OpenMP and I can even call the cluster version for very huge problems. My question is know what's the most effective way to calculate the convolution the first or the second way ? The Fourier transform of the kernel might be known analytically (since the kernel is given by a known function). Is there a way to exploit this in the second approach ? Within the first approach calculation of the FFT of the kernel is avoided.

Best regards,

Felix

Ying_H_Intel · ‎08-09-2015

Hi Felix,

Could you please tell your problem model , problem size and data type.

As i understand, the method 2) is same as method 1) when MKL VSL_CONV_MODE_FFT or VSL_CONV_MODE_AUTO. MKL actually provide three model to do convolution:

1) VSL_CONV_MODE_FFT Compute convolution by using fast Fourier transform.
2) VSL_CONV_MODE_DIRECT Compute convolution directly.
3) VSL_CONV_MODE_AUTO Automatically choose direct or Fourier mode for convolution according to process, problem size and data type.

As you know the FFT functions within MKL are indeed internally parallelized by OpenMP, And VSL depend on the MKL FFT, So VSL are parallelized through MKL DFT parallelization ( here FFT almost = DFT).

like DFT transform are threaded in most of cases except the following:

Split-complex 1Ds are not threaded
Small multidimensional transforms are not threaded

because under such case, the parallel don't bring performance.

In your case, do you mean you have ready FFT result of kernel, so hope to write your self convolution, right?

Best Regards,

Ying

Felix__K_ · ‎08-10-2015

Hello Ying,

thank you so much for your quick answer ! The problem is effectively the convolution of 1-D discrete spectral data with size up to 12K (or even bigger) with a 1x1 broadening kernel of e.g. Lorentian or Gaussian shape. The data type on input would be Real (double precision) and the output should be Complex(double precision). The FFT's should be out-of-place transforms. Because the functional form of the broadening kernel is known, the Fourier transform can be performed analytically. So my question reduces to: Which option is the better one as far as performance is concerned.

1) Use 1-D convolution from VSL library in VSL_CON_MODE_FFT (which would include an FFT of the Kernel)

2) Write the 1-D convolution by myself by calling one forward FFT for the discrete spectral data, multiply the result element wise with the Fourier-transformed Kernel (for example by using OpenMP or Vectorizations (either by using Cilk++ array notation, or SIMD pragmas)) and then call IFFT on this to obtain the broadened spectral data.

Best regards,

Felix

Evgueni_P_Intel · ‎08-10-2015

Hi Felix,

If kernel size is 1, convolution can be replaced by scaling -- you can do it in a loop or call BLAS routines like DAXPY and DSCAL.

However, dimensions seem to mismatch in your post: "1x1 kernel" vs "1D spectral data". Could you please clarify this point?

Under #2, you describe cyclic convolution. To do non-cyclic convolution, you need to padd the input array with zeros.

Evgueni.

Felix__K_ · ‎08-11-2015

Hi Evgueni,

okay your right, maybe I told you something wrong. The kernel is not 1x1. To be precise I cite the exact formula that needs to be computed (Latex) :

\int_{-\infty}^{\infty} dt^{\prime} K(t - t^{\prime}) F(t^{\prime})

where K(t - t^{\prime}) is the kernel known analytically and F(t^{\prime}) is the raw discrete data of length D. The integral limits are actually finite since the integrand is only non-zero in a range of say t^{\prime} \in [-1,1] in which D data points are found.

As I see I haven't fully understood what the kernel should look like in this case to obtain the result I want. So I will read about it and come back to my questions as soon as possible. Thank you for your help so far !

Best regards,

Felix

1D Convolution