<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi MooN, in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950828#M15201</link>
    <description>&lt;P&gt;Hi MooN,&lt;/P&gt;
&lt;P&gt;Could you please attach the code as a C file, including the init() code?&lt;/P&gt;
&lt;P&gt;I'm not sure which example you are looking at. There seem to be some errors: for example, your data type is MKL_Complex16 (double precision), but DftiCreateDescriptor is called with DFTI_SINGLE; it should be DFTI_DOUBLE.&lt;/P&gt;
&lt;P&gt;Here is another similar discussion about the output:&amp;nbsp;&lt;A href="http://software.intel.com/en-us/forums/topic/402439"&gt;http://software.intel.com/en-us/forums/topic/402439&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Another issue is that, mathematically, one 2048-point FFT is not equal to four 512-point FFTs, so this approach may not work. Also, the 2048-point c2c FFT is threaded internally, so the manual parallelization may not be needed either.&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Ying&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 09 Oct 2013 08:30:00 GMT</pubDate>
    <dc:creator>Ying_H_Intel</dc:creator>
    <dc:date>2013-10-09T08:30:00Z</dc:date>
    <item>
      <title>parallelization case N°4</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950825#M15198</link>
      <description>&lt;P&gt;Hello MKL professionals&lt;/P&gt;
&lt;P&gt;I am working on parallelising a 2048-point 1D MKL FFT over 4 threads, so that each thread computes a 512-point FFT. The good news is that it runs in the parallel region, yet DftiComputeForward produces wrong results. How can I recombine the output data of the 4 independent 512-point FFTs?&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;#pragma omp parallel num_threads(nThread)&lt;/STRONG&gt;&lt;BR /&gt;{&lt;BR /&gt;MKL_LONG status;&lt;BR /&gt;int myID = omp_get_thread_num ();&lt;BR /&gt;printf("Thread's ID = %d\n", myID); &lt;BR /&gt;&lt;STRONG&gt;status = DftiComputeForward( my_desc1_handle, &amp;amp;array11[myID*len]);&lt;/STRONG&gt; // but the results in array11 are wrong and sometimes I get zeros (loss of FFT synchronisation)&lt;/P&gt;
&lt;P&gt;}&lt;/P&gt;
&lt;P&gt;//Memory check, output values are not as expected&lt;BR /&gt;&amp;nbsp;status1 = DftiFreeDescriptor(&amp;amp;my_desc1_handle);&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;}&lt;/P&gt;
&lt;P&gt;Thank you for answering&lt;/P&gt;</description>
      <pubDate>Tue, 08 Oct 2013 22:34:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950825#M15198</guid>
      <dc:creator>MooN_K_</dc:creator>
      <dc:date>2013-10-08T22:34:56Z</dc:date>
    </item>
    <item>
      <title>Hello,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950826#M15199</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;My first thought is that the problem may be my_desc1_handle: it shouldn't be a shared variable for the 4 external OpenMP threads.&lt;/P&gt;
&lt;P&gt;On the other hand, what is your data type? You may already know that most FFT functions in MKL are threaded (please check the MKL user manual), so it may not be necessary to parallelize them yourself.&lt;/P&gt;
&lt;P&gt;In particular, computation of multiple transforms in one call (number of transforms &amp;gt; 1) is threaded (you can do 4 x 512 FFTs in one call).&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Ying&lt;/P&gt;</description>
      <pubDate>Wed, 09 Oct 2013 03:04:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950826#M15199</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2013-10-09T03:04:27Z</dc:date>
    </item>
    <item>
      <title>Hello Ying</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950827#M15200</link>
      <description>&lt;P&gt;Hello Ying&lt;/P&gt;
&lt;P&gt;I am trying to run example 4 of the different parallelization techniques. The descriptor used in "&lt;STRONG&gt;status = DftiComputeForward( my_desc1_handle, &amp;amp;array11[myID*len])&lt;/STRONG&gt;" is in shared memory, but the result is still wrong.&lt;/P&gt;
&lt;P&gt;The input data is a sinusoidal signal of N=2048 points, with values [0.0004882813, -0.0004882813].&lt;/P&gt;
&lt;P&gt;So the Fourier transform of this sinusoid should be a Dirac pulse located at N/2. But all I get are zeros (-0.0000000000) or the same input signal.&lt;/P&gt;
&lt;P&gt;Here's the code:&lt;/P&gt;
&lt;P&gt;#include "mkl_dfti.h"&lt;/P&gt;
&lt;P&gt;#include "omp.h"&lt;/P&gt;
&lt;P&gt;void main (){&lt;/P&gt;
&lt;P&gt;MKL_Complex16 x[2048];&lt;/P&gt;
&lt;P&gt;MKL_LONG status;&lt;/P&gt;
&lt;P&gt;DFTI_DESCRIPTOR_HANDLE desc_handle;&lt;/P&gt;
&lt;P&gt;int nThread = omp_get_max_threads ();&lt;/P&gt;
&lt;P&gt;MKL_LONG len=512;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;init(x,2048,1028);&lt;/STRONG&gt;//sinusoid input init&lt;/P&gt;
&lt;P&gt;status = DftiCreateDescriptor (&amp;amp;desc_handle, DFTI_SINGLE, DFTI_COMPLEX, 1, len);&lt;/P&gt;
&lt;P&gt;status = DftiSetValue (desc_handle, DFTI_NUMBER_OF_USER_THREADS, nThread);&lt;/P&gt;
&lt;P&gt;status = DftiCommitDescriptor (desc_handle);&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;// each thread calculates an FFT of 512&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;#pragma omp parallel num_threads(nThread){&lt;/P&gt;
&lt;P&gt;MKL_LONG myStatus;&lt;/P&gt;
&lt;P&gt;int myID = omp_get_thread_num ();&lt;/P&gt;
&lt;P&gt;myStatus = DftiComputeForward (desc_handle,&amp;nbsp; &amp;amp;x [&lt;STRONG&gt;myID * len&lt;/STRONG&gt;] );//myID is a number from 0 to 3 related to the thread ID&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;// x output is the same as the input (No conversion) and no dirac pulse in the N/2&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;} &lt;/P&gt;
&lt;P&gt;status = DftiFreeDescriptor (&amp;amp;desc_handle);&lt;/P&gt;
&lt;P&gt;}&lt;/P&gt;
&lt;P&gt;I tested this example provided by Intel with a sinusoid as input, and I need to verify that the Dirac pulse appears at point N/2 of the output.&lt;/P&gt;
&lt;P&gt;Thanks for the help&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Oct 2013 05:58:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950827#M15200</guid>
      <dc:creator>MooN_K_</dc:creator>
      <dc:date>2013-10-09T05:58:25Z</dc:date>
    </item>
    <item>
      <title>Hi MooN,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950828#M15201</link>
      <description>&lt;P&gt;Hi MooN,&lt;/P&gt;
&lt;P&gt;Could you please attach the code as a C file, including the init() code?&lt;/P&gt;
&lt;P&gt;I'm not sure which example you are looking at. There seem to be some errors: for example, your data type is MKL_Complex16 (double precision), but DftiCreateDescriptor is called with DFTI_SINGLE; it should be DFTI_DOUBLE.&lt;/P&gt;
&lt;P&gt;Here is another similar discussion about the output:&amp;nbsp;&lt;A href="http://software.intel.com/en-us/forums/topic/402439"&gt;http://software.intel.com/en-us/forums/topic/402439&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Another issue is that, mathematically, one 2048-point FFT is not equal to four 512-point FFTs, so this approach may not work. Also, the 2048-point c2c FFT is threaded internally, so the manual parallelization may not be needed either.&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Ying&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Oct 2013 08:30:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950828#M15201</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2013-10-09T08:30:00Z</dc:date>
    </item>
    <item>
      <title>Hi MooN,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950829#M15202</link>
      <description>&lt;P&gt;Hi MooN,&lt;/P&gt;
&lt;P&gt;The OpenMP part of your code is fine. The main problem appears to be that you commit the descriptor before setting its values. After moving the DftiSetValue calls before DftiCommitDescriptor, your program will work. I also fixed a few minor problems (the fixed code is attached). Since there are four 512-point FFTs, the Dirac pulse appears at len/2 = 256 within each block: you can see the value 0.25 at points 256, 768, 1280 and 1792.&lt;/P&gt;
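&lt;P&gt;As a rough check of those positions, assume that init(array11,2048,1024) fills the array with the alternating samples (-1)^n/2048. This is an assumption, since init() is not shown, but it is consistent with the values ±0.0004882813 = ±1/2048 quoted earlier in this thread. Every 512-point block starts at an even index, so each block j = 0..3 holds the same pattern, and its 512-point DFT would be:&lt;/P&gt;
&lt;PRE&gt;
```latex
Y_j[k] = \sum_{m=0}^{511} \frac{(-1)^m}{2048}\, e^{-2\pi i\, mk/512}
       = \frac{1}{2048} \sum_{m=0}^{511} e^{i\pi m\,(1 - k/256)}

Y_j[256] = \frac{512}{2048} = 0.25, \qquad Y_j[k] = 0 \quad \text{for } k \ne 256
```
&lt;/PRE&gt;
&lt;P&gt;In the full 2048-element output array, bin 256 of block j sits at index 512j + 256, which gives 256, 768, 1280 and 1792.&lt;/P&gt;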
&lt;P&gt;As for why the MKL FFT is both threaded internally and supports user-defined threads: developers who parallelize their applications have varied requirements. For example, suppose you have a number of arrays, each of which needs one FFT. Given the multi-core resources, you may want to run these FFTs simultaneously, e.g. 4 FFTs at a time. The 4 techniques are meant for that.&lt;/P&gt;
&lt;P&gt;I need to correct one of my comments. A 2048-point c2c FFT is only threaded internally under some conditions, for example in 64-bit but not in 32-bit applications (please see the related section of the MKL User's Guide, quoted below). So if you have a number of arrays of length 2048 in a 32-bit application, then you have a good reason to use custom threads.&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Ying&lt;/P&gt;
&lt;P&gt;init(array11,2048,1024);&lt;BR /&gt;status1 = DftiCreateDescriptor( &amp;amp;my_desc1_handle, DFTI_DOUBLE, DFTI_COMPLEX, 1, len);&lt;BR /&gt;int nThread = omp_get_max_threads ();&lt;/P&gt;
&lt;P&gt;status1 = DftiSetValue(my_desc1_handle , DFTI_PLACEMENT, DFTI_NOT_INPLACE);&lt;BR /&gt;status1 = DftiSetValue (my_desc1_handle, DFTI_NUMBER_OF_USER_THREADS, nThread);&lt;/P&gt;
&lt;P&gt;status1 = DftiCommitDescriptor( my_desc1_handle );&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Intel® Math Kernel Library 11.1 User's Guide&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Threaded Functions and Problems&lt;/P&gt;
&lt;P&gt;The following Intel MKL function domains are threaded:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;Direct sparse solver.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;LAPACK.&lt;/P&gt;
&lt;P&gt;For the list of threaded routines, see &lt;A href="http://software.intel.com/GUID-2E857767-CF73-4460-A7EA-5A85D21E9431.htm#LAPACK"&gt;Threaded LAPACK Routines&lt;/A&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Level1 and Level2 BLAS.&lt;/P&gt;
&lt;P&gt;For the list of threaded routines, see &lt;A href="http://software.intel.com/GUID-2E857767-CF73-4460-A7EA-5A85D21E9431.htm#BLAS"&gt;Threaded BLAS Level1 and Level2 Routines&lt;/A&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;All Level 3 BLAS and all Sparse BLAS routines except Level 2 Sparse Triangular solvers.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;All mathematical VML functions.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;FFT.&lt;/P&gt;
&lt;P&gt;For the list of FFT transforms that can be threaded, see &lt;A href="http://software.intel.com/GUID-2E857767-CF73-4460-A7EA-5A85D21E9431.htm#FFT"&gt;Threaded FFT Problems&lt;/A&gt;.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;One-dimensional (1D) transforms&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;1D transforms are threaded in many cases.&lt;/P&gt;
&lt;P&gt;1D complex-to-complex (c2c) transforms of size N using interleaved complex data layout are threaded under the following conditions depending on the architecture:&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Architecture&lt;/TH&gt;&lt;TH&gt;Conditions&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;Intel® 64&lt;/TD&gt;&lt;TD&gt;N is a power of 2, log&lt;SUB&gt;2&lt;/SUB&gt;(N) &amp;gt; 9, the transform is double-precision out-of-place, and input/output strides equal 1.&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;IA-32&lt;/TD&gt;&lt;TD&gt;N is a power of 2, log&lt;SUB&gt;2&lt;/SUB&gt;(N) &amp;gt; 13, and the transform is single-precision.&lt;BR /&gt;N is a power of 2, log&lt;SUB&gt;2&lt;/SUB&gt;(N) &amp;gt; 14, and the transform is double-precision.&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;Any&lt;/TD&gt;&lt;TD&gt;N is composite, log&lt;SUB&gt;2&lt;/SUB&gt;(N) &amp;gt; 16, and input/output strides equal 1.&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
&lt;P&gt;1D complex-to-complex transforms using split-complex layout are not threaded.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Oct 2013 02:05:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950829#M15202</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2013-10-10T02:05:25Z</dc:date>
    </item>
    <item>
      <title>Thank you for your help Ying</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950830#M15203</link>
      <description>&lt;P&gt;Thank you for your help Ying it works,&lt;/P&gt;
&lt;P&gt;I am still having trouble working in the parallel region with OpenMP: I get the thread IDs in a random order. For example, with 4 threads I get Thread's ID = 1, Thread's ID = 3, Thread's ID = 2, Thread's ID = 0 in that sequence, and on another run I get a different order. How can I get the IDs in the right order, i.e. 0, 1, 2, 3?&lt;/P&gt;
&lt;P&gt;Here is the code:&lt;/P&gt;
&lt;P&gt;int nThread = omp_get_max_threads ();&lt;/P&gt;
&lt;P&gt;#pragma omp parallel num_threads(nThread)&lt;BR /&gt;{&lt;BR /&gt;int myID=omp_get_thread_num ();&lt;/P&gt;
&lt;P&gt;printf("Thread's ID %d \n", myID);&lt;/P&gt;
&lt;P&gt;}&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 10 Oct 2013 05:20:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950830#M15203</guid>
      <dc:creator>MooN_K_</dc:creator>
      <dc:date>2013-10-10T05:20:07Z</dc:date>
    </item>
    <item>
      <title>HaHa, it is the exact </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950831#M15204</link>
      <description>&lt;P&gt;HaHa, it is the exact&amp;nbsp;"trouble" in &amp;nbsp;&lt;A href="http://software.intel.com/en-us/forums/topic/475357"&gt;http://software.intel.com/en-us/forums/topic/475357&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;But&amp;nbsp;in most of case, &amp;nbsp;the out of order of thread executation&amp;nbsp;should be the&amp;nbsp;nature feature or "advantage"&amp;nbsp;of&amp;nbsp;the&amp;nbsp;multi-thread application. The executation&amp;nbsp;order of mult-threads&amp;nbsp;&amp;nbsp;should be scheduled by OS based on current system resource. &amp;nbsp;What we can do is to assign correct&amp;nbsp;task to each threads, for example ,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;int myID = omp_get_thread_num ();&lt;BR /&gt;status = DftiComputeForward( my_desc1_handle, &amp;amp;array11[myID*len],&amp;amp;array12[myID*len]);&lt;/P&gt;
&lt;P&gt;Thus, whatever the execution order, you will get the desired result in the result array.&lt;/P&gt;
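&lt;P&gt;A minimal sketch of this idea, assuming the FFT call is replaced by a hypothetical stand-in (each element of block b is set to b + 1) so that it runs without MKL. The names fill_by_thread_id, N_BLOCKS and LEN are illustrative and do not come from the original code:&lt;/P&gt;
&lt;PRE&gt;
```c
#include "assert.h"
#ifdef _OPENMP
#include "omp.h"
#endif

#define N_BLOCKS 4   /* number of independent blocks */
#define LEN 512      /* points per block */

/* Hypothetical stand-in for the per-block FFT: every element of block b
   is set to b + 1. The slice is chosen from the thread ID, never from the
   order in which the threads happen to run. */
void fill_by_thread_id(int *out)
{
    assert(out != 0);
#pragma omp parallel num_threads(N_BLOCKS)
    {
#ifdef _OPENMP
        int myID = omp_get_thread_num();
        int step = omp_get_num_threads();  /* may be fewer than requested */
#else
        int myID = 0;                      /* serial build: one pass does all blocks */
        int step = 1;
#endif
        /* Thread myID handles blocks myID, myID + step, ... */
        for (int b = myID; N_BLOCKS > b; b += step)
            for (int i = 0; i != LEN; ++i)
                out[b * LEN + i] = b + 1;
    }
}
```
&lt;/PRE&gt;
&lt;P&gt;After the parallel region, block b of the output holds b + 1 regardless of which thread finished first, so the printed order of the thread IDs does not affect the result.&lt;/P&gt;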
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Ying&lt;/P&gt;</description>
      <pubDate>Thu, 10 Oct 2013 08:05:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950831#M15204</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2013-10-10T08:05:05Z</dc:date>
    </item>
    <item>
      <title>Hello Ying</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950832#M15205</link>
      <description>&lt;P&gt;Hello Ying&lt;/P&gt;
&lt;P&gt;If the MKL FFT is already threaded, then why does Intel propose the 4 techniques for parallelizing it? For my project, the 4th case fits my algorithm well, but I still need a way to verify the output of this method, which is still missing.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Oct 2013 16:52:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950832#M15205</guid>
      <dc:creator>MooN_K_</dc:creator>
      <dc:date>2013-10-10T16:52:36Z</dc:date>
    </item>
    <item>
      <title>Hello ying, Thank you for</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950833#M15206</link>
      <description>&lt;P&gt;Hello ying, Thank you for your help&lt;/P&gt;

&lt;P&gt;After dividing the initial data into 4 parts and performing 4 parallel FFTs on them, is there a way to recombine these 4 FFTs to obtain the same result as a single FFT of the initial data? That is what's missing before I move on to the implementation.&lt;/P&gt;

&lt;P&gt;Any help would be appreciated :)&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;

&lt;P&gt;Moon&lt;/P&gt;</description>
      <pubDate>Thu, 05 Dec 2013 18:17:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950833#M15206</guid>
      <dc:creator>MooN_K_</dc:creator>
      <dc:date>2013-12-05T18:17:58Z</dc:date>
    </item>
    <item>
      <title>Hello MooN,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950834#M15207</link>
      <description>&lt;P&gt;Hello MooN,&lt;/P&gt;

&lt;P&gt;Do you want to compute a batch of 2048-point 1D FFTs, or only one 2048-point 1D FFT?&lt;/P&gt;

&lt;P&gt;As we discussed last time, parallelization case 4 mainly focuses on computing multiple 2048-point 1D FFTs in parallel. If you really need to compute one 2048-point 1D FFT using 4 threads (though it may not bring a performance benefit, which is why MKL has not threaded it internally), then, since a 2048-point FFT is not simply equal to four 512-point FFTs, you would need to work out the whole procedure mathematically, for example with the butterfly algorithm, and then use the corresponding MKL functions to carry it out (e.g. reorganize the results and apply FFTs again). As far as I know, MKL does not provide a function to recombine them directly.&lt;/P&gt;

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Dec 2013 01:24:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/parallelization-case-N-4/m-p/950834#M15207</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2013-12-09T01:24:01Z</dc:date>
    </item>
  </channel>
</rss>

