<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Parallelizing FFT not seeing 100% CPU in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallelizing-FFT-not-seeing-100-CPU/m-p/911813#M12205</link>
    <description>&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A rel="/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=460003" class="basic" href="https://community.intel.com/en-us/profile/460003/"&gt;mimarsh2&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="border: 1px inset; padding: 5px; background-color: #e5e5e5; margin-left: 2px; margin-right: 2px;"&gt;OS: Windows 7 Pro 64 bit&lt;I&gt;
&lt;P&gt;&lt;B&gt;env: OMP_NUM_THREADS = 2 (this is set in User variables. does it need to be in system variables?)&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;MKL version : 9.1.026&lt;/P&gt;
&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;/I&gt;Yes, only in that case will you get the performance improvement on 2 threads&lt;BR /&gt;that Dima mentioned above.&lt;/DIV&gt;
&lt;DIV style="border: 1px inset; padding: 5px; background-color: #e5e5e5; margin-left: 2px; margin-right: 2px;"&gt;--Gennady&lt;/DIV&gt;</description>
    <pubDate>Mon, 18 Jan 2010 06:59:20 GMT</pubDate>
    <dc:creator>Gennady_F_Intel</dc:creator>
    <dc:date>2010-01-18T06:59:20Z</dc:date>
    <item>
      <title>Parallelizing FFT not seeing 100% CPU</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallelizing-FFT-not-seeing-100-CPU/m-p/911811#M12203</link>
      <description>&lt;P&gt;I started a new thread because it seemed I could not continue discussion on &lt;A href="http://software.intel.com/en-us/forums/showthread.php?t=71035" target="_blank"&gt;http://software.intel.com/en-us/forums/showthread.php?t=71035&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;problem: when I use the MKL FFT I only see one processor in use&lt;/P&gt;
&lt;P&gt;Processor: Intel Core 2 Duo CPU E8400 @ 3.00GHz 2.99 GHz&lt;/P&gt;
&lt;P&gt;RAM: 4.0 GB&lt;/P&gt;
&lt;P&gt;OS: Windows 7 Pro 64 bit&lt;/P&gt;
&lt;P&gt;env: OMP_NUM_THREADS = 2 (this is set in User variables. does it need to be in system variables?)&lt;/P&gt;
&lt;P&gt;MKL version : 9.1.026&lt;/P&gt;
&lt;P&gt;linking against em64t/lib/mkl_3m64t.lib&lt;/P&gt;
&lt;P&gt;I am creating an x64 executable using Visual Studio 2005&lt;/P&gt;
&lt;P&gt;I am aligning the arrays to 128-byte boundaries&lt;/P&gt;
&lt;P&gt;Any help is appreciated. Thanks in advance&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;numElements is 1 &amp;lt;&amp;lt;23&lt;/P&gt;
&lt;P&gt;code :&lt;BR /&gt; &lt;BR /&gt;&lt;/P&gt;
&lt;PRE&gt;[bash]static INT32 IntelDoubleFFT(INT8     transformType,  //type of transform (1: normal or -1: inverse)&lt;BR /&gt;                            double * realDataArray,  //data array (input and output)&lt;BR /&gt;                            double * imagDataArray,  //imaginary data array&lt;BR /&gt;                            UINT32   numElements)    //size of each data array&lt;BR /&gt;{&lt;BR /&gt;    UINT32 i;&lt;BR /&gt;    _MKL_Complex16 *compDataArray;&lt;BR /&gt;    _MKL_Complex16 *out;&lt;BR /&gt;&lt;BR /&gt;    DFTI_DESCRIPTOR_HANDLE complexDescriptor;&lt;BR /&gt;    long status = DFTI_NO_ERROR;&lt;BR /&gt;&lt;BR /&gt;    start_o = clock();&lt;BR /&gt;	&lt;BR /&gt;    compDataArray = (_MKL_Complex16*)calloc(numElements, sizeof(*compDataArray) + 256);&lt;BR /&gt;    if (compDataArray == NULL) {&lt;BR /&gt;        return -1;&lt;BR /&gt;    }&lt;BR /&gt;&lt;BR /&gt;    out = (_MKL_Complex16*)calloc(numElements, sizeof(*compDataArray) + 256);&lt;BR /&gt;    if (out == NULL) {&lt;BR /&gt;        return -1;&lt;BR /&gt;    }&lt;BR /&gt;&lt;BR /&gt;	UINT64 temp;&lt;BR /&gt;	char *align;&lt;BR /&gt;	align = (char*)out;&lt;BR /&gt;	temp = (UINT64)align;&lt;BR /&gt;	align = align + (temp % 128);&lt;BR /&gt;	out = (_MKL_Complex16*)align;&lt;BR /&gt;&lt;BR /&gt;	align = (char*)compDataArray;&lt;BR /&gt;	temp = (UINT64)align;&lt;BR /&gt;	align = align + (temp % 128);&lt;BR /&gt;	compDataArray = (_MKL_Complex16*)align;&lt;BR /&gt;	&lt;BR /&gt;	printf("aligned to %p and %p\n", compDataArray, out);&lt;BR /&gt;&lt;BR /&gt;    //combine real and imag arrays into single complex array for DFT call&lt;BR /&gt;    for (i = 0; i &amp;lt; numElements; i++) {&lt;BR /&gt;        compDataArray[i].real = realDataArray[i];&lt;BR /&gt;        compDataArray[i].imag = -1.0 * imagDataArray[i];&lt;BR /&gt;    }&lt;BR /&gt;&lt;BR /&gt;    finish_o = clock();&lt;BR /&gt;	overhead += (double)(finish_o - start_o) / CLOCKS_PER_SEC;&lt;BR /&gt;&lt;BR /&gt;    //set up descriptor handle - handle, precision, forward_domain, dimension, length&lt;BR /&gt;    status = DftiCreateDescriptor(&amp;amp;complexDescriptor, DFTI_DOUBLE, DFTI_COMPLEX, 1, numElements);&lt;BR /&gt;&lt;BR /&gt;    if (status == DFTI_NO_ERROR) {&lt;BR /&gt;        //set the scale factor for the backward transform to be 1/n to make it the inverse of the forward transform&lt;BR /&gt;        status = DftiSetValue(complexDescriptor, DFTI_BACKWARD_SCALE, (double) 1 / numElements);&lt;BR /&gt;        DftiSetValue(complexDescriptor, DFTI_PLACEMENT, DFTI_NOT_INPLACE);&lt;BR /&gt;&lt;BR /&gt;        if (status == DFTI_NO_ERROR) {&lt;BR /&gt;            //commit descriptor for initial calculations&lt;BR /&gt;            status = DftiCommitDescriptor(complexDescriptor);&lt;BR /&gt;&lt;BR /&gt;            if (status == DFTI_NO_ERROR) {&lt;BR /&gt;                //compute DFT&lt;BR /&gt;                if (transformType == 1) { //forward (normal) DFT&lt;BR /&gt;                    status = DftiComputeForward(complexDescriptor, compDataArray, out);&lt;BR /&gt;                } else { //backward (inverse) DFT&lt;BR /&gt;                    status = DftiComputeBackward(complexDescriptor, compDataArray, out);&lt;BR /&gt;                }&lt;BR /&gt;            }&lt;BR /&gt;        }&lt;BR /&gt;        DftiFreeDescriptor(&amp;amp;complexDescriptor);          //free memory&lt;BR /&gt;    }&lt;BR /&gt;&lt;BR /&gt;    start_o = clock();&lt;BR /&gt;&lt;BR /&gt;    //split complex array for output&lt;BR /&gt;    for (i = 0; i &amp;lt; numElements; i++) {&lt;BR /&gt;        realDataArray[i] = out[i].real;&lt;BR /&gt;        imagDataArray[i] = -1.0 * out[i].imag;&lt;BR /&gt;    }&lt;BR /&gt;&lt;BR /&gt;    //free(compDataArray);&lt;BR /&gt;    //free(out);&lt;BR /&gt;&lt;BR /&gt;    finish_o = clock();&lt;BR /&gt;	overhead += (double)(finish_o - start_o) / CLOCKS_PER_SEC;&lt;BR /&gt;&lt;BR /&gt;    if (status == DFTI_NO_ERROR) {&lt;BR /&gt;        return 0;&lt;BR /&gt;    } else {&lt;BR /&gt;        return -1;&lt;BR /&gt;    }&lt;BR /&gt;}[/bash]&lt;/PRE&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jan 2010 19:59:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallelizing-FFT-not-seeing-100-CPU/m-p/911811#M12203</guid>
      <dc:creator>Marshall__Michael_B</dc:creator>
      <dc:date>2010-01-15T19:59:10Z</dc:date>
    </item>
    <item>
      <title>Parallelizing FFT not seeing 100% CPU</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallelizing-FFT-not-seeing-100-CPU/m-p/911812#M12204</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;You may set the environment variable &lt;A href="http://www.intel.com/software/products/compilers/docs/fmac/doc_files/source/extfile/optaps_for/common/optaps_openmp_thread_affinity.htm#KMP_AFFINITY_Environment_Variable"&gt;KMP_AFFINITY&lt;/A&gt;=verbose,compact to see how many threads the FFT starts internally. In your case the transform should run in parallel, with about a 40% improvement on 2 threads.&lt;/P&gt;
&lt;P&gt;Thanks&lt;BR /&gt;Dima&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jan 2010 05:04:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallelizing-FFT-not-seeing-100-CPU/m-p/911812#M12204</guid>
      <dc:creator>Dmitry_B_Intel</dc:creator>
      <dc:date>2010-01-18T05:04:10Z</dc:date>
    </item>
    <item>
      <title>Parallelizing FFT not seeing 100% CPU</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallelizing-FFT-not-seeing-100-CPU/m-p/911813#M12205</link>
      <description>&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A rel="/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=460003" class="basic" href="https://community.intel.com/en-us/profile/460003/"&gt;mimarsh2&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="border: 1px inset; padding: 5px; background-color: #e5e5e5; margin-left: 2px; margin-right: 2px;"&gt;OS: Windows 7 Pro 64 bit&lt;I&gt;
&lt;P&gt;&lt;B&gt;env: OMP_NUM_THREADS = 2 (this is set in User variables. does it need to be in system variables?)&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;MKL version : 9.1.026&lt;/P&gt;
&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;/I&gt;Yes, only in that case will you get the performance improvement on 2 threads&lt;BR /&gt;that Dima mentioned above.&lt;/DIV&gt;
&lt;DIV style="border: 1px inset; padding: 5px; background-color: #e5e5e5; margin-left: 2px; margin-right: 2px;"&gt;--Gennady&lt;/DIV&gt;</description>
      <pubDate>Mon, 18 Jan 2010 06:59:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallelizing-FFT-not-seeing-100-CPU/m-p/911813#M12205</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2010-01-18T06:59:20Z</dc:date>
    </item>
  </channel>
</rss>

