<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do you spread FFTs across CPUs? in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-do-you-spread-FFTs-across-CPUs/m-p/864147#M7731</link>
    <description>&lt;DIV style="margin:0px;"&gt;I don't quite understand your answer. In the simple case of plain C code, with no threads of its own, that calls a single FFT routine, how can I get this MKL FFT routine to spread across CPUs? Do I put the #pragma in the code to trigger MKL's internal threads? Do I put "export GOMP_CPU_AFFINITY=5" in the code to make MKL spawn its internal FFT threads? If the simple case works, I'm all set.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Bonnie&lt;BR /&gt;&lt;/DIV&gt;</description>
    <pubDate>Thu, 09 Apr 2009 00:53:56 GMT</pubDate>
    <dc:creator>bonniegb</dc:creator>
    <dc:date>2009-04-09T00:53:56Z</dc:date>
    <item>
      <title>How do you spread FFTs across CPUs?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-do-you-spread-FFTs-across-CPUs/m-p/864145#M7729</link>
      <description>&lt;P&gt;I have been unable to get the MKL FFT routines to spread across CPUs. I have tried out-of-place, single-precision and double-precision FFTs, and set every environment variable I can find (such as OMP_NUM_THREADS, MKL_NUM_THREAD and MKL_DOMAIN_NUM_THREADS, all set to 16, with the FFT portion of MKL_DOMAIN_NUM_THREADS set to 4). I even tried your example code shipped with MKL. Even though the software queried the environment variables and found 16 CPUs, it used only 1 of them. I am using MKL 10.0.1.014. What am I missing?&lt;BR /&gt;&lt;BR /&gt;I did manage to add my own threads using PTHREADS and call 3 simultaneous FFTs from within the 15 threads, but those 3 FFTs are now throttling me and I need them to split across CPUs too: they limit the speedup to 3x, whereas my goal with fully threaded FFTs is 10x.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Bonnie&lt;/P&gt;</description>
      <pubDate>Wed, 08 Apr 2009 01:02:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-do-you-spread-FFTs-across-CPUs/m-p/864145#M7729</guid>
      <dc:creator>bonniegb</dc:creator>
      <dc:date>2009-04-08T01:02:42Z</dc:date>
    </item>
    <item>
      <title>Re: How do you spread FFTs across CPUs?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-do-you-spread-FFTs-across-CPUs/m-p/864146#M7730</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
There must be many ways to go about this. I guess you are making a separate MKL call for each FFT, so you want each to be assigned to a different core (not simply to a different logical processor, if you are running a HyperThread platform). If, for example, you used OpenMP:&lt;BR /&gt;&lt;BR /&gt;#pragma parallel for&lt;BR /&gt;for(idfft=0; idfft &amp;lt; 3; ++idfft)&lt;BR /&gt; somekindof mklfft(data_set(idfft))&lt;BR /&gt;&lt;BR /&gt;with HyperThreading on Xeon 5500, you would need something like&lt;BR /&gt;export GOMP_CPU_AFFINITY=1,3,9,11,5,7,13,15,.....&lt;BR /&gt;so as to assign 8 threads to different cores and try to spread across CPUs. &lt;BR /&gt;With HT disabled, you would not be so dependent on affinity control.&lt;BR /&gt;You could do something with pthreads and their affinity calls, if it is safe to assume a machine dedicated to your job.&lt;BR /&gt;&lt;BR /&gt;Intel compilers have implemented the OpenMP 3.0 tasking, in case your prefer that, but the older workshare is not yet multi-threaded.&lt;BR /&gt;&lt;BR /&gt;If the parallelization is done by MKL, that also observes the KMP_AFFINITY or GOMP_CPU_AFFINITY.&lt;BR /&gt;I may have totally mis-guessed your intentions.&lt;BR /&gt;</description>
      <pubDate>Wed, 08 Apr 2009 01:43:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-do-you-spread-FFTs-across-CPUs/m-p/864146#M7730</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-04-08T01:43:30Z</dc:date>
    </item>
    <item>
      <title>Re: How do you spread FFTs across CPUs?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-do-you-spread-FFTs-across-CPUs/m-p/864147#M7731</link>
      <description>&lt;DIV style="margin:0px;"&gt;I don't quite understand your answer. In the simple case of plain C code, with no threads of its own, that calls a single FFT routine, how can I get this MKL FFT routine to spread across CPUs? Do I put the #pragma in the code to trigger MKL's internal threads? Do I put "export GOMP_CPU_AFFINITY=5" in the code to make MKL spawn its internal FFT threads? If the simple case works, I'm all set.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Bonnie&lt;BR /&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 09 Apr 2009 00:53:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-do-you-spread-FFTs-across-CPUs/m-p/864147#M7731</guid>
      <dc:creator>bonniegb</dc:creator>
      <dc:date>2009-04-09T00:53:56Z</dc:date>
    </item>
  </channel>
</rss>

