<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: 2D FFT performance on cluster in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935450#M2615</link>
    <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;P&gt;Hello Andrei,&lt;/P&gt;
&lt;P&gt;The Intel Cluster Math Kernel Library 8.1 can do distributed-memory, parallel FFTs. The following website has more information: &lt;A href="http://www.intel.com/cd/software/products/asmo-na/eng/perflib/mkl/266852.htm" target="_blank"&gt;http://www.intel.com/cd/software/products/asmo-na/eng/perflib/mkl/266852.htm&lt;/A&gt;. You can also download Cluster MKL and get a 30-day license from this site.&lt;/P&gt;
&lt;P&gt;Please share your performance results if you try Cluster MKL.&lt;/P&gt;
&lt;P&gt;Best regards,&lt;/P&gt;
&lt;P&gt;Henry&lt;/P&gt;</description>
    <pubDate>Wed, 12 Apr 2006 20:18:09 GMT</pubDate>
    <dc:creator>Henry_G_Intel</dc:creator>
    <dc:date>2006-04-12T20:18:09Z</dc:date>
    <item>
      <title>2D FFT performance on cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935449#M2614</link>
      <description>&lt;DIV&gt;Hi there,&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I want to speed up the calculation of a 2D FFT on a grid of, say, 512x512 by running it on a cluster.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;The algorithm consists of 3 steps:&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;1. 1D FFTs on the rows of the matrix&lt;/DIV&gt;&lt;DIV&gt;2. Matrix transpose&lt;/DIV&gt;&lt;DIV&gt;3. Repeat step 1.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Single-processor tests show that step 2 (the matrix transpose) is the most time-consuming part of the algorithm. So, while parallelizing step 1 on the cluster is pretty straightforward, the matrix transpose becomes the bottleneck.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;There is a group of people from the East Coast who worked on the problem and reportedly achieved linear growth of FFT performance on a cluster up to 16 processors, which is quite impressive.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;The question is whether the Intel cluster library can handle the parallelization of a 2D FFT on a cluster.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Andrei&lt;/DIV&gt;</description>
      <pubDate>Wed, 12 Apr 2006 15:32:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935449#M2614</guid>
      <dc:creator>andrei1</dc:creator>
      <dc:date>2006-04-12T15:32:08Z</dc:date>
    </item>
    <item>
      <title>Re: 2D FFT performance on cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935450#M2615</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;P&gt;Hello Andrei,&lt;/P&gt;
&lt;P&gt;The Intel Cluster Math Kernel Library 8.1 can do distributed-memory, parallel FFTs. The following website has more information: &lt;A href="http://www.intel.com/cd/software/products/asmo-na/eng/perflib/mkl/266852.htm" target="_blank"&gt;http://www.intel.com/cd/software/products/asmo-na/eng/perflib/mkl/266852.htm&lt;/A&gt;. You can also download Cluster MKL and get a 30-day license from this site.&lt;/P&gt;
&lt;P&gt;Please share your performance results if you try Cluster MKL.&lt;/P&gt;
&lt;P&gt;Best regards,&lt;/P&gt;
&lt;P&gt;Henry&lt;/P&gt;</description>
      <pubDate>Wed, 12 Apr 2006 20:18:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935450#M2615</guid>
      <dc:creator>Henry_G_Intel</dc:creator>
      <dc:date>2006-04-12T20:18:09Z</dc:date>
    </item>
    <item>
      <title>Re: 2D FFT performance on cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935451#M2616</link>
      <description>Hi Henry,&lt;BR /&gt;&lt;BR /&gt;I looked at the performance graphs at the link in your reply.&lt;BR /&gt;&lt;BR /&gt;How can one get 5 GFLOPS on a 1.5 GHz processor (I mean the 1D FFT graph)?&lt;BR /&gt;&lt;BR /&gt;Andrei</description>
      <pubDate>Wed, 12 Apr 2006 21:51:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935451#M2616</guid>
      <dc:creator>andrei1</dc:creator>
      <dc:date>2006-04-12T21:51:26Z</dc:date>
    </item>
    <item>
      <title>Re: 2D FFT performance on cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935452#M2617</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;P&gt;Hi Andrei,&lt;/P&gt;
&lt;P&gt;The Itanium 2 processor can do four floating-point operations per clock cycle. Therefore, the theoretical peak of a 1.5 GHz Itanium 2 is 6 GFLOPS.&lt;/P&gt;
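&lt;P&gt;As a rough check of that arithmetic, here is a minimal Python sketch. The helper name peak_gflops is hypothetical, and the 5 GFLOPS figure is the value quoted earlier in this thread for the 1D FFT graph, not a new measurement.&lt;/P&gt;

```python
def peak_gflops(clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: clock rate in GHz times floating-point operations per cycle."""
    return clock_ghz * flops_per_cycle


# Figures from this thread: a 1.5 GHz Itanium 2 at four operations per cycle.
peak = peak_gflops(1.5, 4)

# Fraction of the theoretical peak implied by 5 GFLOPS on the 1D FFT graph.
fraction_of_peak = 5.0 / peak

print(f"peak = {peak:.1f} GFLOPS, fraction of peak at 5 GFLOPS = {fraction_of_peak:.2f}")
```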
&lt;P&gt;Henry&lt;/P&gt;</description>
      <pubDate>Wed, 12 Apr 2006 22:44:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935452#M2617</guid>
      <dc:creator>Henry_G_Intel</dc:creator>
      <dc:date>2006-04-12T22:44:50Z</dc:date>
    </item>
    <item>
      <title>Re: 2D FFT performance on cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935453#M2618</link>
      <description>Henry, thanks for the answer.&lt;BR /&gt;&lt;BR /&gt;I think a different kind of graph would be useful for estimating cluster performance: speedup (acceleration factor) versus the number of processors used for, say, a 512x512 FFT. Such graphs are common in the literature.&lt;BR /&gt;&lt;BR /&gt;The existing graphs suggest that this dependence is close to linear up to 4 processors. It is also clear that it will start to deviate from linear at some number of processors. The question is at how many processors, and how quickly?&lt;BR /&gt;&lt;BR /&gt;Is there anywhere I can look at such graphs?&lt;BR /&gt;&lt;BR /&gt;Thanks, Andrei</description>
      <pubDate>Wed, 12 Apr 2006 23:12:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935453#M2618</guid>
      <dc:creator>andrei1</dc:creator>
      <dc:date>2006-04-12T23:12:08Z</dc:date>
    </item>
    <item>
      <title>Re: 2D FFT performance on cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935454#M2619</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;P&gt;Hi Andrei,&lt;/P&gt;
&lt;P&gt;I'm not aware of any other graphs or published benchmarks of MKL DFT performance. I can't estimate the scalability of your calculation. However, a 512x512 FFT is considered a small calculation on a good workstation or server. I recommend that you measure the serial MKL performance before investing any effort in a distributed-memory, parallel solution. Depending on your system, MKL can probably compute a 512x512 transform in less than a second. If so, a distributed-memory, parallel solution will be slower because of the communication overhead.&lt;/P&gt;
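&lt;P&gt;A minimal sketch of such a serial measurement follows. It uses NumPy's FFT as a stand-in, which is an assumption: it is not MKL, so the timings only indicate an order of magnitude. The helper names fft2_by_rows and best_time are hypothetical. The sketch also runs the three-step scheme from the original question (row FFTs, transpose, row FFTs) and checks it against a direct 2D transform.&lt;/P&gt;

```python
import time

import numpy as np


def fft2_by_rows(a):
    """2D FFT via the three steps in the question: row FFTs, transpose, row FFTs."""
    step1 = np.fft.fft(a, axis=1)            # 1. 1D FFTs on the rows
    step2 = np.ascontiguousarray(step1.T)    # 2. explicit transpose (a real copy, not a view)
    step3 = np.fft.fft(step2, axis=1)        # 3. 1D FFTs on the rows again
    return step3.T                           # transpose back to the original layout


def best_time(fn, a, repeats=10):
    """Best wall-clock time in seconds over several runs of fn(a)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(a)
        best = min(best, time.perf_counter() - start)
    return best


# Random complex 512x512 grid, the size discussed in this thread.
rng = np.random.default_rng(0)
grid = rng.standard_normal((512, 512)) + 1j * rng.standard_normal((512, 512))

# The three-step scheme should agree with the library's direct 2D FFT.
matches = np.allclose(fft2_by_rows(grid), np.fft.fft2(grid))

print("three-step result matches fft2:", matches)
print("direct fft2, best time (s):", best_time(np.fft.fft2, grid))
print("three-step,  best time (s):", best_time(fft2_by_rows, grid))
```

&lt;P&gt;If the serial time is already a small fraction of a second, that is the number any distributed version would have to beat after paying for communication.&lt;/P&gt;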
&lt;P&gt;Henry&lt;/P&gt;</description>
      <pubDate>Thu, 13 Apr 2006 01:10:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935454#M2619</guid>
      <dc:creator>Henry_G_Intel</dc:creator>
      <dc:date>2006-04-13T01:10:49Z</dc:date>
    </item>
    <item>
      <title>Re: 2D FFT performance on cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935455#M2620</link>
      <description>Hi Henry,&lt;BR /&gt;&lt;BR /&gt;Thank you for your answers. In fact, a 512x512 grid is just the entry level for my problem, and I hope that the system overhead will be significantly lower for larger grid dimensions.&lt;BR /&gt;&lt;BR /&gt;I appreciate your cooperation, Andrei</description>
      <pubDate>Thu, 13 Apr 2006 21:10:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935455#M2620</guid>
      <dc:creator>andrei1</dc:creator>
      <dc:date>2006-04-13T21:10:51Z</dc:date>
    </item>
    <item>
      <title>Re: 2D FFT performance on cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935456#M2621</link>
      <description>&lt;DIV&gt;Regarding the 1.5 GHz Itanium 2's peak performance of 6 GFLOPS, I would have thought that for single-precision data the peak would be higher through the use of whichever SIMD is actually present.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Tim&lt;/DIV&gt;</description>
      <pubDate>Wed, 26 Apr 2006 08:37:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/2D-FFT-performance-on-cluster/m-p/935456#M2621</guid>
      <dc:creator>tcrony70</dc:creator>
      <dc:date>2006-04-26T08:37:00Z</dc:date>
    </item>
  </channel>
</rss>

