<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Quote:Fiona Z. (Intel) wrote: in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087235#M23037</link>
    <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Fiona Z. (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Hi YangDong,&lt;/P&gt;

&lt;P&gt;I am afraid you are not using the cluster FFT compute and descriptor configuration functions, so the data would not be distributed correctly for the calculation. For your code you probably need to use 'DftiComputeForwardDM' and 'DftiSetValueDM'.&lt;/P&gt;

&lt;P&gt;Another point: I am not sure whether your timing is thread safe. If every node can modify the timing variables, the time you print is not the main node's time but the calculation time across all nodes. I recommend using the MPI interface (MPI_Wtime) to measure elapsed time.&lt;/P&gt;

&lt;P&gt;Best regards,&lt;BR /&gt;
	Fiona&lt;BR /&gt;
	&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Hi Fiona,&lt;/P&gt;

&lt;P&gt;Thanks for your warning.&lt;/P&gt;

&lt;P&gt;Each node has its own private descriptor handle (hand) and memory space. If every node runs independently at the same time, is that thread safe?&lt;/P&gt;</description>
    <pubDate>Fri, 28 Apr 2017 15:05:29 GMT</pubDate>
    <dc:creator>杨_栋_</dc:creator>
    <dc:date>2017-04-28T15:05:29Z</dc:date>
    <item>
      <title>Why does MPI affect the speed of MKL's DFT?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087230#M23032</link>
      <description>&lt;P&gt;My code:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;// -*- C++ -*-

# include &amp;lt;cmath&amp;gt;
# include &amp;lt;ctime&amp;gt;
# include &amp;lt;cstring&amp;gt;
# include &amp;lt;cstdio&amp;gt;

# include "mkl.h"

int main (int argc, char * argv[])
{
  MKL_LONG D[2] = {SIZE, SIZE};
  MKL_LONG C = COUNT;
  MKL_LONG ST[3] = {0, (D[1] * sizeof(double) + 63) / 64 * (64 / sizeof(double)), 1};
  MKL_LONG DI = D[0] * ST[1];
  MKL_LONG SI = D[0] * ST[1];
  double SC = 1.0 / std::sqrt((double)SI);
  struct timespec BE, EN;
  
  double*const Efft_r = (double*)_mm_malloc(sizeof(double) * SI * C * 2, 64);
  memset(Efft_r, 0, sizeof(double) * SI  * C * 2);
  double*const Efft_i = Efft_r + SI * C;

  Efft_r[0] = 1.0;

  clock_gettime (CLOCK_REALTIME, &amp;amp;BE);
  for (int i=0; i&amp;lt;LOOP; ++i)
    {
      MKL_LONG status;
      DFTI_DESCRIPTOR_HANDLE hand;
      DftiCreateDescriptor(&amp;amp;hand, DFTI_DOUBLE, DFTI_COMPLEX, 2, D);
      DftiSetValue(hand, DFTI_INPUT_STRIDES, ST);
      DftiSetValue(hand, DFTI_OUTPUT_STRIDES, ST);
      DftiSetValue(hand, DFTI_NUMBER_OF_TRANSFORMS, C);
      DftiSetValue(hand, DFTI_INPUT_DISTANCE, DI);
      DftiSetValue(hand, DFTI_COMPLEX_STORAGE, DFTI_REAL_REAL);
      DftiSetValue(hand, DFTI_FORWARD_SCALE, SC);
      DftiSetValue(hand, DFTI_BACKWARD_SCALE, SC);
      DftiSetValue(hand, DFTI_THREAD_LIMIT, 1);
      DftiSetValue(hand, DFTI_NUMBER_OF_USER_THREADS, 1);
      DftiCommitDescriptor(hand);
      __assume_aligned(Efft_r, 64);
      __assume_aligned(Efft_i, 64);
      DftiComputeForward(hand, Efft_r, Efft_i);
      DftiFreeDescriptor(&amp;amp;hand);
    }
  clock_gettime (CLOCK_REALTIME, &amp;amp;EN);
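  // MKL_LONG is not int (it is 64-bit on 64-bit Linux), so cast to long long and print with %lld.
  printf("DFTI_COMPLEX_STORAGE: DFTI_REAL_REAL\nLOOP:   \t%d\nSIZE:   \t%lld X %lld\nSTRIDES:\t%lld %lld %lld\nNUMBER: \t%lld\nDISTANCE:\t%lld\n\t\t\t\t%.9fs\n",
	 LOOP,
	 (long long)D[0], (long long)D[1],
	 (long long)ST[0], (long long)ST[1], (long long)ST[2],
	 (long long)C,
	 (long long)DI,
	 double(EN.tv_sec-BE.tv_sec)+double(EN.tv_nsec-BE.tv_nsec)/1e9);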
  _mm_free(Efft_r);

  return 0;
}&lt;/PRE&gt;

&lt;P&gt;I compiled this code with icpc using the flags "-mkl -DSIZE=4096 -DLOOP=1 -DCOUNT=3".&lt;/P&gt;

&lt;P&gt;When I run this program without MPI, the output is below:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;$ ./a.out
DFTI_COMPLEX_STORAGE: DFTI_REAL_REAL
LOOP:   	1
SIZE:   	4096 X 4096
STRIDES:	0 4096 1
NUMBER: 	3
DISTANCE:	16777216
				0.322017125s&lt;/PRE&gt;

&lt;P&gt;When I run the same program with MPI, the output is below:&lt;/P&gt;

&lt;PRE class="brush:bash;"&gt;$ mpirun -n 1 ./a.out
DFTI_COMPLEX_STORAGE: DFTI_REAL_REAL
LOOP:   	1
SIZE:   	4096 X 4096
STRIDES:	0 4096 1
NUMBER: 	3
DISTANCE:	16777216
				1.606980538s&lt;/PRE&gt;

&lt;P&gt;The program runs much faster without MPI than with MPI. I have tried different values of SIZE, and the results are similar.&lt;/P&gt;

&lt;P&gt;I do not know why. If I must use MPI, is there any way to keep the speed of MKL?&lt;/P&gt;</description>
      <pubDate>Wed, 19 Apr 2017 16:08:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087230#M23032</guid>
      <dc:creator>杨_栋_</dc:creator>
      <dc:date>2017-04-19T16:08:42Z</dc:date>
    </item>
    <item>
      <title>We are investigating. We will</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087231#M23033</link>
      <description>&lt;P&gt;We are investigating. We will get back to you.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Apr 2017 23:03:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087231#M23033</guid>
      <dc:creator>Jing_Xu</dc:creator>
      <dc:date>2017-04-20T23:03:35Z</dc:date>
    </item>
    <item>
      <title>Did you use https://software</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087232#M23034</link>
      <description>&lt;P&gt;Did you use&amp;nbsp;https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor to get the compiling and linking switches?&lt;/P&gt;</description>
      <pubDate>Fri, 21 Apr 2017 00:26:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087232#M23034</guid>
      <dc:creator>Jing_Xu</dc:creator>
      <dc:date>2017-04-21T00:26:30Z</dc:date>
    </item>
    <item>
      <title>Hi YangDong,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087233#M23035</link>
      <description>&lt;P&gt;Hi YangDong,&lt;/P&gt;

&lt;P&gt;I am afraid you are not using the cluster FFT compute and descriptor configuration functions, so the data would not be distributed correctly for the calculation. For your code you probably need to use 'DftiComputeForwardDM' and 'DftiSetValueDM'.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Another point: I am not sure whether your timing is thread safe. If every node can modify the timing variables, the time you print is not the main node's time but the calculation time across all nodes. I recommend using the MPI interface (MPI_Wtime) to measure elapsed time.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Best regards,&lt;BR /&gt;
	Fiona&lt;/SPAN&gt;&lt;BR /&gt;
	&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 21 Apr 2017 01:34:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087233#M23035</guid>
      <dc:creator>Zhen_Z_Intel</dc:creator>
      <dc:date>2017-04-21T01:34:36Z</dc:date>
    </item>
    <item>
      <title>Quote:Jing X. (Intel) wrote:</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087234#M23036</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Jing X. (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Did you use&amp;nbsp;&lt;A href="https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor"&gt;https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor&lt;/A&gt; to get the compiling and linking switches?&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Thank you!&lt;/P&gt;

&lt;P&gt;I have solved this problem by adding MPI_Init() and MPI_Finalize() to the code and compiling it with mpiicpc. The program now runs as fast with MPI as it does without MPI.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Apr 2017 14:55:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087234#M23036</guid>
      <dc:creator>杨_栋_</dc:creator>
      <dc:date>2017-04-28T14:55:38Z</dc:date>
    </item>
    <item>
      <title>Quote:Fiona Z. (Intel) wrote:</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087235#M23037</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Fiona Z. (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Hi YangDong,&lt;/P&gt;

&lt;P&gt;I am afraid you are not using the cluster FFT compute and descriptor configuration functions, so the data would not be distributed correctly for the calculation. For your code you probably need to use 'DftiComputeForwardDM' and 'DftiSetValueDM'.&lt;/P&gt;

&lt;P&gt;Another point: I am not sure whether your timing is thread safe. If every node can modify the timing variables, the time you print is not the main node's time but the calculation time across all nodes. I recommend using the MPI interface (MPI_Wtime) to measure elapsed time.&lt;/P&gt;

&lt;P&gt;Best regards,&lt;BR /&gt;
	Fiona&lt;BR /&gt;
	&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Hi Fiona,&lt;/P&gt;

&lt;P&gt;Thanks for your warning.&lt;/P&gt;

&lt;P&gt;Each node has its own private descriptor handle (hand) and memory space. If every node runs independently at the same time, is that thread safe?&lt;/P&gt;</description>
      <pubDate>Fri, 28 Apr 2017 15:05:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087235#M23037</guid>
      <dc:creator>杨_栋_</dc:creator>
      <dc:date>2017-04-28T15:05:29Z</dc:date>
    </item>
    <item>
      <title>Hi YangDong,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087236#M23038</link>
      <description>&lt;P&gt;Hi YangDong,&lt;/P&gt;

&lt;P&gt;Your FFT code looks fine; it performs a 2D complex-to-complex FFT on a single machine.&lt;/P&gt;

&lt;P&gt;The program runs much faster without MPI because the MKL FFT is multi-threaded with OpenMP, and when you launch it through mpirun it does not use the OpenMP threads by default. To get the same performance as the run without MPI, you may try:&lt;/P&gt;

&lt;P&gt;&amp;gt; export OMP_NUM_THREADS=xx (your number of physical cores)&lt;/P&gt;

&lt;P&gt;&amp;gt; then &lt;FONT face="Courier New"&gt;mpirun -n 1 ./a.out&lt;/FONT&gt;&lt;/P&gt;

&lt;P&gt;&lt;FONT face="Courier New"&gt;Here is what I ran:&lt;/FONT&gt;&lt;/P&gt;

&lt;P&gt;[yhu5_new@hsw-ep01 FFT]$ export OMP_NUM_THREADS=36&lt;BR /&gt;
	[yhu5_new@hsw-ep01 FFT]$ ./a.out&lt;BR /&gt;
	DFTI_COMPLEX_STORAGE: DFTI_REAL_REAL&lt;BR /&gt;
	LOOP:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;BR /&gt;
	SIZE:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4096 X 4096&lt;BR /&gt;
	STRIDES:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0 4096 1&lt;BR /&gt;
	NUMBER:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;BR /&gt;
	DISTANCE:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 16777216&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.141292946s&lt;BR /&gt;
	[yhu5_new@hsw-ep01 FFT]$&amp;nbsp; mpirun -n 1 ./a.out&lt;BR /&gt;
	DFTI_COMPLEX_STORAGE: DFTI_REAL_REAL&lt;BR /&gt;
	LOOP:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;BR /&gt;
	SIZE:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4096 X 4096&lt;BR /&gt;
	STRIDES:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0 4096 1&lt;BR /&gt;
	NUMBER:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;BR /&gt;
	DISTANCE:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 16777216&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.145391448s&lt;/P&gt;
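
&lt;P&gt;As a minimal sketch of the setup used for the two runs above, the thread count can be fixed and printed in the shell before launching. The core count of 4 is only an assumed placeholder, and setting MKL_NUM_THREADS alongside OMP_NUM_THREADS is an addition that was not part of the runs shown here:&lt;/P&gt;

```shell
# Assumed placeholder: replace 4 with the number of physical cores on your machine.
CORES=4

# OpenMP thread count, as in the export command above.
export OMP_NUM_THREADS="$CORES"

# MKL also reads its own thread-count variable; setting it here is an assumption,
# not something the runs above relied on.
export MKL_NUM_THREADS="$CORES"

# Print both values so the direct run and the mpirun launch can be compared.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS MKL_NUM_THREADS=$MKL_NUM_THREADS"

# Then launch as before, for example: mpirun -n 1 ./a.out
```

&lt;P&gt;Printing the values first makes it easier to confirm that both launches start from the same thread settings.&lt;/P&gt;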

&lt;P&gt;As I understand it, your code is not an MPI program, so it does not actually need to be run with mpirun. For performance tips, you may refer to the MKL user guide or the article &lt;A href="https://software.intel.com/en-us/articles/tuning-the-intel-mkl-dft-functions-performance-on-intel-xeon-phi-coprocessors"&gt;https://software.intel.com/en-us/articles/tuning-the-intel-mkl-dft-functions-performance-on-intel-xeon-phi-coprocessors&lt;/A&gt; (which is written for Xeon Phi, but the concepts are the same for other processors).&lt;/P&gt;

&lt;P&gt;If you would like to use MPI, then you need to write an MPI program and call the MKL Cluster FFT, which is intended for very large FFT sizes across multiple nodes. You can find the Cluster FFT sample code in the MKL install folder: examples_cluster_c.tgz&lt;BR /&gt;
	Unpack it and see cdftc/source/dm_complex_2d_double_ex1.c&lt;/P&gt;

&lt;P&gt;Please refer to the MKL user guide for more details. &lt;A href="https://software.intel.com/en-us/mkl-macos-developer-guide-linking-with-intel-mkl-cluster-software"&gt;https://software.intel.com/en-us/mkl-macos-developer-guide-linking-with-intel-mkl-cluster-software&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&lt;/P&gt;</description>
      <pubDate>Mon, 15 May 2017 07:40:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087236#M23038</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2017-05-15T07:40:23Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...If I must use MPI, is</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087237#M23039</link>
      <description>&amp;gt;&amp;gt;...If I must use MPI, is there any way to keep the speed of MKL?...

To improve processing performance, you can consider:

- Using the &lt;STRONG&gt;scatter&lt;/STRONG&gt; setting of the &lt;STRONG&gt;KMP_AFFINITY&lt;/STRONG&gt; environment variable

- Placing data sets in &lt;STRONG&gt;MCDRAM&lt;/STRONG&gt; memory instead of &lt;STRONG&gt;DDR4&lt;/STRONG&gt; if a &lt;STRONG&gt;KNL&lt;/STRONG&gt; system is used (for Flat or Hybrid MCDRAM modes). The speedup can be significant; here are two examples:</description>
      <pubDate>Wed, 17 May 2017 20:11:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087237#M23039</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2017-05-17T20:11:24Z</dc:date>
    </item>
    <item>
      <title>/////////////////////////////</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087238#M23040</link>
      <description>&lt;PRE class="brush:cpp;"&gt;///////////////////////////////////////////////////////////////////////////////
// 16384 x 16384 - Processing using DDR4

&amp;nbsp;Strassen HBI
&amp;nbsp;Matrix Size&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; : 16384 x 16384
&amp;nbsp;Matrix Size Threshold :&amp;nbsp; 8192 x&amp;nbsp; 8192
&amp;nbsp;Matrix Partitions&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; :&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8
&amp;nbsp;Degree of Recursion&amp;nbsp;&amp;nbsp; :&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1
&amp;nbsp;Result Sets Reflection: N/A
&amp;nbsp;Calculating...
&amp;nbsp;Strassen HBI - Pass 01 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6.97700 secs
&amp;nbsp;Strassen HBI - Pass 02 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6.71200 secs
&amp;nbsp;Strassen HBI - Pass 03 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6.30400 secs
&amp;nbsp;Strassen HBI - Pass 04 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6.28600 secs
&amp;nbsp;Strassen HBI - Pass 05 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6.35500 secs
&amp;nbsp;ALGORITHM_STRASSENHBI - Passed

///////////////////////////////////////////////////////////////////////////////
// 16384 x 16384 - Processing using MCDRAM

&amp;nbsp;Strassen HBI
&amp;nbsp;Matrix Size&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; : 16384 x 16384
&amp;nbsp;Matrix Size Threshold :&amp;nbsp; 8192 x&amp;nbsp; 8192
&amp;nbsp;Matrix Partitions&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; :&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8
&amp;nbsp;Degree of Recursion&amp;nbsp;&amp;nbsp; :&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1
&amp;nbsp;Result Sets Reflection: N/A
&amp;nbsp;Calculating...
&amp;nbsp;Strassen HBI - Pass 01 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4.88600 secs
&amp;nbsp;Strassen HBI - Pass 02 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4.27700 secs
&amp;nbsp;Strassen HBI - Pass 05 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4.24900 secs
&amp;nbsp;Strassen HBI - Pass 03 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4.24000 secs
&amp;nbsp;Strassen HBI - Pass 04 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4.24800 secs
&amp;nbsp;ALGORITHM_STRASSENHBI - Passed

&lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 17 May 2017 20:12:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087238#M23040</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2017-05-17T20:12:25Z</dc:date>
    </item>
    <item>
      <title>/////////////////////////////</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087239#M23041</link>
      <description>&lt;PRE class="brush:cpp;"&gt;///////////////////////////////////////////////////////////////////////////////
// 16384 x 16384 - Processing using DDR4

&amp;nbsp;Strassen HBC
&amp;nbsp;Matrix Size&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; : 16384 x 16384
&amp;nbsp;Matrix Size Threshold :&amp;nbsp; 8192 x&amp;nbsp; 8192
&amp;nbsp;Matrix Partitions&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; :&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8
&amp;nbsp;Degree of Recursion&amp;nbsp;&amp;nbsp; :&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1
&amp;nbsp;Result Sets Reflection: Disabled
&amp;nbsp;Calculating...
&amp;nbsp;Strassen HBC - Pass 01 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6.92900 secs
&amp;nbsp;Strassen HBC - Pass 02 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6.80300 secs
&amp;nbsp;Strassen HBC - Pass 03 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6.76300 secs
&amp;nbsp;Strassen HBC - Pass 04 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6.84800 secs
&amp;nbsp;Strassen HBC - Pass 05 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6.78500 secs
&amp;nbsp;ALGORITHM_STRASSENHBC - 1 - Passed

///////////////////////////////////////////////////////////////////////////////
// 16384 x 16384 - Processing using MCDRAM

&amp;nbsp;Strassen HBC
&amp;nbsp;Matrix Size&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; : 16384 x 16384
&amp;nbsp;Matrix Size Threshold :&amp;nbsp; 8192 x&amp;nbsp; 8192
&amp;nbsp;Matrix Partitions&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; :&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8
&amp;nbsp;Degree of Recursion&amp;nbsp;&amp;nbsp; :&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1
&amp;nbsp;Result Sets Reflection: Disabled
&amp;nbsp;Calculating...
&amp;nbsp;Strassen HBC - Pass 01 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5.03100 secs
&amp;nbsp;Strassen HBC - Pass 03 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4.96100 secs
&amp;nbsp;Strassen HBC - Pass 05 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4.94200 secs
&amp;nbsp;Strassen HBC - Pass 03 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4.96200 secs
&amp;nbsp;Strassen HBC - Pass 04 - Completed:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4.95400 secs
&amp;nbsp;ALGORITHM_STRASSENHBC - 1 - Passed
&lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 17 May 2017 20:13:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Why-MPI-impact-the-speed-of-MKL-s-DFT/m-p/1087239#M23041</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2017-05-17T20:13:11Z</dc:date>
    </item>
  </channel>
</rss>

