<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Topic "The matrices are of rank 3" in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119589#M24881</link>
    <description>&lt;P&gt;The matrices are of rank 3 (according to Fortran terminology).&amp;nbsp; I believe MKL may allocate a temporary working matrix, which would prevent the coprocessor from devoting all of its on-board memory to your matrices, the MPSS, and the offloaded code, even if you use the beta==0 option to suppress downloading the output C matrix of dgemm.&amp;nbsp; The benchmark quotations should state what size was found to give the quoted Gflops rating, and you wouldn't expect to be able to go much beyond that.&lt;/P&gt;</description>
    <pubDate>Mon, 09 May 2016 15:13:54 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2016-05-09T15:13:54Z</dc:date>
    <item>
      <title>Reproducing Xeon Phi Linpack (GEMM) results</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119586#M24878</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;

&lt;P&gt;I am trying to reproduce the Matrix Multiply results presented in the following website and I am not getting the same results.&lt;/P&gt;

&lt;P&gt;&lt;A href="http://www.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-linpack-stream.html"&gt;http://www.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-linpack-stream.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Attached is my modified file. I am starting from the example code that comes with the MKL library (under: C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl\examples), with no buffer reuse, and initially doing single-precision computations.&lt;/P&gt;

&lt;P&gt;Does anyone know if this is the code used for the benchmark or if there is a specific linpack library that I should be using, like the one found here:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/intel-mkl-benchmarks-suite"&gt;https://software.intel.com/en-us/articles/intel-mkl-benchmarks-suite&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;The Xeon Phi Model I am using is the 7200P with 61 cores and 16GB RAM.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Also, it is curious that with matrices of rank 30000 (~10.1 GB for the three matrices) the MIC reserves the memory (checked with micsmc and by ssh-ing into the MIC and running top), but it performs no computations and seems to hang.&lt;/P&gt;

&lt;P&gt;Best regards,&lt;/P&gt;

&lt;P&gt;David&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2016 19:27:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119586#M24878</guid>
      <dc:creator>David_F_8</dc:creator>
      <dc:date>2016-05-06T19:27:47Z</dc:date>
    </item>
    <item>
      <title>Hi David, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119587#M24879</link>
      <description>&lt;P&gt;Hi David,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Yes, when talking about Linpack performance, we usually mean the Linpack benchmark, which can be downloaded from&amp;nbsp;&lt;A href="https://software.intel.com/en-us/articles/intel-mkl-benchmarks-suite"&gt;https://software.intel.com/en-us/articles/intel-mkl-benchmarks-suite&lt;/A&gt;; you can also find it under the MKL install folders.&lt;/P&gt;

&lt;P&gt;You mentioned that with matrices of rank 30000 (~10.1 GB for the three matrices) the MIC reserves the memory (checked with micsmc and by ssh-ing into the MIC and running top) but performs no computations and seems to hang. Do you mean the Linpack benchmark or the examples? In any case, if that size doesn't work, you may try a smaller one and see whether it works.&lt;/P&gt;

&lt;P&gt;Best Regards,&lt;BR /&gt;
	Ying H.&lt;/P&gt;

&lt;P&gt;Intel MKL Support&amp;nbsp;&lt;/P&gt;

</description>
      <pubDate>Mon, 09 May 2016 01:23:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119587#M24879</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2016-05-09T01:23:59Z</dc:date>
    </item>
    <item>
      <title>Hello Ying,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119588#M24880</link>
      <description>&lt;P&gt;Hello Ying,&lt;/P&gt;

&lt;P&gt;When I mentioned the rank-30000 matrices, I was referring to the compiler-assisted offload code found in the examples that come with the MKL library (not the automatic-offload ones).&lt;/P&gt;

&lt;P&gt;I am running the "sgemm.c" program on an Intel Xeon Phi 7200P (with 16 GB of RAM), and after reaching this matrix size it hangs.&lt;/P&gt;

&lt;P&gt;I also experience this at the smaller rank of 25000, which in double precision yields ~15 GB for the three matrices, still within the memory capacity of the MIC. These memory figures assume that the GEMM operation uses doubles and three matrices of the same rank.&lt;/P&gt;

&lt;P&gt;I thought this was curious, since I am still not occupying all of the memory on the MIC, and the OS and other related processes take up only around 300K on the MIC.&lt;/P&gt;

&lt;P&gt;Best regards,&lt;/P&gt;

&lt;P&gt;David Fernandez&lt;/P&gt;</description>
      <pubDate>Mon, 09 May 2016 14:49:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119588#M24880</guid>
      <dc:creator>David_F_8</dc:creator>
      <dc:date>2016-05-09T14:49:17Z</dc:date>
    </item>
    <item>
      <title>The matrices are of rank 3</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119589#M24881</link>
      <description>&lt;P&gt;The matrices are of rank 3 (according to Fortran terminology).&amp;nbsp; I believe MKL may allocate a temporary working matrix, which would prevent the coprocessor from devoting all of its on-board memory to your matrices, the MPSS, and the offloaded code, even if you use the beta==0 option to suppress downloading the output C matrix of dgemm.&amp;nbsp; The benchmark quotations should state what size was found to give the quoted Gflops rating, and you wouldn't expect to be able to go much beyond that.&lt;/P&gt;</description>
      <pubDate>Mon, 09 May 2016 15:13:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119589#M24881</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2016-05-09T15:13:54Z</dc:date>
    </item>
    <item>
      <title>Hello Tim,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119590#M24882</link>
      <description>&lt;P&gt;Hello Tim,&lt;/P&gt;

&lt;P&gt;So if I understand correctly, you believe that MKL is allocating temporary matrix space, which sounds reasonable, even though I would have thought that MKL would do some sort of blocking on the matrices to overlap computation and communication, and thus would require only small buffers on the MIC (granted, the matrices might eventually be stored completely on the MIC, hence the memory allocation procedure).&lt;/P&gt;

&lt;P&gt;Your observation brings me back to my original question about reproducing the matrix-matrix multiplication results published on the following webpage:&lt;/P&gt;

&lt;P&gt;&lt;A href="http://www.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-linpack-stream.html"&gt;http://www.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-linpack-stream.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;The matrices there go up to (43072 x 43072), which comes to ~14 GB in double precision; that made me think that the MIC would be able to hold my 25K x 25K matrices.&lt;/P&gt;

&lt;P&gt;Would it be possible to see the Linpack code used to generate these results (the SGEMM &amp;amp; DGEMM)?&lt;/P&gt;

&lt;P&gt;BTW, my beta is not 0, so I always assume there will be an update and that I need to load all three matrices onto the MIC.&lt;/P&gt;

&lt;P&gt;Best regards,&lt;/P&gt;

&lt;P&gt;David&lt;/P&gt;</description>
      <pubDate>Mon, 09 May 2016 15:24:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119590#M24882</guid>
      <dc:creator>David_F_8</dc:creator>
      <dc:date>2016-05-09T15:24:59Z</dc:date>
    </item>
    <item>
      <title>Hello all, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119591#M24883</link>
      <description>&lt;P&gt;Hello all,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;I have been putting together some numbers on memory consumption, and I realized they were disorganized, so here is a more organized version:&lt;/P&gt;

&lt;P&gt;For Matrix Rank: 25000 (Double precision) &amp;lt;- this fails&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;Memory for 1 matrix: 25000^2 * 8 / 1e9 = ~5 GB&lt;/LI&gt;
	&lt;LI&gt;Memory for 3 matrices (required by DGEMM) = ~15 GB&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;Of course, the 30K x 30K matrices (~7.2 GB each) should then also fail, since they would involve around 21.6 GB of memory, which exceeds the MIC memory. This is why I thought the Linpack (MKL-based?) version was doing some kind of data blocking, since the reported size of (43072 x 43072) would not fit as-is in MIC memory.&lt;/P&gt;

&lt;P&gt;Bests,&lt;/P&gt;

&lt;P&gt;David&lt;/P&gt;</description>
      <pubDate>Mon, 09 May 2016 16:26:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119591#M24883</guid>
      <dc:creator>David_F_8</dc:creator>
      <dc:date>2016-05-09T16:26:30Z</dc:date>
    </item>
    <item>
      <title>The only viewable relevant</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119592#M24884</link>
      <description>&lt;P&gt;The only viewable relevant source code is the public BLAS, e.g. on netlib.org.&amp;nbsp; Intel keeps its own modifications proprietary, probably including data blocking not present in the reference source, and translation to C++ with SIMD intrinsics.&lt;/P&gt;</description>
      <pubDate>Mon, 09 May 2016 21:46:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Reproducing-Xeon-Phi-Linpack-GEMM-results/m-p/1119592#M24884</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2016-05-09T21:46:02Z</dc:date>
    </item>
  </channel>
</rss>

