<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic MATMUL causing stack overflow in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/MATMUL-causing-stack-overflow/m-p/1171533#M146033</link>
    <description>&lt;P&gt;I noticed that for large matrices, MATMUL crashes with a stack overflow. I can fix this with /heap-arrays0. The program does not crash when calling dgemm from MKL. I ran some tests, and the results from MATMUL and dgemm are identical. However, MATMUL needs a large stack, and dgemm doesn't. Is this the correct behavior, or is there a bug somewhere?&lt;/P&gt;

&lt;P&gt;Roman&lt;/P&gt;</description>
    <pubDate>Mon, 11 Dec 2017 23:37:46 GMT</pubDate>
    <dc:creator>Roman1</dc:creator>
    <dc:date>2017-12-11T23:37:46Z</dc:date>
    <item>
      <title>MATMUL causing stack overflow</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/MATMUL-causing-stack-overflow/m-p/1171533#M146033</link>
      <description>&lt;P&gt;I noticed that for large matrices, MATMUL crashes with a stack overflow. I can fix this with /heap-arrays0. The program does not crash when calling dgemm from MKL. I ran some tests, and the results from MATMUL and dgemm are identical. However, MATMUL needs a large stack, and dgemm doesn't. Is this the correct behavior, or is there a bug somewhere?&lt;/P&gt;

&lt;P&gt;Roman&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2017 23:37:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/MATMUL-causing-stack-overflow/m-p/1171533#M146033</guid>
      <dc:creator>Roman1</dc:creator>
      <dc:date>2017-12-11T23:37:46Z</dc:date>
    </item>
    <item>
      <title>Most of the time, the MATMUL</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/MATMUL-causing-stack-overflow/m-p/1171534#M146034</link>
      <description>&lt;P&gt;Most of the time, the MATMUL intrinsic is implemented by in-line code. That is, the actual instructions to multiply each element of the matrices are generated by the compiler (including the loops to go from element to element, etc.). That's different from a call to dgemm, which is a simple routine call.&lt;/P&gt;

&lt;P&gt;That said, because using /heap-arrays resolves the stack overflow, I'm going to guess that the compiler cannot detect that there won't be overlap between the result variable and the two operands, so for safety it creates a temporary array. By default, this temporary is on the stack.&lt;/P&gt;

&lt;P&gt;To answer your question, I would characterize this as "expected behavior", and not likely a bug.&lt;/P&gt;

&lt;P&gt;--Lorri&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 13:31:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/MATMUL-causing-stack-overflow/m-p/1171534#M146034</guid>
      <dc:creator>Lorri_M_Intel</dc:creator>
      <dc:date>2017-12-12T13:31:10Z</dc:date>
    </item>
    <item>
      <title>I ran a test, and the</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/MATMUL-causing-stack-overflow/m-p/1171535#M146035</link>
      <description>&lt;P&gt;I ran a test, and the performance of dgemm is slightly better than that of MATMUL. Based on your reply, this might be because of the extra step in which values are copied from the temporary array to the result variable.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Dec 2017 00:03:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/MATMUL-causing-stack-overflow/m-p/1171535#M146035</guid>
      <dc:creator>Roman1</dc:creator>
      <dc:date>2017-12-13T00:03:07Z</dc:date>
    </item>
    <item>
      <title>In the case where the matmul</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/MATMUL-causing-stack-overflow/m-p/1171536#M146036</link>
      <description>In the case where the matmul result is stored explicitly:
A = matmul(B,C)
Some compilers are able to avoid allocation of a temporary. 
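A minimal sketch of the two forms discussed in this thread, for comparison. The program name, the matrix size n, and the array names are hypothetical, and the sketch assumes a BLAS library that provides dgemm (for example MKL) is linked in. Whether a temporary is actually created for the MATMUL assignment depends on the compiler and its options.

```fortran
program matmul_vs_dgemm
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000        ! hypothetical size: each matrix is about 8 MB
  real(dp), allocatable :: a(:,:), a2(:,:), b(:,:), c(:,:)
  real(dp) :: diff
  integer :: i, j
  external :: dgemm                     ! BLAS routine, assumed to be linked (e.g. from MKL)

  ! The named arrays are allocatable, so they live on the heap.
  allocate(a(n,n), a2(n,n), b(n,n), c(n,n))
  call random_number(b)
  call random_number(c)

  ! Intrinsic form: the compiler may build the product in a temporary
  ! (on the stack by default, per the thread) and then copy it into a.
  a = matmul(b, c)

  ! BLAS form: dgemm computes a2 = 1.0*b*c + 0.0*a2 directly in a2.
  call dgemm('N', 'N', n, n, n, 1.0_dp, b, n, c, n, 0.0_dp, a2, n)

  ! Compare element by element with explicit loops, so that the check
  ! itself does not need an array temporary.
  diff = 0.0_dp
  do j = 1, n
    do i = 1, n
      diff = max(diff, abs(a(i,j) - a2(i,j)))
    end do
  end do
  print *, 'max abs difference:', diff
end program matmul_vs_dgemm
```

If the MATMUL line overflows the stack at this size, building with /heap-arrays, as reported at the top of the thread, should move the temporary to the heap.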
Usually, the difference in performance would come mostly from the efficiency of cache usage. In the more general case,
A = b*matmul(C,D) + E
it seems unlikely that a compiler would optimize the temporary away.</description>
      <pubDate>Wed, 13 Dec 2017 10:08:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/MATMUL-causing-stack-overflow/m-p/1171536#M146036</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2017-12-13T10:08:08Z</dc:date>
    </item>
  </channel>
</rss>

