<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Overwriting input matrix in gemm in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Overwritting-input-matrix-in-gemm/m-p/1680457#M37036</link>
    <description>&lt;P&gt;Hello, I am trying to use gemm to perform some complex computations, and sometimes, when the result matrix is also one of the input matrices, it gives incorrect results.&lt;/P&gt;&lt;P&gt;I haven't seen any mention of this in the documentation:&lt;/P&gt;&lt;LI-CODE lang="cpp"&gt;// Linked to Intel MKL
#include &amp;lt;mkl.h&amp;gt;

// Let's say A, B, and C are square `size` x `size` matrices of doubles,
// so the generic cblas_?gemm call is cblas_dgemm
// This works: the result goes to a separate matrix C
cblas_dgemm(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, C, size);

// This also seems to work: the result overwrites A
cblas_dgemm(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, A, size);

// But this gives incorrect results: the result overwrites B
cblas_dgemm(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, B, size);&lt;/LI-CODE&gt;</description>
    <pubDate>Fri, 04 Apr 2025 14:43:00 GMT</pubDate>
    <dc:creator>ddavobsc</dc:creator>
    <dc:date>2025-04-04T14:43:00Z</dc:date>
    <item>
      <title>Overwriting input matrix in gemm</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Overwritting-input-matrix-in-gemm/m-p/1680457#M37036</link>
      <description>&lt;P&gt;Hello, I am trying to use gemm to perform some complex computations, and sometimes, when the result matrix is also one of the input matrices, it gives incorrect results.&lt;/P&gt;&lt;P&gt;I haven't seen any mention of this in the documentation:&lt;/P&gt;&lt;LI-CODE lang="cpp"&gt;// Linked to Intel MKL
#include &amp;lt;mkl.h&amp;gt;

// Let's say A, B, and C are square `size` x `size` matrices of doubles,
// so the generic cblas_?gemm call is cblas_dgemm
// This works: the result goes to a separate matrix C
cblas_dgemm(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, C, size);

// This also seems to work: the result overwrites A
cblas_dgemm(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, A, size);

// But this gives incorrect results: the result overwrites B
cblas_dgemm(layout, CblasNoTrans, CblasNoTrans, size, size, size, 1.0, A, size, B, size, 0.0, B, size);&lt;/LI-CODE&gt;</description>
      <pubDate>Fri, 04 Apr 2025 14:43:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Overwritting-input-matrix-in-gemm/m-p/1680457#M37036</guid>
      <dc:creator>ddavobsc</dc:creator>
      <dc:date>2025-04-04T14:43:00Z</dc:date>
    </item>
    <item>
      <title>Re: Overwriting input matrix in gemm</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Overwritting-input-matrix-in-gemm/m-p/1682817#M37070</link>
      <description>&lt;P&gt;To start off, I want to mention that this is not how the function is intended to be used, so incorrect results are to be expected. Using cblas_?gemm this way is not supported: the output matrix should not be the same as either input matrix. One reason is that the three matrices can, in general, have different dimensions: A is an m-by-k matrix, B is a k-by-n matrix, and C is an m-by-n matrix. In addition, gemm reads A and B while it is writing C, so storing the result in one of the inputs can overwrite input elements that have not been read yet. The only situation where storing the output in an input matrix might happen to work is when all three matrices are the same size. Even then, in my testing I could not find a case where A = A*B worked but B = A*B did not; either both worked or neither did. Here's an example:&lt;/P&gt;
&lt;TABLE border="1" width="100%"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;C &amp;lt;- 1.0 * ( A * B ) + 0.0 * C&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;A &amp;lt;- 1.0 * ( A * B ) + 0.0 * A&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;B &amp;lt;- 1.0 * ( A * B ) + 0.0 * B&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="33.333333333333336%"&gt;
&lt;P&gt;A (m×k)&lt;BR /&gt;0.840188 0.394383 0.783099 0.79844&lt;BR /&gt;0.911647 0.197551 0.335223 0.76823&lt;BR /&gt;0.277775 0.55397 0.477397 0.628871&lt;BR /&gt;0.364784 0.513401 0.95223 0.916195&lt;/P&gt;
&lt;P&gt;B (k×n)&lt;BR /&gt;0.635712 0.717297 0.141603 0.606969&lt;BR /&gt;0.0163006 0.242887 0.137232 0.804177&lt;BR /&gt;0.156679 0.400944 0.12979 0.108809&lt;BR /&gt;0.998925 0.218257 0.512932 0.839112&lt;/P&gt;
&lt;P&gt;C (m×n)&lt;BR /&gt;1.46082 1.1867 0.684279 1.58231&lt;BR /&gt;1.40269 1.00398 0.59376 1.39331&lt;BR /&gt;0.888607 0.662464 0.499886 1.19373&lt;BR /&gt;1.30467 0.968114 0.715646 1.50668&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="33.333333333333336%"&gt;
&lt;P&gt;A (m×k)&lt;BR /&gt;0.840188 0.394383 0.783099 0.79844&lt;BR /&gt;0.911647 0.197551 0.335223 0.76823&lt;BR /&gt;0.277775 0.55397 0.477397 0.628871&lt;BR /&gt;0.364784 0.513401 0.95223 0.916195&lt;/P&gt;
&lt;P&gt;B (k×n)&lt;BR /&gt;0.635712 0.717297 0.141603 0.606969&lt;BR /&gt;0.0163006 0.242887 0.137232 0.804177&lt;BR /&gt;0.156679 0.400944 0.12979 0.108809&lt;BR /&gt;0.998925 0.218257 0.512932 0.839112&lt;/P&gt;
&lt;P&gt;A (result)&lt;BR /&gt;1.46082 1.1867 0.684279 1.58231&lt;BR /&gt;1.40269 1.00398 0.59376 1.39331&lt;BR /&gt;0.888607 0.662464 0.499886 1.19373&lt;BR /&gt;1.30467 0.968114 0.715646 1.50668&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="33.333333333333336%"&gt;
&lt;P&gt;A (m×k)&lt;BR /&gt;0.840188 0.394383 0.783099 0.79844&lt;BR /&gt;0.911647 0.197551 0.335223 0.76823&lt;BR /&gt;0.277775 0.55397 0.477397 0.628871&lt;BR /&gt;0.364784 0.513401 0.95223 0.916195&lt;/P&gt;
&lt;P&gt;B (k×n)&lt;BR /&gt;0.635712 0.717297 0.141603 0.606969&lt;BR /&gt;0.0163006 0.242887 0.137232 0.804177&lt;BR /&gt;0.156679 0.400944 0.12979 0.108809&lt;BR /&gt;0.998925 0.218257 0.512932 0.839112&lt;/P&gt;
&lt;P&gt;B (result)&lt;BR /&gt;1.46082 1.1867 0.684279 1.58231&lt;BR /&gt;1.40269 1.00398 0.59376 1.39331&lt;BR /&gt;0.888607 0.662464 0.499886 1.19373&lt;BR /&gt;1.30467 0.968114 0.715646 1.50668&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
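&lt;P&gt;To show why storing the result in an input matrix is unsafe, here is a minimal sketch that is not MKL code. naive_gemm is a hypothetical, unblocked stand-in for cblas_dgemm (row-major, no transposes, square n x n matrices). It stores each element of C as soon as it is computed. As a result, passing A or B as C overwrites input elements that later iterations still read, so the error appears even at 2x2. MKL's optimized, blocked kernels read and write memory in a different order. That may be why aliasing happened to work for the small example above and only failed at larger sizes. gemm_allow_alias, also hypothetical, sketches one workaround: compute into a separate scratch buffer, then copy the result back.&lt;/P&gt;
&lt;LI-CODE lang="cpp"&gt;
```cpp
// Minimal sketch, not MKL code. naive_gemm is a hypothetical, unblocked
// stand-in for cblas_dgemm: C = alpha * A * B + beta * C for n x n
// row-major matrices with no transposes.
void naive_gemm(int n, double alpha, const double* A, const double* B,
                double beta, double* C) {
    for (int i = 0; i != n; ++i) {
        for (int j = 0; j != n; ++j) {
            double sum = 0.0;
            for (int k = 0; k != n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            // If C aliases A or B, this store overwrites an input element
            // that a later (i, j) iteration may still need to read.
            C[i * n + j] = alpha * sum + beta * C[i * n + j];
        }
    }
}

// Hypothetical workaround: compute into a caller-provided scratch buffer
// tmp of n * n doubles (which must not overlap A, B, or C), then copy it
// over C. C may alias A or B here, because the inputs are only read while
// the product is written to tmp.
void gemm_allow_alias(int n, double alpha, const double* A, const double* B,
                      double beta, double* C, double* tmp) {
    for (int i = 0; i != n * n; ++i)
        tmp[i] = C[i];  // keep the old C values for the beta * C term
    naive_gemm(n, alpha, A, B, beta, tmp);
    for (int i = 0; i != n * n; ++i)
        C[i] = tmp[i];
}
```
&lt;/LI-CODE&gt;
&lt;P&gt;For example, take A = [1 2; 3 4] and B = [5 6; 7 8]. naive_gemm(2, 1.0, A, B, 0.0, B) leaves B = [19 22; 85 98] instead of the correct product [19 22; 43 50]. By contrast, gemm_allow_alias on a fresh copy of B, with a 4-element scratch buffer, returns [19 22; 43 50]. With MKL itself, the same idea applies: pass cblas_dgemm a separate output buffer and copy the result back afterwards if needed.&lt;/P&gt;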
&lt;P&gt;&lt;BR /&gt;In this 4x4 example, it made no difference whether the result was stored in A or B; both produced the correct result. However, although aliasing happened to work here, Intel MKL does not claim to support it. Once the matrices get larger (for example, 256x256 on my machine), the result becomes incorrect.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Apr 2025 16:03:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Overwritting-input-matrix-in-gemm/m-p/1682817#M37070</guid>
      <dc:creator>Ethan_F_Intel</dc:creator>
      <dc:date>2025-04-14T16:03:50Z</dc:date>
    </item>
  </channel>
</rss>

