<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic I tested it on AVX512 setup - in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179009#M29179</link>
    <description>&lt;P&gt;I tested it on an AVX512 setup: Xeon Platinum 8176, 2.10GHz.&lt;/P&gt;&lt;P&gt;I can't see any improvement from AVX512.&lt;/P&gt;&lt;P&gt;Should I expect any improvement over AVX2 on the above setup?&lt;/P&gt;&lt;P&gt;I can't find any info in the release notes.&lt;/P&gt;&lt;P&gt;Elad&lt;/P&gt;</description>
    <pubDate>Wed, 10 Jul 2019 06:47:06 GMT</pubDate>
    <dc:creator>Yosef__Elad</dc:creator>
    <dc:date>2019-07-10T06:47:06Z</dc:date>
    <item>
      <title>cgemm3m, cgemm_compact AND cgemm give poor results for small problem 24*64</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179003#M29173</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I am using the sequential API with direct calls to multiply matrices:&lt;/P&gt;&lt;P&gt;C = 1*conj(A')*A&lt;/P&gt;&lt;P&gt;A is 64*24 and C is 24*24; both are complex matrices (complex8).&lt;/P&gt;&lt;P&gt;I have two arrays of matrices, A_ARR (filled with random values) and C_ARR (filled with zeros); each array holds 1000 matrices.&lt;/P&gt;&lt;P&gt;My application is pinned to a single core and to the corresponding RAM by NUMA id.&lt;/P&gt;&lt;P&gt;Build cmd: icc -c -g -ipo -ipp -Ofast -DMKL_DIRECT_SEQ -xCORE-AVX2 *.c&lt;/P&gt;&lt;P&gt;Setup is a Xeon E5-2699A v4 with 64G of RAM on each NUMA node.&lt;/P&gt;&lt;P&gt;I run cblas_cgemm/cblas_cgemm3m/mkl_cgemm_compact in a loop over A_ARR and C_ARR (only one function per run) and I get really poor results (I'm measuring only the matrix multiplication time).&lt;/P&gt;&lt;P&gt;I'm aware of the MKL "warm-up" issue, so I run cblas_cgemm once in advance of the measurements.&lt;/P&gt;&lt;P&gt;cblas_cgemm(CblasRowMajor, CblasConjTrans, CblasNoTrans, m, n, k, &amp;amp;alpha, &amp;amp;A_ARR[i], m, &amp;amp;A_ARR[i], n, &amp;amp;beta, &amp;amp;C_ARR[i], m)&lt;/P&gt;&lt;P&gt;Gives: AVG 6.5ms MAX 8.6ms MIN 6.3ms&lt;/P&gt;&lt;P&gt;cblas_cgemm3m(CblasRowMajor, CblasConjTrans, CblasNoTrans, m, n, k, &amp;amp;alpha, &amp;amp;A_ARR[i], m, &amp;amp;A_ARR[i], n, &amp;amp;beta, &amp;amp;C_ARR[i], m)&lt;/P&gt;&lt;P&gt;Gives: AVG 7.5ms MAX 12ms MIN 7.3ms&lt;/P&gt;&lt;P&gt;mkl_cgemm_compact(CblasRowMajor, CblasConjTrans, CblasNoTrans, m, n, k, &amp;amp;alpha, &amp;amp;a_arr_compact[i], m, &amp;amp;a_arr_compact[i], n, &amp;amp;beta, &amp;amp;c_arr_compact[i], m, COMPACT_FORMAT, 1)&lt;/P&gt;&lt;P&gt;Gives: AVG 225ms MAX 231ms MIN 224ms&lt;/P&gt;&lt;P&gt;Note: COMPACT_FORMAT comes from mkl_get_format_compact().&lt;/P&gt;&lt;P&gt;Can anyone help me reduce the time this takes?&lt;/P&gt;&lt;P&gt;It is also not clear to me why the compact API, which should mostly vectorize the matrix multiplication, gives the worst results.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Elad&lt;/P&gt;</description>
      <pubDate>Mon, 08 Jul 2019 09:33:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179003#M29173</guid>
      <dc:creator>Yosef__Elad</dc:creator>
      <dc:date>2019-07-08T09:33:58Z</dc:date>
    </item>
    <item>
      <title>We need to check but probably</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179004#M29174</link>
      <description>&lt;P&gt;We need to check, but the compact API has probably not been optimized for such "big" sizes. Which version of MKL are you using?&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jul 2019 07:21:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179004#M29174</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2019-07-09T07:21:26Z</dc:date>
    </item>
    <item>
      <title>MKL version is latest 2019.4</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179005#M29175</link>
      <description>&lt;P&gt;The MKL version is the latest, 2019.4.243.&lt;/P&gt;&lt;P&gt;Another odd thing is that cblas_cgemm shows better results than cblas_cgemm3m.&lt;/P&gt;&lt;P&gt;The latter should improve performance by ~25% according to the docs.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jul 2019 08:47:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179005#M29175</guid>
      <dc:creator>Yosef__Elad</dc:creator>
      <dc:date>2019-07-09T08:47:43Z</dc:date>
    </item>
    <item>
      <title> </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179006#M29176</link>
      <description>&lt;P&gt;Could you share your benchmark so that we can check these numbers on our side with the latest updates and CPU?&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jul 2019 09:10:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179006#M29176</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2019-07-09T09:10:09Z</dc:date>
    </item>
    <item>
      <title>attached whole project</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179007#M29177</link>
      <description>&lt;P&gt;The whole project is attached.&lt;/P&gt;&lt;P&gt;make AVX / AVX2 / AVX512 =y&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jul 2019 09:53:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179007#M29177</guid>
      <dc:creator>Yosef__Elad</dc:creator>
      <dc:date>2019-07-09T09:53:02Z</dc:date>
    </item>
    <item>
      <title>thanks for the project, we</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179008#M29178</link>
      <description>&lt;P&gt;Thanks for the project; we will check.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jul 2019 05:45:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179008#M29178</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2019-07-10T05:45:28Z</dc:date>
    </item>
    <item>
      <title>I tested it on AVX512 setup -</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179009#M29179</link>
      <description>&lt;P&gt;I tested it on an AVX512 setup: Xeon Platinum 8176, 2.10GHz.&lt;/P&gt;&lt;P&gt;I can't see any improvement from AVX512.&lt;/P&gt;&lt;P&gt;Should I expect any improvement over AVX2 on the above setup?&lt;/P&gt;&lt;P&gt;I can't find any info in the release notes.&lt;/P&gt;&lt;P&gt;Elad&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jul 2019 06:47:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179009#M29179</guid>
      <dc:creator>Yosef__Elad</dc:creator>
      <dc:date>2019-07-10T06:47:06Z</dc:date>
    </item>
    <item>
      <title>Closing this thread I found</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179010#M29180</link>
      <description>&lt;P&gt;Closing this thread: I found the issue in my timer function.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jul 2019 13:36:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cgemm3m-cgemm-compact-AND-cgemm-give-poor-results-for-small/m-p/1179010#M29180</guid>
      <dc:creator>Yosef__Elad</dc:creator>
      <dc:date>2019-07-10T13:36:35Z</dc:date>
    </item>
  </channel>
</rss>

