<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Drop in performance (BLAS, MKL) in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Drop-in-performance-BLAS-MKL/m-p/1120311#M24915</link>
    <description>

&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;I ran into a problem when running a task with OpenBLAS and MKL. The problem sizes ranged from 16000 to 18000 in steps of 64 (i.e. 16000, 16064, 16128, ..., 18000). The task was run on a cluster with 24 nodes of Haswell architecture (two sockets, 30 MB cache). My question is: why does performance drop sharply when the size is 16384? Both libraries show the same drop at that size. The miss rate also increases significantly at this size, which is why performance decreases, but why does it happen at exactly this size? I do not have much programming experience, so any thoughts are welcome.&lt;/P&gt;

&lt;P&gt;Sorry to bother you,&lt;BR /&gt;
	Thanks.&lt;/P&gt;

&lt;TABLE style="font-size: 1em; line-height: 1.5;" border="0" cellspacing="0"&gt;
	&lt;COLGROUP width="48"&gt;&lt;/COLGROUP&gt;
	&lt;COLGROUP span="2" width="102"&gt;&lt;/COLGROUP&gt;
	&lt;TBODY&gt;
		&lt;TR&gt;
			&lt;TD align="right" height="17"&gt;Size&lt;/TD&gt;
			&lt;TD align="right"&gt;OpenBLAS (speed, MFLOPS)&lt;/TD&gt;
			&lt;TD align="right"&gt;MKL (speed, MFLOPS)&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16000" align="right" height="17"&gt;16000&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="719303.024395" align="right"&gt;719303.024395&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="797334.333829" align="right"&gt;797334.333829&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16064" align="right" height="17"&gt;16064&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="716247.759907" align="right"&gt;716247.759907&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="801752.028595" align="right"&gt;801752.028595&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16256" align="right" height="17"&gt;16256&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="738278.342719" align="right"&gt;738278.342719&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="803630.559752" align="right"&gt;803630.559752&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16320" align="right" height="17"&gt;16320&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="734915.036548" align="right"&gt;734915.036548&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="805445.625905" align="right"&gt;805445.625905&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16384" align="right" height="17"&gt;16384&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="661585.465594" align="right"&gt;661585.465594&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="642552.265062" align="right"&gt;642552.265062&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16448" align="right" height="17"&gt;16448&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="719808.609165" align="right"&gt;719808.609165&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="797099.170117" align="right"&gt;797099.170117&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16512" align="right" height="17"&gt;16512&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="745339.961848" align="right"&gt;745339.961848&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="804849.076513" align="right"&gt;804849.076513&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16576" align="right" height="17"&gt;16576&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="742787.216771" align="right"&gt;742787.216771&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="803981.951285" align="right"&gt;803981.951285&lt;/TD&gt;
		&lt;/TR&gt;
	&lt;/TBODY&gt;
&lt;/TABLE&gt;</description>
    <pubDate>Thu, 07 Jul 2016 16:36:22 GMT</pubDate>
    <dc:creator>Semen_K_</dc:creator>
    <dc:date>2016-07-07T16:36:22Z</dc:date>
    <item>
      <title>Drop in performance (BLAS, MKL)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Drop-in-performance-BLAS-MKL/m-p/1120311#M24915</link>
      <description>

&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;I ran into a problem when running a task with OpenBLAS and MKL. The problem sizes ranged from 16000 to 18000 in steps of 64 (i.e. 16000, 16064, 16128, ..., 18000). The task was run on a cluster with 24 nodes of Haswell architecture (two sockets, 30 MB cache). My question is: why does performance drop sharply when the size is 16384? Both libraries show the same drop at that size. The miss rate also increases significantly at this size, which is why performance decreases, but why does it happen at exactly this size? I do not have much programming experience, so any thoughts are welcome.&lt;/P&gt;

&lt;P&gt;Sorry to bother you,&lt;BR /&gt;
	Thanks.&lt;/P&gt;

&lt;TABLE style="font-size: 1em; line-height: 1.5;" border="0" cellspacing="0"&gt;
	&lt;COLGROUP width="48"&gt;&lt;/COLGROUP&gt;
	&lt;COLGROUP span="2" width="102"&gt;&lt;/COLGROUP&gt;
	&lt;TBODY&gt;
		&lt;TR&gt;
			&lt;TD align="right" height="17"&gt;Size&lt;/TD&gt;
			&lt;TD align="right"&gt;OpenBLAS (speed, MFLOPS)&lt;/TD&gt;
			&lt;TD align="right"&gt;MKL (speed, MFLOPS)&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16000" align="right" height="17"&gt;16000&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="719303.024395" align="right"&gt;719303.024395&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="797334.333829" align="right"&gt;797334.333829&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16064" align="right" height="17"&gt;16064&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="716247.759907" align="right"&gt;716247.759907&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="801752.028595" align="right"&gt;801752.028595&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16256" align="right" height="17"&gt;16256&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="738278.342719" align="right"&gt;738278.342719&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="803630.559752" align="right"&gt;803630.559752&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16320" align="right" height="17"&gt;16320&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="734915.036548" align="right"&gt;734915.036548&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="805445.625905" align="right"&gt;805445.625905&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16384" align="right" height="17"&gt;16384&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="661585.465594" align="right"&gt;661585.465594&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="642552.265062" align="right"&gt;642552.265062&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16448" align="right" height="17"&gt;16448&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="719808.609165" align="right"&gt;719808.609165&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="797099.170117" align="right"&gt;797099.170117&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16512" align="right" height="17"&gt;16512&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="745339.961848" align="right"&gt;745339.961848&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="804849.076513" align="right"&gt;804849.076513&lt;/TD&gt;
		&lt;/TR&gt;
		&lt;TR&gt;
			&lt;TD sdnum="6153;" sdval="16576" align="right" height="17"&gt;16576&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="742787.216771" align="right"&gt;742787.216771&lt;/TD&gt;
			&lt;TD sdnum="6153;" sdval="803981.951285" align="right"&gt;803981.951285&lt;/TD&gt;
		&lt;/TR&gt;
	&lt;/TBODY&gt;
&lt;/TABLE&gt;</description>
      <pubDate>Thu, 07 Jul 2016 16:36:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Drop-in-performance-BLAS-MKL/m-p/1120311#M24915</guid>
      <dc:creator>Semen_K_</dc:creator>
      <dc:date>2016-07-07T16:36:22Z</dc:date>
    </item>
  </channel>
</rss>

