<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic cblas_sgemm speed is abnormal in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-speed-is-abnormal/m-p/1179091#M29187</link>
    <description>&lt;PRE class="brush:cpp;"&gt;	int a = 169*64;
	int b = 64*1024;
        const int c = 5;
	float* A = new float[169*64];
	float* B = new float[64*1024];
	float* C = new float[169*1024];
	srand(time(NULL));
	for (int i=0;i&amp;lt;a;i++)
	{
		A&lt;I&gt; = rand()%1000/100.0;

		if (i%c==0)
		{
			A&lt;I&gt; = -4.204e-045;
		}
	}
	for (int j=0;j&amp;lt;b;j++)
	{
		B&lt;J&gt; = rand()%10000/1000.0;
	}
	while (true)
	{
		double t0 = cvGetTickCount();
		cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 169, 1024, 64, 1.0, A, 64, B, 1024, .0, C, 1024);
		double t1 = cvGetTickCount()-t0;
		cout&amp;lt;&amp;lt;"consume time:"&amp;lt;&amp;lt;t1/cvGetTickFrequency()/1000.0&amp;lt;&amp;lt;endl;
	}&lt;/J&gt;&lt;/I&gt;&lt;/I&gt;&lt;/PRE&gt;

&lt;P&gt;excute code above, change constant c, &amp;nbsp;the consume time is different. I guess the running time will be slower when the metrix contains denorimalized value. why?&lt;/P&gt;</description>
    <pubDate>Wed, 03 Jan 2018 02:00:16 GMT</pubDate>
    <dc:creator>Hou_y_1</dc:creator>
    <dc:date>2018-01-03T02:00:16Z</dc:date>
    <item>
      <title>cblas_sgemm speed is abnormal</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-speed-is-abnormal/m-p/1179091#M29187</link>
      <description>&lt;PRE class="brush:cpp;"&gt;	int a = 169*64;
	int b = 64*1024;
        const int c = 5;
	float* A = new float[169*64];
	float* B = new float[64*1024];
	float* C = new float[169*1024];
	srand(time(NULL));
	for (int i=0;i&amp;lt;a;i++)
	{
		A&lt;I&gt; = rand()%1000/100.0;

		if (i%c==0)
		{
			A&lt;I&gt; = -4.204e-045;
		}
	}
	for (int j=0;j&amp;lt;b;j++)
	{
		B&lt;J&gt; = rand()%10000/1000.0;
	}
	while (true)
	{
		double t0 = cvGetTickCount();
		cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 169, 1024, 64, 1.0, A, 64, B, 1024, .0, C, 1024);
		double t1 = cvGetTickCount()-t0;
		cout&amp;lt;&amp;lt;"consume time:"&amp;lt;&amp;lt;t1/cvGetTickFrequency()/1000.0&amp;lt;&amp;lt;endl;
	}&lt;/J&gt;&lt;/I&gt;&lt;/I&gt;&lt;/PRE&gt;

&lt;P&gt;When I execute the code above with different values of the constant c, the consumed time differs. I suspect sgemm runs slower when the matrix contains denormalized values. Why?&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jan 2018 02:00:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-speed-is-abnormal/m-p/1179091#M29187</guid>
      <dc:creator>Hou_y_1</dc:creator>
      <dc:date>2018-01-03T02:00:16Z</dc:date>
    </item>
    <item>
      <title>Floating-point operations on</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-speed-is-abnormal/m-p/1179092#M29188</link>
      <description>&lt;P&gt;&lt;SPAN style="color: rgb(102, 102, 102); font-family: Arial, Tahoma, Helvetica, sans-serif; font-size: 13px;"&gt;Floating-point operations on denormals are slower than on normalized operands because denormal operands and results are usually handled through a software assist mechanism rather than directly in hardware. This software processing causes Intel MKL functions that consume denormals to run slower than with normalized floating-point numbers.&lt;/SPAN&gt;&lt;/P&gt;

</description>
      <pubDate>Thu, 04 Jan 2018 04:27:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-speed-is-abnormal/m-p/1179092#M29188</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-01-04T04:27:44Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-speed-is-abnormal/m-p/1179093#M29189</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;Calculations on denormal numbers are slow. You can use the Intel C/C++ compiler with the /Qftz option to flush denormals to zero, which should improve the performance of MKL sgemm. Alternatively, you can modify your source code to replace every denormal with a normal number, such as numeric_limits&amp;lt;float&amp;gt;::min().&lt;/P&gt;
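&lt;P&gt;Both suggestions can be sketched in standard C++ plus the SSE intrinsic headers. This is a hedged illustration under stated assumptions: replace_subnormals and enable_ftz_daz are hypothetical helper names; FLT_MIN from cfloat has the same value as numeric_limits&amp;lt;float&amp;gt;::min(); keeping the original sign with std::copysign is a choice made here, not part of the advice above. _MM_SET_FLUSH_ZERO_MODE and _MM_SET_DENORMALS_ZERO_MODE set the FTZ and DAZ bits of MXCSR for the calling thread only, which is similar in intent to /Qftz. Whether MKL's worker threads see those bits depends on the threading runtime, so re-measure the sgemm timing after enabling them.&lt;/P&gt;
&lt;PRE class="brush:cpp;"&gt;
```cpp
// Quoted includes are used here; if no local file matches, they resolve
// like the angle-bracket form ([cpp.include]).
#include "cassert"
#include "cfloat"
#include "cmath"
#include "cstddef"

#if defined(__SSE__) || defined(_M_X64)
#include "xmmintrin.h"   // _MM_SET_FLUSH_ZERO_MODE
#include "pmmintrin.h"   // _MM_SET_DENORMALS_ZERO_MODE
#define HAVE_SSE_MXCSR 1
#endif

// Hypothetical helper: overwrite every subnormal entry with a normal
// value of the same sign (e.g. FLT_MIN); other entries are left as-is.
void replace_subnormals(float* data, std::size_t n, float replacement)
{
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i != n; ++i)
    {
        if (std::fpclassify(data[i]) == FP_SUBNORMAL)
        {
            data[i] = std::copysign(replacement, data[i]);
        }
    }
}

// Hypothetical helper: set FTZ (flush denormal results to zero) and
// DAZ (treat denormal inputs as zero) for the calling thread only.
// Returns false where the SSE control register is not available.
bool enable_ftz_daz()
{
#ifdef HAVE_SSE_MXCSR
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    return true;
#else
    return false;
#endif
}
```
&lt;/PRE&gt;
&lt;P&gt;In the question's program, either replace_subnormals(A, 169*64, FLT_MIN) after the fill loop or enable_ftz_daz() at the start of main would be the place to try these.&lt;/P&gt;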

&lt;P&gt;Best regards,&lt;BR /&gt;
	Fiona&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jan 2018 03:04:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-speed-is-abnormal/m-p/1179093#M29189</guid>
      <dc:creator>Zhen_Z_Intel</dc:creator>
      <dc:date>2018-01-05T03:04:00Z</dc:date>
    </item>
  </channel>
</rss>

