<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Pradeep , in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Batch-GEMM-with-TBB-threading-solution-gives-no-performance/m-p/1159525#M27849</link>
    <description>&lt;P&gt;Hi Pradeep,&lt;BR /&gt;
	&lt;BR /&gt;
	Thank you for your reply.&lt;/P&gt;

&lt;P&gt;I have escalated the problem and will update you when there is any news.&lt;BR /&gt;
	&lt;BR /&gt;
	Thanks,&lt;BR /&gt;
	Ying&lt;/P&gt;</description>
    <pubDate>Wed, 25 Jul 2018 03:19:23 GMT</pubDate>
    <dc:creator>Ying_H_Intel</dc:creator>
    <dc:date>2018-07-25T03:19:23Z</dc:date>
    <item>
      <title>MKL Batch GEMM with TBB threading solution gives no performance improvements</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Batch-GEMM-with-TBB-threading-solution-gives-no-performance/m-p/1159522#M27846</link>
      <description>&lt;P&gt;The open source library ArrayFire uses Intel MKL for GEMM operations, and we recently updated the code to use the batch version of GEMM. We have noticed that using GNU OpenMP or Intel OpenMP as the threading solution gives the expected speedups, but TBB does not. We wanted to bring this to your attention. The ArrayFire benchmark code used to time the GEMM operations is given below.&lt;/P&gt;

&lt;PRE class="brush:cpp; class-name:dark;"&gt;#include &amp;lt;arrayfire.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;math.h&amp;gt;
#include &amp;lt;cstdlib&amp;gt;

using namespace af;

// create a small wrapper to benchmark
static array A; // populated before each timing
static void fn()
{
    array B = matmul(A, A);  // matrix multiply
    B.eval();                // ensure evaluated
}

int main(int argc, char ** argv)
{
    double peak = 0;
    try {
        int device = argc &amp;gt; 1 ? atoi(argv[1]) : 0;
        setDevice(device);
        info();

        printf("Benchmark N-by-N matrix multiply\n");
        for (int n = 128; n &amp;lt;= 2048; n += 128) {

            //printf("%4d x %4d: ", n, n);
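            A = constant(1,n,n,3);    // n-by-n-by-3 array, i.e. a batch of 3 matrices
            double time = timeit(fn); // time in seconds
            // counts the FLOPs of one n-by-n multiply; the batch size of 3 is not included
            double gflops = 2.0 * powf(n,3) / (time * 1e9);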
            if (gflops &amp;gt; peak)
                peak = gflops;

            printf("%4.2f\n", gflops);
            fflush(stdout);
        }
    } catch (af::exception&amp;amp; e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
    }


    printf(" ### peak %g GFLOPS\n", peak);

    return 0;
}&lt;/PRE&gt;

&lt;P&gt;The benchmark results are provided as an interactive chart at &lt;A href="https://docs.google.com/spreadsheets/d/e/2PACX-1vRfKVz-VV9qGae7fLC3wYHtvNROwgfZ9yI7mCDTvxXJ2wqV8ibtrp1BVxykIz9nTVlCg5ouRvfd1hFN/pubchart?oid=1109307214&amp;amp;format=interactive"&gt;this URL&lt;/A&gt;.&lt;/P&gt;

&lt;P&gt;The batch GEMM call inside ArrayFire can be found in the following source file.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://github.com/9prady9/arrayfire/blob/57eb26d03a738c8a99b664dcbe374bcefdb8572c/src/backend/cpu/blas.cpp" target="_blank"&gt;https://github.com/9prady9/arrayfire/blob/57eb26d03a738c8a99b664dcbe374bcefdb8572c/src/backend/cpu/blas.cpp&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Thank you,&lt;/P&gt;

&lt;P&gt;Pradeep.&lt;/P&gt;</description>
      <pubDate>Sat, 21 Jul 2018 06:17:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Batch-GEMM-with-TBB-threading-solution-gives-no-performance/m-p/1159522#M27846</guid>
      <dc:creator>Garigipati__Pradeep</dc:creator>
      <dc:date>2018-07-21T06:17:33Z</dc:date>
    </item>
    <item>
      <title>Hi Pradeep, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Batch-GEMM-with-TBB-threading-solution-gives-no-performance/m-p/1159523#M27847</link>
      <description>&lt;P&gt;Hi Pradeep,&amp;nbsp;&lt;/P&gt;

&lt;P class="wordsection1" style="margin:0cm;margin-bottom:.0001pt"&gt;Thank you very much for integrating MKL into ArrayFire and for reporting the issue.&lt;BR /&gt;
	&lt;BR /&gt;
	We will look into the problem. In the meantime, could you please tell us how you link MKL and TBB, and which MKL version, compiler, and test machine you are using? Also, did you follow the online batch GEMM article below?&lt;/P&gt;

&lt;P class="wordsection1" style="margin:0cm;margin-bottom:.0001pt"&gt;&lt;A href="https://software.intel.com/en-us/articles/introducing-batch-gemm-operations" target="_blank"&gt;https://software.intel.com/en-us/articles/introducing-batch-gemm-operations&lt;/A&gt;&lt;/P&gt;

&lt;P class="wordsection1" style="margin:0cm;margin-bottom:.0001pt"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P class="wordsection1" style="margin:0cm;margin-bottom:.0001pt"&gt;Best Regards,&lt;/P&gt;

&lt;P class="wordsection1" style="margin:0cm;margin-bottom:.0001pt"&gt;Ying&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jul 2018 01:52:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Batch-GEMM-with-TBB-threading-solution-gives-no-performance/m-p/1159523#M27847</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2018-07-24T01:52:37Z</dc:date>
    </item>
    <item>
      <title>Hi Ying,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Batch-GEMM-with-TBB-threading-solution-gives-no-performance/m-p/1159524#M27848</link>
      <description>&lt;P&gt;Hi Ying,&lt;/P&gt;

&lt;P&gt;Here are the details you asked for, from the machine I tested on.&lt;/P&gt;

&lt;P&gt;We link to MKL dynamically, using the following flags.&lt;/P&gt;

&lt;PRE class="brush:; class-name:dark;"&gt;-L/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64
-Wl,-rpath,/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64:
-lmkl_core -ldl -lmkl_tbb_thread -lmkl_intel_lp64 -ltbb&lt;/PRE&gt;

&lt;P&gt;In the above flags, I used Intel OpenMP, hence the flag iomp5&lt;/P&gt;

&lt;P&gt;MKL Version: 2018.1.163&lt;/P&gt;

&lt;P&gt;Compiler: GCC 8.1.1&lt;/P&gt;

&lt;P&gt;Yes, I followed that article when writing my code.&lt;/P&gt;

&lt;P&gt;Thank you for looking into it.&lt;/P&gt;

&lt;P&gt;Regards,&lt;/P&gt;

&lt;P&gt;Pradeep.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jul 2018 04:51:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Batch-GEMM-with-TBB-threading-solution-gives-no-performance/m-p/1159524#M27848</guid>
      <dc:creator>Garigipati__Pradeep</dc:creator>
      <dc:date>2018-07-24T04:51:00Z</dc:date>
    </item>
    <item>
      <title>Hi Pradeep ,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Batch-GEMM-with-TBB-threading-solution-gives-no-performance/m-p/1159525#M27849</link>
      <description>&lt;P&gt;Hi Pradeep,&lt;BR /&gt;
	&lt;BR /&gt;
	Thank you for your reply.&lt;/P&gt;

&lt;P&gt;I have escalated the problem and will update you when there is any news.&lt;BR /&gt;
	&lt;BR /&gt;
	Thanks,&lt;BR /&gt;
	Ying&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jul 2018 03:19:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Batch-GEMM-with-TBB-threading-solution-gives-no-performance/m-p/1159525#M27849</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2018-07-25T03:19:23Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Batch-GEMM-with-TBB-threading-solution-gives-no-performance/m-p/1159526#M27850</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Strangely, the Intel nGraph team has implemented batched matmul using batch GEMM and, unless they have changed it since, they are using TBB and reported a good speedup:&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;&lt;A href="https://github.com/NervanaSystems/ngraph/commit/dbd767994fff79d32988d8823271868d38fd3fdf" target="_blank"&gt;https://github.com/NervanaSystems/ngraph/commit/dbd767994fff79d32988d8823271868d38fd3fdf&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Kind regards,&lt;/P&gt;

&lt;P&gt;William T.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Jul 2018 04:30:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-Batch-GEMM-with-TBB-threading-solution-gives-no-performance/m-p/1159526#M27850</guid>
      <dc:creator>tambellini__william</dc:creator>
      <dc:date>2018-07-27T04:30:40Z</dc:date>
    </item>
  </channel>
</rss>

